June 16, 2006
Unified Theory of Learning Systems; or why markets, neural nets, evolution, and Page Rank are all the same thing
I was recently thinking about the commonalities between all systems that can learn. Neural networks learn by updating the strength of the connections between neurons. Groups of people and societies in general learn by leveraging the power of markets. Species learn through natural selection and evolution. The Page Rank algorithm that Google uses to analyze the links between all pages on the web learns by updating the weight of each page based on the pages that link to it. What do all these systems have in common? I will try to answer that question in this post.
I first saw the connection between neural networks and markets when I was thinking about prediction markets. In a neural network, every time a connection is used to produce a "positive" result the strength of the connection is increased, and every time it is used to produce a "negative" result the strength of the connection is decreased. Positive and negative depend on the purpose of the neural network. If the neural network is the brain of a mouse, finding cheese would be considered a positive result but getting an electric shock would be a negative result. If the neural network is supposed to identify a handwritten number 6, then correct identification would be considered positive, and incorrect identification would be a negative. In any case, the positive signal could be considered a payment to the neural system, and the updating of the weights is how the neural net distributes the payment among those connections that were responsible for the results. Those connections that are right, i.e. that were activated when the payoff was positive, get paid more by having their strength increased. In contrast those connections that were wrong lose part of their fortune by having their strength diminished. We can start seeing an analogy to a market here, a connection is a bet in a particular direction, and those connections that get it right have more to bet in the next round, but those connections who get it wrong have less to bet.
Now let's look at a market. When an individual buys a security in a market he is betting that the security is going to go up in price. If he is right he makes money and the purchasing power of his account increases, if he is wrong, he loses money and his purchasing power decreases. Next time he makes a decision to buy or sell, the amount that he can buy or sell is going to be limited to how much he has in his account. After several rounds those who usually guess right will have pretty big accounts, and those who usually guess wrong will have very small accounts. So their guesses will be weighted differently, and the price resulting from aggregating those guesses will become more accurate. Once the market reaches a steady state the purchasing power of individuals will be proportional to how good they are at guessing the right price, and the aggregate price will be the most accurate. So the purchasing power of each individual is the weight of their vote, the strength of their connection to the final aggregated price.
From the previous paragraphs we can see how neural nets and markets are alike. We can think of markets as weighted voting systems, where the size of the account of each individual is their weight, and the system updates the weights of individuals according to the accuracy of their votes in predicting an outcome. We can think of neural networks as markets where connections bid for a particular correlation between the activity of two neurons, and those who bid correctly get rich, but those who bid incorrectly lose their shirts.
We can now start to see some of the commonalities:
1) Both of these learning systems have a predicted signal and an actual signal, their objective is to make the predicted sinal equal to the actual signal. In the case of a market the predicted signal is the market price of a security and the actual signal is the true price of the security, the price that makes supply equal demand and that makes markets clear. In the case of a neural net the predicted and actual signals are more obvious since neural nets are designed for pattern recognition where the pattern is the actual signal, and the predicted signal is what the neural net guesses when it is learning.
2) Both of these systems are composed of several distinct units that vote/bet for an outcome, or for a part of the process in an outcome. In the neural net each connection between two neurons is a vote for the state of one neuron being relevant to the state of the other neuron in predicting an outcome. In a market each individual buying or selling a security is voting on whether that security is overpriced or underpriced.
3) Both of these systems give different weights to different voters. Connections are weighted in a neural net, different market participants have different levels of wealth, where the wealth represents buying power, and since each transaction is a vote, the vote is weighted by the wealth of the person doing the transaction.
4) Both of these systems learn by updating the weights of the voters. In a market those who vote wrong lose some of their wealth, in a neural net those connections who contribute to wrong results lose part of their strenght.
5) Once these systems learn, their weights stop being updated. This is achieved by a dynamic equilibrium where the sum total of positive updates is equal and opposite to the sum total of negative updates. Once a neural net learns what it needs to learn, the weights remain stable, they lose as much as they gain in every training round. Once the market participants have attained a level of wealth that represents all their knowledge on a particular security, they are as likely to guess right as they are to guess wrong, so their gain as frequently as they lose and their wealth remains static.
Now that we have these basis, it is easy to see how they extend to other learning systems. Consider evolution, each gene is a vote for a particular physical characteristic for an organism, and on how likely is the organism that possesses that characteristic to survive in its environment. Those genes that vote correctly get reproduced more often than those who vote incorrectly, and their percentages in the population increase. This means that they have a higher weight on the percentage of the next generation. The wealth/weight of each gene is represented in the percent probability that they have of being passed to the next generation in contrast to competing genes. The fitness in genetic algorithm parlance. Once the optimal phenotype is reached the percentages reach a steady state, where half of the organisms are helped by the characteristic, and the other half are hindered. This shows that evolution is very similar to markets and neural nets, since it conforms to the same five broadly outlined principles above.
What about the Page Rank algorithm? This algorithm is applied to a directed graph to find out which nodes are the most popular. At first glance you would want to count how many incoming links each node has to measure its popularity. That is a good first approximation, but not all links are equal, the links of more popular nodes are worth more than the links of unpopular nodes. So you start by counting the links and assigning a popularity to each node, then you count the links again but weight the links according to the popularity of the incoming node. This updates the popularity of each node, so you do this again, until you reach a steady state and the popularity measure of the nodes doesn't change. Again, we can see that each node is voting by linking to other nodes, and its vote is weighted by its popularity. Its popularity is updated according to how many popular nodes point to it. The algorithm of updating can be thought of as a flux of popularity through the system, from the least popular to the most popular. Each node in between has some popularity coming in and some going out. The steady state is reached when the popularity coming in equals the popularity going out. Although the training signal here is less obvious than in other learning systems, it is still something separate from the predicted signal, it is the actual popularity of each node according to the graph topology. As we can see Page Rank also shares the same five basic characteristics of learning systems that markets, neural nets, and evolving systems all have.
I think we can apply the same sort of analysis to anything that learns. That is a conjecture that I am not going to prove because it would require me to either enumerate all learning systems, or prove that a system that doesn't share those characteristics cannot learn. That proof might be doable, but that would be too much work for a Friday night post. My purpose is only to advance the hypothesis that all learning systems are instances of the same algorithm and to provide some examples as evidence to that conjecture.
To the science and engineering inclined, this can serve as motivation to specify the learning algorithm exactly by observing many instances of systems that learn. To the mathematically inclined, this could serve as motivation to find a rigorous argument of why this is true. To the scientifically inclined with a skeptic bent, this can serve as inspiration to try to disprove it probably by finding an example of a learning system that doesn't fit the model. As for me, this was just a way to record an insight I found interesting.
As I finish writing this I realized that the expectations maximization algorithm, and the fuzzy clustering algorithm also fit this description, but I leave these as an exercise to the reader because I am tired of typing, and I want this post to be short enough for people to actually read.