“It is possible to invent a single machine which can be used to compute any computable sequence.” – Alan Turing
Alan Turing conceived the idea of the Universal Machine in 1936, when he was 24 years old. His Wikipedia page introduces him as an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. The foundation he laid inspired a small group of highly motivated and brilliant individuals led by John von Neumann. With only five kilobytes of memory available, they produced unmatched meteorological predictions and tackled problems ranging from the evolution of viruses to the evolution of stars.
Turing’s Universal Machine of 1936 was a theoretical construct; von Neumann’s project was its realization. The stored-program computer, conceived by Alan Turing and delivered by John von Neumann, dissolved the distinction between numbers that mean something and numbers that do something. It revolutionized the world. Today, digital computing is deeply embedded in our daily lives, and its transformative power, driven by remarkable technological advancement and commercial success, is beyond question.
Deep learning is the new engine powering that revolution today. It is a branch of machine learning that draws on mathematics, computer science, and neuroscience. A deep network learns from data the way babies learn from the world around them: starting with fresh eyes and gradually acquiring the skills needed to navigate unfamiliar environments. Deep learning traces its origins back to the birth of artificial intelligence in the 1950s, when two competing visions of how to create an AI were in play. Logic and computer programs ruled AI for decades, while learning directly from data took much longer to mature.
In the twentieth century, when computers were still in their infancy and storage was expensive by today’s standards (remember those five kilobytes!), logic was a great way to solve problems. Skilled programmers wrote a different program for each problem, and as the problems got bigger, so did the programs. The abundance of big data and computing power has radically changed this equation. Using learning algorithms to solve problems is now often faster, more accurate, and more efficient. It is much less labor-intensive to solve many difficult problems with the same learning algorithm than to write a different program for each one. Data-centric approaches typical of deep learning have even prompted the development of tailored hardware platforms that are not von Neumann-based.
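To make that contrast concrete, here is a minimal sketch; the library, datasets, and classifier are my illustrative choices, not anything prescribed above. The same off-the-shelf learning algorithm, unchanged, handles two unrelated problems that would otherwise each need their own hand-written program.

```python
# A minimal sketch (assuming scikit-learn is installed): the *same* learning
# algorithm, untouched, is applied to two unrelated classification problems.
from sklearn.datasets import load_digits, load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

for load in (load_digits, load_breast_cancer):   # handwritten digits, tumor diagnosis
    X, y = load(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=5000)    # one generic algorithm for both
    model.fit(X_tr, y_tr)
    print(load.__name__, "test accuracy:", round(model.score(X_te, y_te), 3))
```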
Despite all this, deep learning is not magic; it is just another way to construct computer programs. It offers hope where logic-based aspirations failed to deliver intelligence. However, it is not a panacea, at least not yet. Even though deep learning has enjoyed considerable success in non-critical applications, its inherently statistical nature makes it unsuitable for applications requiring close to 100% reliability. Moreover, as I will soon explain, these applications are more important than they appear.
In a loose sense, two branches of mathematics are pertinent to building intelligent systems: dynamics and statistics. The delineation, admittedly subject to interpretation, goes as follows. Data has regularities and patterns, so an intelligent system should analyze them statistically. On the other hand, there are regularities in the motion of things in the world, so the intelligent system should model those dynamics as well. The two modes of thinking differ: statistics uses many samples to look for a pattern when we know little else about the system (often assuming something about how the data are distributed), whereas dynamics attempts to find the equations of motion of a system from few samples.
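A toy contrast may help; the numbers and the exponential-decay system below are invented purely for illustration. The statistical mode estimates distribution parameters from many samples, while the dynamical mode recovers an equation of motion from a handful of points.

```python
# A toy contrast between the two modes of thinking (numpy only; values are made up).
import numpy as np
rng = np.random.default_rng(0)

# Statistical mode: many samples, assume a distribution, estimate its parameters.
samples = rng.normal(loc=2.0, scale=0.5, size=10_000)
print("estimated mean/std:", samples.mean(), samples.std())

# Dynamical mode: a few observations of x(t) = x0 * exp(-k t); recover the
# equation of motion (here just the decay rate k) via a log-linear fit.
t = np.array([0.0, 1.0, 2.0, 3.0])
x = 5.0 * np.exp(-0.7 * t)                 # four noiseless observations
k_hat = -np.polyfit(t, np.log(x), 1)[0]    # slope of log(x) vs t gives -k
print("recovered decay rate:", k_hat)      # ~0.7
```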
There is a heavy bias toward statistics in the current approach to machine learning. Despite incorporating some priors into the models (inductive bias), the general approach is to throw more data and computation at a system and expect miracles rather than to build a system that can make intelligent inferences based on dynamics. Amazingly, this approach has brought about some miracles! However, I find it hard to say whether these miracles occurred because of this approach or despite it.
There is no doubt in my mind that a large part of these miracles can be attributed to the abundance of data and computation. One thing that should not be overlooked is that advances in software engineering have made it possible to train huge deep-learning models with fewer than 20 lines of code. Thanks to the brilliant minds that have developed high-level abstractions, scripting kids can now access general-purpose GPU computing, previously an esoteric topic accessible only to experienced programmers. And while Moore’s Law, in its strictest sense of chip densities doubling every two years, may no longer hold, decentralized computing and pushing computation to the edge have rendered that largely irrelevant.
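As a rough illustration of that claim (assuming a standard TensorFlow/Keras install; the tiny model and dataset are my own choices, not anything from the text above), a complete training run fits comfortably under 20 lines, and the same high-level API scales to far larger models.

```python
# A sketch of the "<20 lines" point: load data, define, train, and evaluate a model.
import tensorflow as tf

(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
x_tr, x_te = x_tr / 255.0, x_te / 255.0            # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),                     # one logit per digit class
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=2)
model.evaluate(x_te, y_te)
```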
In addition, I believe that the current “statistical” approach to AI might still hold some untapped potential. This potential remains untapped even in the biggest labs and companies, since AI models are primarily engineered in an ad-hoc manner, a natural consequence of the lack of in-depth study of the discipline. A new engineering field should emerge to address the challenges of using AI to enhance human life. This new discipline should aim to corral the power of a few key ideas to which the preceding century gave substance: information, algorithm, data, uncertainty, computing, inference, and optimization. Other, more established engineering fields, such as civil engineering, emerged by taking a similar path: using concepts from physics, civil engineers first devised design codes and then used those codes to help build modern civilization. The current approach to AI lacks such design codes, and they are far from being available; many more brilliant minds are needed to close the gap. A good place to start might be studying the applications of representation theory to machine learning.
There is another line of reasoning that makes me reluctant to credit these miracles to the current approach. Not so long ago, it was said that computer vision could not compete with the abilities of a one-year-old. Around 2012, something changed the field of machine vision: the introduction of deep Convolutional Neural Networks. CNNs soon became the cornerstone of deep learning. The mere act of building translational symmetry into the architecture (weight sharing that makes convolutional layers translation-equivariant and, with pooling, approximately translation-invariant) has proven revolutionary; why not investigate further in that direction?
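For the curious, here is a quick numpy check of the property in question, with a toy signal and kernel invented for illustration: convolution commutes with translation (equivariance), and a global pooling step on top turns that into invariance.

```python
# Convolution commutes with translation; a global pool makes the result invariant.
import numpy as np

signal = np.zeros(20); signal[5:8] = [1.0, 2.0, 1.0]   # a small "feature"
shifted = np.roll(signal, 4)                            # the same feature, moved
kernel = np.array([1.0, 2.0, 1.0])

out1 = np.convolve(signal, kernel, mode="same")
out2 = np.convolve(shifted, kernel, mode="same")

print(np.allclose(np.roll(out1, 4), out2))   # True: the feature map just shifts
print(out1.max() == out2.max())              # True: global max-pooling is invariant
```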
Even though data and computation are abundant, they are still not free. Furthermore, there is a limit to how much advantage an intelligent system can gain from more computation and data. Gains from more data diminish as we move from routine tasks to less routine ones. This is where the sanity ends and current models fail. Tail events, while rare, still matter tremendously. Humans have evolved to overestimate the chances of tail events for good reasons. My first year of grad school was a tail event. Tail events have proven to be the bottlenecks, with irreversible consequences. Current approaches to machine learning are fundamentally statistical and, as a consequence, “short-tailed.”
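A small simulation, with distributions and a threshold chosen only for illustration, shows what “short-tailed” means in practice: events that a Gaussian model deems essentially impossible occur by the thousands under a heavy-tailed alternative.

```python
# Short tails vs heavy tails: count extreme events under each model.
import numpy as np
rng = np.random.default_rng(0)

n = 1_000_000
gaussian = rng.normal(size=n)            # short-tailed
heavy = rng.standard_t(df=2, size=n)     # heavy-tailed (Student-t, 2 degrees of freedom)

threshold = 8.0                          # an "8-sigma"-style event
print("Gaussian exceedances:", int((np.abs(gaussian) > threshold).sum()))  # ~0
print("Heavy-tail exceedances:", int((np.abs(heavy) > threshold).sum()))   # thousands
```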
My understanding is that autonomy is best achieved by models that capture the full physical dynamics responsible for generating the complexity we encounter. Ultimately, this understanding of the world should become “non-statistical.” Using statistics to build a model of the dynamics might be the real solution.
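As a sketch of that last idea, under assumptions entirely of my own choosing, one can use a purely statistical tool (least squares) to recover the dynamics of a toy system, here a noisy damped rotation, from an observed trajectory.

```python
# Use least squares (statistics) to recover the dynamics x_{t+1} = A x_t (a model of motion).
import numpy as np
rng = np.random.default_rng(0)

theta, damping = 0.3, 0.99
A_true = damping * np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])

# Simulate a noisy trajectory of the system.
x = np.zeros((200, 2)); x[0] = [1.0, 0.0]
for t in range(199):
    x[t + 1] = A_true @ x[t] + rng.normal(scale=0.01, size=2)

# Least squares on (x_t, x_{t+1}) pairs recovers the equations of motion.
A_hat, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
print(np.round(A_hat.T, 3))   # close to A_true
```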