Sensory input is crucial to human existence, providing us with the information we need to understand and navigate our world. Our reliance on our senses is reflected in the structure of our society: different colors (sight) tell us whether it's safe to cross a street, and high-pitched ambulance sirens (hearing) warn us to get out of the way. While we all have slight variations in how our senses function - some, for example, need glasses to see - it is clear that these inputs are essential to our being. But can the limits of our senses prevent our minds from attaining greater knowledge?

We can only see light in the visible spectrum and hear a narrow band of pitches. Since sensory input is imperative to our understanding of the world, this raises the question of whether the limits of our senses keep us from a fuller understanding of the universe. By using machine learning techniques to model how input leads to understanding, I will demonstrate that not only does this limit exist, but that it poses a threat to human progress.

To begin, we break understanding into four components: input, processing, action, and learning. Input is information gathering, which we do through our senses. Processing transforms that information into a model of our surroundings. Action is what we do to the environment based on that model. Finally, learning happens when the outcome of an action is related back to new input.
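These components can be sketched as a simple loop. The snippet below is purely illustrative: `live`, `agent`, `environment`, and their methods are hypothetical names standing in for the pieces described above, not the actual simulation used here.

```python
# A minimal sketch of the input -> processing -> action -> learning cycle.
def live(agent, environment, steps=1000):
    observation = environment.sense(agent)                          # input: gather information
    for _ in range(steps):
        action = agent.decide(observation)                          # processing: model -> decision
        new_observation, reward = environment.act(agent, action)    # action: affect the surroundings
        agent.learn(observation, action, reward, new_observation)   # learning: relate outcome to new input
        observation = new_observation
```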

Without a way to process information, we wouldn't be able to make sense of our environment; our movement would be reduced to random instinct, as seen in the example below:

This blue vessel has no senses yet. It is strictly mechanical and moves around the box by instinct rather than reason, which requires both input and processing. It is the precursor to an agent, a term used in economics, science, and technology to refer to an autonomous entity capable of making decisions. Neural networks are a popular tool for constructing digital agents: information is fed into the network, which outputs a decision. Think of the neural network as the agent's brain, the part that does the processing. One other piece still needs to be connected: the inputs to the neural network. Below is a new agent with a narrow field of view (FOV) backed by a neural network:
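As a rough picture of what such a brain might look like in code, the sketch below maps a handful of FOV sensor readings to an action. The layer sizes, the choice of three actions, and the NumPy implementation are assumptions for illustration, not the network actually used by this agent.

```python
import numpy as np

class Brain:
    """A tiny feedforward network: sensor readings in, an action out."""
    def __init__(self, n_inputs=8, n_hidden=16, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
        self.w2 = rng.normal(scale=0.5, size=(n_hidden, n_actions))

    def decide(self, sensors):
        # sensors: e.g. distances to the nearest dot along 8 rays inside the FOV
        hidden = np.tanh(sensors @ self.w1)   # processing
        scores = hidden @ self.w2             # one score per possible action
        return int(np.argmax(scores))         # e.g. 0 = turn left, 1 = straight, 2 = turn right

brain = Brain()
action = brain.decide(np.random.rand(8))      # decide from one snapshot of the FOV
```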

The arc extending from the agent shows what it can sense. Its vision works much like our own, covering a direction and a distance. What the agent can see, however, means nothing if it has no way to understand the input. Setting goals is one way to learn from processed information. In the next examples, the agent will have the goal of collecting green dots and avoiding red ones. Accomplishing a goal earns a reward. Instead of directly telling the agent what to do, let's define reward as a mathematical function:

reward = # green dots collected - # red dots collected
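In code, this reward can be tallied one dot at a time; the sketch below is just a restatement of the formula above, assuming only green and red dots can be collected.

```python
def step_reward(dot_color):
    # +1 for a green dot, -1 for a red dot; summed over time this equals
    # (# green dots collected) - (# red dots collected)
    return 1 if dot_color == "green" else -1
```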

The agent will work towards this goal by learning which combinations of inputs and actions lead to higher reward. Reinforcement learning is a technique used to teach agents how to act in an environment: agents learn to maximize reward through trial and error. This is similar to the game of Hot or Cold, where the goal is to find a hidden object. As you move around, the hider says "hot" or "cold" depending on the action you take. Move towards the object and you get a warmer reward; move away from it and you get a colder punishment. The input is the hider's statement, which you process; the action is the direction you move; and the change in temperature is the reward that helps you learn.
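That trial-and-error loop can be made concrete with a toy version of Hot or Cold: a hidden spot on a line of positions, and a reward that is warmer whenever a move gets closer to it. This is a minimal tabular Q-learning sketch under those assumptions, not the algorithm driving the dot-collecting agents above.

```python
import random

N, GOAL = 10, 7                      # a line of 10 positions, hidden spot at position 7
q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}   # learned value of each move
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    s = random.randrange(N)
    while s != GOAL:
        if random.random() < epsilon:
            a = random.choice((-1, +1))                      # trial: explore a random move
        else:
            a = max((-1, +1), key=lambda m: q[(s, m)])       # otherwise use what was learned
        s_next = min(max(s + a, 0), N - 1)
        r = 1 if abs(s_next - GOAL) < abs(s - GOAL) else -1  # warmer or colder
        best_next = max(q[(s_next, m)] for m in (-1, +1))
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])   # error: correct the estimate
        s = s_next
```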

Like a newborn, a freshly generated agent is dropped into a world filled with input. It has a goal to collect green dots and avoid red ones, but it has no idea how to correlate inputs and actions with rewards. The first part of an agent's life is spent learning, figuring out which combinations of inputs and actions result in higher rewards. After a while, our agent begins to perform better.

Notice how the trained agent moves with intention: it attempts to avoid red dots and gather green ones. It occasionally bumps into red dots that are just outside its field of view, and it fails to collect green dots that are beyond its vision. Because the agent's field of view is crucial to its ability to avoid red dots and collect green ones, we expect an agent with a larger FOV to be better at achieving this goal. Let's see how one such agent performs:

This agent is better at avoiding red dots and does a better job of collecting green ones thanks to its extra sensing capability. To compare the two agents, we can reduce performance to a single number: green dots collected minus red dots collected. When an agent is first generated, we expect this value to fluctuate around 0 as it randomly collides with both colors. As the agent learns, the rate at which it collects red dots should decrease and the rate at which it collects green dots should increase. Think of this period as studying before an exam: you expect that the longer you study (process information), the better you will do on the exam. Just as our performance is based on knowledge acquired before exam time, an agent's performance depends on what it learned during the training period.
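If we record that number at every step, we get a curve like the one discussed below. The bookkeeping might look like this sketch, where `environment.step` is a hypothetical call returning the color of any dot hit on that step.

```python
def performance_curve(agent, environment, steps):
    # Running score: green dots collected minus red dots collected.
    score, history = 0, []
    for _ in range(steps):
        color = environment.step(agent)   # "green", "red", or None
        if color == "green":
            score += 1
        elif color == "red":
            score -= 1
        history.append(score)
    return history
```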

As expected, the value stays around 0 early on while the agent familiarizes itself with its environment. After some time has passed, the value begins to increase, indicating that the agent is learning. At test time, performance grows at a steady linear rate, with the slope set by how much was learned by the end of the training stage. The value for the agent with the larger FOV quickly exceeds that of the agent with the smaller FOV. This is because the agent with more information constructed a better representation of its environment, which allowed it to simulate the outcomes of different actions.

Like the agents above, humans are born able to perceive only a certain range: the distance we can see and the frequencies we can hear, and both degrade with age. Machine sensors, by contrast, keep getting better. Human brains may still be far ahead of neural networks in terms of learning capabilities, but our senses already pale in comparison to the inputs available to machines. An adjustable camera lens, for example, can cover a wider field than the human eye and zoom in further than we can see. Because our understanding of the world is bound by the nature of our perception, it is conceivable that thinking machines - with their superior senses - will come to understand the world better than we do. Put another way, human intelligence may be stuck at a local maximum, but that does not prevent agents from searching for a global maximum far from our own.

Intelligence is not easily defined, and it is far-fetched to equate performance on a trivial task with intellect. Whatever the definition of intelligence, it certainly has to do with learning. The agent with superior sensing capabilities learned to maneuver within its environment better than the agent with less sensing capability. Whether it is driving a car or understanding one another, we need to come to terms with the fact that there are domains where machines will learn to do things far better than humans can.

Going back to the original vessel with no senses, we can look at its performance on the item-collecting task the same way we compared the agents above. Although it cannot perceive the environment, the exploration that occurs during learning causes fluctuations in its performance. It appears that learning is the heartbeat of the mind. When learning stops, existence flatlines.