2023-05-31

Embracing the Unknown: Lessons from Chaos Theory for Data Scientists

Insights for understanding the limits of predictive models

Hennie de Harder

Data Scientist

Sometimes during a data science project, you discover that it’s really hard to improve your metric. You try many things: complex models, adding more data, hyperparameter tuning, feature engineering, feature selection, everything. It just doesn’t get better. You can’t even improve on the baseline, which was a simple moving average. What is happening? In such cases, maybe you should stop trying, because something else is going on.

Butterflies. Image created with Midjourney by the author.

In this post, I want to share why it is not always possible to get good predictions. In specific projects, you might be dealing with chaos. Not in the normal sense of the word (complete chaos or randomness), but with scientific chaos. A chaotic system is really hard or impossible to predict, especially in the long term.

In the following three sections, you’ll get a good understanding of chaos and what it can mean for a data science project.

1. The Existence of Chaos

Before scientific chaos was discovered, Newton had a dream. He thought it was possible to understand all of nature through principles of physics that could be expressed mathematically. Newton did a great job in understanding the solar system this way.

The story of chaos starts in the twentieth century. Edward Lorenz, a meteorologist often called the father of chaos theory, studied the behavior of weather patterns using computer simulations. He found that if he started a simulation with slightly different initial conditions, the outcome could be vastly different from the original result.

These findings challenged the belief at the time that weather patterns were predictable and could be accurately forecast using mathematical models. This work led Lorenz to develop the concept of the butterfly effect, which states that small differences in initial conditions can have a huge impact on the long-term behavior of a system.

Robert May was a mathematician fascinated with the idea of chaos. He modeled the population of rabbits using a simple non-linear difference equation, which led to some surprising and complex results. A slightly larger initial population or a slightly higher rate of reproduction could produce a completely different, chaotic population trajectory.

May’s work with the rabbit population helped to popularize the concept of the butterfly effect and demonstrated its implications in a simple and intuitive way. It also showed that even simple systems can exhibit seemingly random behavior.

As you would expect, the discovery of chaos theory caused confusion. At the time, many scientists and mathematicians believed that systems could be predicted and controlled if enough data was collected and analyzed (as in Newton’s dream). Chaos theory challenged the prevailing view that, because the world was deterministic, it could be fully understood and predicted through mathematical models. It showed that even a deterministic system cannot always be predicted with certainty, and it raised questions about the limitations of mathematical models and the validity of traditional statistical methods.

Some scientists were skeptical of chaos theory and resisted its ideas, while others embraced it as a new way of thinking about complex systems. Over time, as the field of chaos theory has matured, its ideas have become more widely accepted and its methods have been integrated into various fields, including mathematics, physics, engineering, and economics.

Rabbit JR. Photo by David Solce on Unsplash

2. From a Certain Point, it is Useless to Try to Predict a Chaotic System

In other words, long-term forecasting is doomed for systems with chaotic behavior. You can try to model a chaotic system, but the butterfly effect makes it difficult or impossible to predict accurately in the long term.

Maybe you are still not convinced, so let’s take a look at some examples. Robert May modeled the rabbit population with the following equation:

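$$x_{n+1} = r \, x_n \, (1 - x_n)$$

Here x_n is the population at time n, expressed as a fraction of the maximum possible population (a value between 0 and 1), and the parameter r determines the growth rate of the population. For different values of r, the population can stabilize or behave chaotically: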

Logistic map. For an r-value between 2.4 and 3.0, the population stabilizes. For higher values, there are more possibilities.

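In the image you can see that for certain values of r, starting at approximately 3.6, the population x does not stabilize. For these values, no model can give a useful long-term prediction of the rabbit population: the population keeps jumping between many different values, and a tiny error in the measured starting population grows until the forecast is useless.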
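To make this concrete, here is a minimal Python sketch of the logistic map. The function name `logistic_map`, the starting populations, and the values r = 2.8 and r = 3.9 are choices made for this illustration, not values taken from May’s work. It compares a stable setting with a chaotic one, where two starting populations that differ by only one billionth drift apart completely.

```python
def logistic_map(r, x0, n_steps):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the whole trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Stable regime (illustrative r): different starting populations
# settle at the same long-run value.
stable_a = logistic_map(r=2.8, x0=0.2, n_steps=200)
stable_b = logistic_map(r=2.8, x0=0.7, n_steps=200)
print("r=2.8 final values:", round(stable_a[-1], 4), round(stable_b[-1], 4))

# Chaotic regime (illustrative r): almost identical starting populations
# end up on completely different trajectories.
chaotic_a = logistic_map(r=3.9, x0=0.2, n_steps=100)
chaotic_b = logistic_map(r=3.9, x0=0.2 + 1e-9, n_steps=100)
gaps = [abs(a - b) for a, b in zip(chaotic_a, chaotic_b)]
for n in (0, 20, 40, 60, 80, 100):
    print(f"r=3.9 step {n:3d}: gap between the two runs = {gaps[n]:.2e}")
```

In the stable case the starting value does not matter. In the chaotic case the gap starts out negligible and then grows until the two runs have nothing to do with each other, which is the butterfly effect in its simplest form.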

The second example is about the weather and Edward Lorenz. He found that it’s possible to predict the weather, but one shouldn’t try to predict too far into the future. He showed this by creating two nearly identical models that diverged dramatically after only two weeks due to a relatively small initial disturbance. This shows that chaotic systems are possible to predict, but only until a certain point in time. This ‘point’ differs for every system, and you need to investigate your data and the predictions to discover where the point lies.
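The idea of a ‘point’ beyond which predictions break down can be sketched in Python. The code below does not use a weather model. As a stand-in, it assumes the three-variable system Lorenz published in 1963 with its commonly used parameters (sigma = 10, rho = 28, beta = 8/3), integrated with a simple fourth-order Runge-Kutta step. The function names, the size of the initial disturbance, and the error threshold of 1.0 are all hypothetical choices for this illustration, and the model’s time units are dimensionless, so they do not correspond to days.

```python
def lorenz_deriv(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 1963 system (assumed stand-in model)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def shifted(state, deriv, scale):
    """Return state + scale * deriv, element by element."""
    return tuple(s + scale * d for s, d in zip(state, deriv))


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    k1 = lorenz_deriv(state)
    k2 = lorenz_deriv(shifted(state, k1, dt / 2))
    k3 = lorenz_deriv(shifted(state, k2, dt / 2))
    k4 = lorenz_deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, n_steps, dt=0.01):
    """Return the list of states visited, starting from initial_state."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(rk4_step(states[-1], dt))
    return states


def distance(p, q):
    """Euclidean distance between two states."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def predictability_horizon(run_a, run_b, threshold, dt=0.01):
    """First time at which the runs differ by more than threshold, else None."""
    for step, (p, q) in enumerate(zip(run_a, run_b)):
        if distance(p, q) > threshold:
            return step * dt
    return None


# Two nearly identical runs: the second has a tiny initial disturbance.
truth = simulate((1.0, 1.0, 1.0), n_steps=5000)
forecast = simulate((1.0 + 1e-8, 1.0, 1.0), n_steps=5000)
horizon = predictability_horizon(truth, forecast, threshold=1.0)
print("the two runs differ by more than 1.0 from model time", horizon)
```

On your own data, the same idea applies without a simulator: measure the forecast error separately for each lead time and look for the lead time at which it stops being better than a naive baseline.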

By improving the methods of gathering data or modeling, you can try to move that point. This has been done successfully in weather prediction: a modern five-day forecast is more accurate than a one-day forecast was in 1980. This is due to more extensive observations and improved methods of assimilating those observations into models. Below you see the effect of an additional satellite, launched around the year 2000, on the difference in forecast skill between the northern and southern hemispheres:

The chart shows the evolution of the 12-month running mean of the anomaly correlation, a measure of skill, for 500hPa geopotential height forecasts in the northern and southern hemisphere at various lead times. During the nineties, the gap between the northern and the southern hemisphere narrowed. Since then it has virtually disappeared. Source: ECMWF

The title of this section might suggest you should never try to predict a chaotic system. Actually, there are methods you can use to try to predict systems that seem chaotic, like neural networks, fractal analysis or state space reconstruction. In the past, some people have been successful with them. But if a system is truly chaotic, it is impossible to predict it with reasonable accuracy in the long term.
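As a rough illustration of state space reconstruction, the Python sketch below forecasts the logistic map from its own history. It is only one simple variant, chosen for this example: each state is a delay vector of the last two observations, and the forecast copies whatever followed the most similar state in the training data (a nearest-neighbour or ‘analogue’ forecast). The function names, the embedding dimension of 2, r = 3.9, and the train/test split are all hypothetical choices.

```python
def logistic_series(r, x0, n):
    """Generate n values of the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def embed(series, t, dim):
    """Delay vector (x[t-dim+1], ..., x[t]) used as the reconstructed state."""
    return series[t - dim + 1 : t + 1]


def analogue_forecast(train, query, horizon, dim):
    """Find the training state closest to query; return what followed it."""
    best_t, best_dist = None, float("inf")
    for t in range(dim - 1, len(train) - horizon):
        candidate = embed(train, t, dim)
        dist = sum((a - b) ** 2 for a, b in zip(candidate, query))
        if dist < best_dist:
            best_t, best_dist = t, dist
    return train[best_t + horizon]


DIM = 2  # embedding dimension (assumed for this sketch)
series = logistic_series(r=3.9, x0=0.3, n=1600)
train, test = series[:1500], series[1500:]
baseline = sum(train) / len(train)  # naive forecast: always the training mean

errors = {}  # horizon -> (analogue MAE, baseline MAE)
for horizon in (1, 2, 5, 10, 20):
    model_err, base_err, count = 0.0, 0.0, 0
    for t in range(DIM - 1, len(test) - horizon):
        prediction = analogue_forecast(train, embed(test, t, DIM), horizon, DIM)
        actual = test[t + horizon]
        model_err += abs(prediction - actual)
        base_err += abs(baseline - actual)
        count += 1
    errors[horizon] = (model_err / count, base_err / count)
    print(
        f"horizon {horizon:2d}: analogue MAE = {errors[horizon][0]:.3f}, "
        f"mean baseline MAE = {errors[horizon][1]:.3f}"
    )
```

Comparing the two error columns per horizon shows where the forecast stops adding value: short horizons can be predicted well because the system is deterministic, while at longer horizons the small mismatch between a state and its nearest neighbour is amplified until the forecast is no better than the naive baseline.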

3. Many Systems Behave Chaotically

Chaotic systems can be found in many fields, including physics, biology, economics, and engineering. Many systems behave chaotically, and it can help to be aware of this when working with these types of data as it will make predicting the long-term future extremely hard or impossible. Here are some examples:

Weather and climate
The atmosphere is a complex non-linear system that exhibits chaotic behavior. A small change in the temperature or wind direction in one location can lead to vastly different weather patterns in other locations. This makes long-term weather predictions difficult. The climate system is also complex and non-linear, exhibiting chaotic behavior in the interactions between the atmosphere, oceans, and land.
Population dynamics
The populations of certain species of animals or plants can behave chaotically. For example, the population of a predator species may be influenced by the population of its prey.
Economic systems
An example of a chaotic economic system is the stock market. Small changes in interest rates or government policy can lead to very different economic outcomes. We can probably assume that if it were possible to predict the stock market, someone would have done it already.
Mechanical systems
A double pendulum, a system of two pendulums attached to each other, can behave chaotically.
Biological systems
One of the chaotic systems in biology is the heartbeat: the electrical activity of the heart oscillates between regular and irregular patterns. Another example is the activity of neurons in the brain, which interact with each other in complex and nonlinear ways.
Human behavior
The way people behave is chaotic in many different ways: decision making, opinion dynamics, the behavior of individuals within a crowd and social dynamics can all be influenced by relatively small events.

A real-life example of a chaotic system is the Friendly Floatees spill. In 1992, around 29,000 rubber ducks spilled overboard from a cargo ship in transit. The distribution of the ducks over time and space can be considered a chaotic system. You might expect that the location of the ducks at a given time is predictable, or that the ducks stay relatively close to each other. This wasn’t the case, and the ducks were found virtually everywhere in the world:

Map - Rubber Duck spill and distribution

Rubber ducks everywhere. Source: NOC

The behavior of the ducks after they were released into the ocean was influenced by a variety of factors, including ocean currents, wind patterns, and weather conditions. These factors are highly nonlinear and difficult to predict, and thus caused the ducks to follow unpredictable paths and end up in unexpected locations.

The examples in this section show that chaos is everywhere. It might be a good idea to take this into consideration, especially if you discover that your long-term predictions aren’t as good as expected.

Conclusion

Chaos theory offers valuable insights for data scientists. We have seen that chaotic behavior is ubiquitous in nature, and can be found in a wide range of systems, from weather patterns to the movements of the stock market. While chaotic systems can be difficult to predict or control, they are not entirely random, and can exhibit patterns that are amenable to analysis and modeling.

Data scientists can leverage a range of techniques from chaos theory to better understand and predict the behavior of certain systems. Nonlinear modeling techniques, such as fractal analysis, can help identify patterns in seemingly chaotic data, while machine learning algorithms can be trained to identify and respond to changes. Keep in mind that it is impossible to accurately predict a truly chaotic system in the long term.

Despite the challenges posed by chaotic systems, data scientists can still make significant progress in understanding and predicting their behavior. By embracing the insights of chaos theory, and developing innovative tools and techniques for analyzing complex data, we can unlock new insights and improve our ability to navigate the complex and ever-changing world around us.