Explain why researchers cannot claim causation when examining the relationship between variables.

Released - May 10, 2017

It's often very tempting to look at statistical information, spot a correlation, and then assume causation. It's a common mistake, because things are rarely that simple or straightforward.

Of course, circumstances can be that straightforward occasionally, but assuming that they are is never a good idea because you will often jump to the wrong conclusions.

Just because correlation is evident, that doesn't mean that A causes B.  In statistics, it's a logical fallacy to suggest that correlation proves causation, and no one will take you seriously if your research falls into this trap.

Many variables must be examined when looking at the relationship between two events. You will usually find other factors that have an impact, and it might be these factors that are responsible for the correlation.

“Correlation Does Not Imply Cause” - But What Does That Actually Mean?

It seems pretty self-explanatory, but it's not always easy to understand exactly what this phrase means until you examine it carefully. First of all, it is important to understand what correlation and causation are. A correlation is a mutual relationship or connection between two variables. Causation is the relationship between cause and effect: when a cause results in an effect, that is causation. In other words, a correlation between two events or variables simply indicates that a relationship exists, whereas causation is more specific and says that one event actually causes the other.

When we say that correlation does not imply cause, we mean that just because you can see a connection or a mutual relationship between two variables, it doesn't necessarily mean that one causes the other.  Of course, it might be the case that one event or variable causes the other, but we can't know that by looking at the correlation alone. More research would be necessary before that conclusion could be reached.

Why is the Relationship Between Correlation and Causation Important?

Why is the relationship between these two things important? To put it simply, we often need to know if one event or variable causes another when we carry out research. For example, if we want to find out if a drug is having a positive effect on a patient, we need to understand the causality. If a patient gets better after taking a certain drug, was it the drug that caused the improvement? Or was something else happening?

This is just one example of how cause and effect relationships can be important when research is being carried out, but there are many other examples that illustrate the same point. Any situation in which an outcome from a process has to be analyzed will deal with cause and effect relationships. This isn't just important in medicine or science. It can be used in social research, political science, and other areas. 

The Lack of Correlation Doesn't Imply the Lack of Causation 

In the same way that correlation does not imply causation, a lack of correlation does not imply a lack of causation. It has to work both ways. This might seem like a strange point to make. Many people assume that correlation is the minimum requirement, and that other forms of analysis are applied only once it has been found. But it's important to remember that the truth is independent of our measurements, and something doesn't become less true because of our inability to measure it.

If one event caused another, then it did so whether or not we have a way of measuring the causation. Just because you can't see a correlation, that doesn't mean that there wasn't some kind of causation at play. It's vital not to forget this, because a real effect can go unnoticed when a simple correlation measure fails to pick it up.
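One way to see this is a relationship that is fully causal but not linear. The sketch below is only an illustration under stated assumptions: the data are made up, and pearson_r is a hypothetical helper that computes the standard Pearson correlation coefficient. Here y is completely determined by x (y is x squared), yet the coefficient comes out as zero because the rising and falling halves of the curve cancel out.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: y depends entirely on x, but not along a straight line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # no linear correlation, despite complete dependence
```

A scatter plot of the same data would show the pattern immediately, which is one reason to look at the data and not only at a single summary number.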

How Should Causation Be Established?

It's not easy to measure and establish causation, and there is no set path that will guarantee an easy way to test it. It all depends on the situation at hand and what kind of causal relationship needs to be tested. Of course, you can't just assume that correlation implies causation; we've covered that already. But when you find correlation, it can be an indication to examine the situation further to determine if causation can be established between the variables.

Thorough testing and the elimination of variables that could affect the findings can help test the hypothesis. If other factors that could be producing the appearance of correlation can be eliminated, the evidence for causation between the remaining variables is strengthened. Bradford Hill's criteria for causation will also help you to judge whether a causal relationship is present. Hill's nine criteria for causation in biological research are strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy.

Examples of How Causation May Be Wrongly Inferred from Correlation

Now that we have discussed correlation and causation and how they are related, we will look at the different ways in which causation can be wrongly inferred from correlation.

Reverse Causation

Reverse causation is as simple as its name. When you observe a correlation, it's possible to interpret it in the wrong way. Instead of seeing that A causes B, you might assume that B causes A. It's easy to get these things mixed up, but when you put it in simple terms and use a very basic and obvious example, you can see that the causation that you've identified is incorrect. It's one of the most common ways to incorrectly infer causation from correlation.

Consider how a solar panel works and we can see how reverse causation happens. When a solar panel generates more power, the sun is visible in the sky for longer. But that doesn't mean that the solar panel's increase in power generation causes the sun to stay in the sky for longer. Instead, the reverse is actually true. The sun was visible in the sky for longer, and this is what led to the solar panel producing more power during that period of time.
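The reason the numbers alone cannot settle the direction is that a correlation coefficient is symmetric. The following sketch assumes made-up daylight and output readings (not real measurements) and a hypothetical pearson_r helper for the standard Pearson coefficient. It computes the correlation both ways round and gets the same value.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative readings, not real measurements.
daylight_hours = [8, 10, 12, 14, 16]
panel_output_kwh = [3.1, 4.0, 5.2, 5.9, 7.1]

r_sun_vs_panel = pearson_r(daylight_hours, panel_output_kwh)
r_panel_vs_sun = pearson_r(panel_output_kwh, daylight_hours)

# Swapping the arguments changes nothing: correlation has no direction.
print(f"{r_sun_vs_panel:.4f} {r_panel_vs_sun:.4f}")
```

Deciding which way the cause runs takes outside knowledge, such as knowing that the sun's position does not depend on anything happening on a rooftop.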

The Common-Causal Variable

When we talk about a common-causal variable, we are referring to a variable that is hidden and lurking in some way. We might not be able to see it or the impact that it's having, but it can skew our view of a particular correlation. The fallacy here is looking at the correlation and assuming causation without taking into account the active variable that is actually causing the correlation that we can see.

One example of how this works is a hot summer. Suppose that during a hot summer a particular politician is elected as the leader of the country and ice cream sales also break records. There are three things happening. If the hot summer is ignored, one could incorrectly see a causal relationship between the election of the politician and the increase in ice cream sales. In fact, the hot weather is the common variable that accounts for the increase in ice cream sales.
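A hidden common cause can be imitated with simulated data. The sketch below makes several assumptions that are not in the example above: an election is not a number, so a hypothetical second quantity (sunscreen sales) stands in for it, all values are randomly generated, and pearson_r and partial_r are illustrative helpers. Temperature drives both sales figures and neither affects the other. The partial correlation formula used to control for temperature is one standard technique among several.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: temperature drives both sales figures,
# and neither sales figure has any effect on the other.
temperature = [rng.uniform(15, 35) for _ in range(2000)]
ice_cream = [2 * t + rng.gauss(0, 5) for t in temperature]
sunscreen = [3 * t + rng.gauss(0, 5) for t in temperature]

raw = pearson_r(ice_cream, sunscreen)
controlled = partial_r(ice_cream, sunscreen, temperature)
print(f"raw correlation:             {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

The raw correlation is strong, but it largely disappears once temperature is accounted for, which is what you would expect if the weather is doing all the work.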

Coincidental Relationships

Sometimes things happen by pure coincidence. If two things happen at the same time, it doesn't necessarily prove that they have a direct link. This is different from a common-causal relationship because there isn't necessarily one variable that is skewing the results. It can be more complex than that and the correlation you're seeing can be a complete coincidence, even if the pattern occurs repeatedly over years or decades.

This is how many conspiracy theories are generated. If it's suggested that a terrorist attack often occurs days after the police carry out training, that doesn't mean that there is a link between these events. Conspiracy theorists extrapolate even more nonsense and use it to suggest proof of undercover operations and inside jobs when these things simply aren't happening. That's why you have to be careful and examine all variables when determining cause and effect relationships.
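Coincidental correlations are easy to produce if you search enough unrelated data. This sketch is an illustration only: it generates 40 short series of pure random noise (so no links exist by construction) and uses a hypothetical pearson_r helper to find the strongest correlation among all 780 possible pairs.

```python
import random
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so the run is repeatable

# 40 unrelated series of 10 random values each: pure noise.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(40)]

# Search every pair of series for the strongest correlation.
best = max(abs(pearson_r(a, b)) for a, b in combinations(series, 2))
print(f"strongest correlation among unrelated series: {best:.2f}")
```

With few data points and many comparisons, a strong-looking match is almost guaranteed to turn up somewhere, even though nothing connects the series.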

Bidirectional Causation

Bidirectional causation applies to situations where the two things being looked at have an effect on one another. If the causation goes both ways, it can be easy to misinterpret the data and view it as a more conventional, one-way form of cause and effect. The most obvious example of this is in the animal kingdom. The number of prey that predators eat directly affects the number of predators that can survive. But at the same time, if the number of predators increases, the number of prey will decrease because more of them will become food for the predators.
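The two-way influence between predators and prey can be sketched with a small simulation. This is an assumption-laden illustration and not something taken from the text above: it uses the textbook Lotka-Volterra equations with simple Euler time steps, and every rate and starting population is a hypothetical value chosen only to show the feedback loop.

```python
# Hypothetical rates, chosen only for illustration.
PREY_GROWTH = 1.0       # prey births per prey per unit time
PREDATION = 0.1         # prey lost per predator-prey encounter
PREDATOR_GAIN = 0.075   # predators gained per predator-prey encounter
PREDATOR_DEATH = 1.5    # predator deaths per predator per unit time

def simulate(prey, predators, dt=0.001, steps=20000):
    """Euler steps of a Lotka-Volterra style model; returns both histories."""
    prey_hist, pred_hist = [prey], [predators]
    for _ in range(steps):
        encounters = prey * predators
        # Each population's rate of change depends on the other population.
        d_prey = PREY_GROWTH * prey - PREDATION * encounters
        d_pred = PREDATOR_GAIN * encounters - PREDATOR_DEATH * predators
        prey += d_prey * dt
        predators += d_pred * dt
        prey_hist.append(prey)
        pred_hist.append(predators)
    return prey_hist, pred_hist

prey_hist, pred_hist = simulate(prey=10.0, predators=5.0)
print(f"prey ranged from {min(prey_hist):.1f} to {max(prey_hist):.1f}")
print(f"predators ranged from {min(pred_hist):.1f} to {max(pred_hist):.1f}")
```

Because each population appears in the other's rate of change, neither one can be singled out as "the" cause: both numbers rise and fall in linked cycles.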

Although correlation and causation can sometimes be linked, correlation alone is not enough to prove causation. As the examples above illustrate, inferring a cause from a correlation is often not a sound conclusion to reach. When assembling evidence for a causal relationship between two variables, it is critical to consider as many other variables as possible and eliminate them systematically. The variables that remain are the most likely candidates to be the cause of the event being studied.
