What is the probability that the random variable will assume a value between 45 and 55 (empirical rule)?

The empirical rule, also referred to as the three-sigma rule or 68-95-99.7 rule, is a statistical rule which states that for a normal distribution, almost all observed data will fall within three standard deviations (denoted by σ) of the mean or average (denoted by µ).

In particular, the empirical rule predicts that 68% of observations fall within one standard deviation of the mean (µ ± σ), 95% within two standard deviations (µ ± 2σ), and 99.7% within three standard deviations (µ ± 3σ).
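The rule's percentages are rounded versions of exact normal-distribution probabilities. The following is a minimal Python sketch that uses the standard library's `statistics.NormalDist` (Python 3.8+) to show how close the rounding is. The names `RULE` and `exact_within` are illustrative helpers, not part of any source method.

```python
from statistics import NormalDist

# Rounded shares quoted by the empirical rule for k = 1, 2, 3 standard deviations.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def exact_within(k):
    """Exact probability that a normal variable lies within mean ± k standard deviations."""
    z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k, rule_share in RULE.items():
    print(f"±{k}σ: rule {rule_share:.1%}, exact {exact_within(k):.2%}")
```

The exact values are about 68.27%, 95.45%, and 99.73%. This is why the rule works well as a quick approximation.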

  • The Empirical Rule states that 99.7% of data observed following a normal distribution lies within 3 standard deviations of the mean.
  • Under this rule, 68% of the data falls within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations from the mean.
  • Three-sigma limits that follow the empirical rule are used to set the upper and lower control limits in statistical quality control charts and in risk analysis such as VaR.

The empirical rule is often used in statistics to forecast outcomes. Once the mean and standard deviation are known, and before the full data set is collected, the rule gives a rough estimate of how the incoming data will be distributed.

The rule can thus serve as an interim heuristic when gathering the appropriate data is time-consuming or even impossible. Such considerations come into play when a firm is reviewing its quality control measures or evaluating its risk exposure. For instance, the widely used risk measure value-at-risk (VaR), in its common parametric form, assumes that the probability of risk events follows a normal distribution.

The empirical rule is also used as a rough way to test a distribution's "normality". If too many data points fall outside the three standard deviation boundaries, this suggests that the distribution is not normal and may be skewed or follow some other distribution.

The empirical rule is also known as the three-sigma rule, as "three-sigma" refers to a statistical distribution of data within three standard deviations from the mean on a normal distribution (bell curve), as indicated by the figure below.

Image by Julie Bang © Investopedia 2019 

Let's assume the lifespans of a population of animals in a zoo are known to be normally distributed, with a mean of 13.1 years and a standard deviation of 1.5 years. If someone wants to know the probability that an animal will live longer than 14.6 years, they could use the empirical rule. With a mean of 13.1 years, the following age ranges occur for each number of standard deviations:

  • One standard deviation (µ ± σ): (13.1 - 1.5) to (13.1 + 1.5), or 11.6 to 14.6
  • Two standard deviations (µ ± 2σ): 13.1 - (2 x 1.5) to 13.1 + (2 x 1.5), or 10.1 to 16.1
  • Three standard deviations (µ ± 3σ): 13.1 - (3 x 1.5) to 13.1 + (3 x 1.5), or 8.6 to 17.6
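These ranges follow one pattern: mean minus and plus k times the standard deviation. The sketch below expresses that pattern in Python. `sigma_ranges` is a hypothetical helper name, and the rounding is only there to hide floating-point noise in the printed values.

```python
def sigma_ranges(mean, sd, max_k=3):
    """Return {k: (mean - k*sd, mean + k*sd)} for k = 1..max_k, rounded to 2 decimals."""
    return {
        k: (round(mean - k * sd, 2), round(mean + k * sd, 2))
        for k in range(1, max_k + 1)
    }

zoo = sigma_ranges(13.1, 1.5)  # mean lifespan 13.1 years, standard deviation 1.5 years
for k, (low, high) in zoo.items():
    print(f"µ ± {k}σ: {low} to {high} years")
```

The same helper applies to any normal example. For instance, `sigma_ranges(10, 1.4)` gives 8.6 to 11.4, 7.2 to 12.8, and 5.8 to 14.2.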

The person solving this problem needs to calculate the total probability of the animal living 14.6 years or longer. The empirical rule shows that 68% of the distribution lies within one standard deviation, in this case, from 11.6 to 14.6 years. Thus, the remaining 32% of the distribution lies outside this range. Because the distribution is symmetrical, half of that 32% lies above 14.6 and the other half below 11.6. So, the probability of the animal living more than 14.6 years is 16% (32% divided by two).
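The reasoning above can be written as a small calculation: take the share outside µ ± kσ and split it evenly between the two tails. This is a minimal Python sketch that uses the rule's rounded percentages. `WITHIN` and `prob_above_upper` are illustrative names.

```python
# Empirical-rule share of a normal distribution inside µ ± kσ.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def prob_above_upper(k):
    """Probability of exceeding µ + kσ: half of the share lying outside µ ± kσ."""
    outside = 1 - WITHIN[k]
    return outside / 2  # symmetry: the outside share splits evenly between both tails

p = prob_above_upper(1)  # lifespan above 13.1 + 1.5 = 14.6 years
print(f"P(lifespan > 14.6) ≈ {p:.0%}")
```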

As another example, assume instead that the animals in the zoo live 10 years on average, with a standard deviation of 1.4 years. Assume the zookeeper wants to find the probability of an animal living more than 7.2 years. This distribution looks as follows:

  • One standard deviation (µ ± σ): 8.6 to 11.4 years
  • Two standard deviations (µ ± 2σ): 7.2 to 12.8 years
  • Three standard deviations (µ ± 3σ): 5.8 to 14.2 years

The empirical rule states that 95% of the distribution lies within two standard deviations. Thus, 5% lies outside of two standard deviations; half above 12.8 years and half below 7.2 years. Thus, the probability of living for more than 7.2 years is:

95% + (5% / 2) = 97.5%
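When the threshold sits below the mean, the calculation adds the whole central band to the upper tail. A minimal Python sketch of that step follows. `prob_above_lower` is a hypothetical helper, and the percentages are the rule's rounded values.

```python
# Empirical-rule share of a normal distribution inside µ ± kσ.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def prob_above_lower(k):
    """Probability of exceeding µ - kσ: everything inside µ ± kσ plus the upper tail."""
    inside = WITHIN[k]
    upper_tail = (1 - inside) / 2
    return inside + upper_tail

mean, sd = 10, 1.4
threshold = mean - 2 * sd  # 7.2 years, two standard deviations below the mean
print(f"P(lifespan > {threshold:.1f}) ≈ {prob_above_lower(2):.1%}")
```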

In statistics, the empirical rule states that 99.7% of data occurs within three standard deviations of the mean within a normal distribution. Within that, 68% of the observed data will occur within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations. The empirical rule thus predicts how a set of outcomes is distributed around the mean.

The empirical rule is applied to anticipate probable outcomes in a normal distribution. For instance, a statistician could use it to estimate the percentage of cases that fall within each standard deviation band. Suppose the mean is 10 and the standard deviation is 3.2. The one-standard-deviation range then runs from (10 - 3.2) = 6.8 to (10 + 3.2) = 13.2. The two-standard-deviation range runs from 10 - (2 x 3.2) = 3.6 to 10 + (2 x 3.2) = 16.4, and so forth.

The empirical rule is beneficial because it serves as a means of forecasting data, especially for large datasets and for cases where the values are not yet known. In finance specifically, the empirical rule is applied to stock prices, price indices, and log values of forex rates. These are often modeled as following a bell curve or normal distribution, though real financial data tend to have heavier tails than a normal distribution predicts.

The empirical rule in statistics, also known as the 68 95 99 rule, states that for normal distributions, 68% of observed data points will lie inside one standard deviation of the mean, 95% will fall within two standard deviations, and 99.7% will occur within three standard deviations.

Thanks to the empirical rule, the mean and standard deviation become extra valuable when you reasonably expect that your data approximate a normal distribution. Simply knowing these two statistics allows you to calculate probabilities and percentages for various outcomes.

The name of the empirical rule comes from empirical research, which uses observations and measurements of real-world outcomes rather than theory. In other words, empirical means it is grounded in practical reality. The empirical rule takes these recorded outcomes and lets you use them to make forecasts and calculate probabilities.

Statisticians also refer to the empirical rule as the three-sigma rule because nearly all observations occur within three standard deviations of the mean. For the same reason, statistical control charts commonly set their upper and lower limits at +/- three standard deviations. In general, this limit serves as a valuable way to identify outliers because 99.7% of all values should fall within it.

The empirical rule graph below displays the standard normal distribution with the ranges and percentages.


The graph makes it clear why it is also known as the 68 95 99 rule. Those numbers are the percentages that correspond to the standard deviation ranges.

In this post, learn how to use the empirical rule and the formula for calculating its data ranges, then work through examples to solve problems.


How to Use the Empirical Rule

Analysts use the empirical rule to predict the probabilities and distributions of the outcomes they're studying. It's a valuable tool because it lets you make predictions from just two easy-to-calculate statistics: the mean and the standard deviation. First, verify that your data follow a normal distribution, at least roughly. If they do, you can start making forecasts by calculating the mean and standard deviation.

Many organizations use the empirical rule as a quality control method because many process variables approximately follow the normal distribution, and the mean and standard deviation are easy to calculate. Similarly, the value-at-risk (VaR) financial risk assessment, in its parametric form, assumes that the probabilities for outcomes follow a normal distribution. In short, the empirical rule is a quick and easy prediction method that provides good results when the normality assumption holds.

Additionally, the empirical rule is an easy way to identify outliers. Because 99.7% of all observations should be within three standard deviations of the mean, analysts frequently use the limit of three standard deviations to identify outliers. Investigate observations outside this limit as potential outliers.


The empirical rule is also a simple normality test. Based on the probabilities, you know that 99.7% of all observations should fall within three standard deviations from the mean. Therefore, only 100 – 99.7 = 0.3% should be outside the limit for a normal distribution. If too many values fall outside this limit, your data might not follow a normal distribution. Using the exact three-sigma share of about 99.73% (0.27% outside), you'd expect about 1 in every 370 observations to exceed the limit. Consequently, suppose you have 500 observations and 10 of them (2%) are outside the empirical rule limit. That is far more than the roughly 1 or 2 you'd expect (500 x 0.27% ≈ 1.35), so your data might not be normally distributed.
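The three-sigma check described above can be sketched in a few lines of Python with the standard `statistics` module. `three_sigma_check` is a hypothetical helper, and the sample data are made up for illustration. One assumption to keep in mind: the mean and standard deviation are computed from all points, including the suspect ones. Extreme values therefore inflate `s` and can mask other outliers.

```python
from statistics import mean, stdev

def three_sigma_check(data):
    """Return the points outside x̄ ± 3s and the share of the data they represent."""
    x_bar, s = mean(data), stdev(data)  # sample mean and sample standard deviation
    low, high = x_bar - 3 * s, x_bar + 3 * s
    outside = [x for x in data if x < low or x > high]
    return outside, len(outside) / len(data)

data = [0] * 99 + [100]  # hypothetical data with one extreme value
flagged, share = three_sigma_check(data)
print(flagged, f"{share:.1%}")  # compare the share with the ~0.3% expected for normal data
```

A share well above 0.3% suggests either outliers or non-normal data. The next section explains how to tell the two apart.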

Outliers vs. Non-Normal Data

As an analyst using the empirical rule, you must distinguish between outliers and non-normal distributions. Both conditions can cause an unusual number of data points to lie outside the three-sigma limit. For example, the observations might be valid but follow a skewed distribution, which can create the appearance of outliers. To sort through this question, you’ll need to evaluate your data carefully, determine how it’s distributed, assess the data points in question, and apply a large amount of subject area knowledge.


Empirical Rule Formula (68 95 99 Rule)

To calculate the data ranges associated with the empirical rule percentages of 68%, 95%, and 99.7%, start by calculating the sample mean (x̅) and standard deviation (s). Then input those values into the formulas below to derive the ranges.

  • Data range x̅ − s to x̅ + s: 68% of the data
  • Data range x̅ − 2s to x̅ + 2s: 95% of the data
  • Data range x̅ − 3s to x̅ + 3s: 99.7% of the data
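The formulas above translate directly into code. This is a minimal Python sketch that computes the sample mean and sample standard deviation with the standard `statistics` module and returns the three ranges. `empirical_rule_ranges` is a hypothetical name, and the three-value sample is illustrative only.

```python
from statistics import mean, stdev

def empirical_rule_ranges(sample):
    """Return (low, high, share) for x̄ ± k·s with k = 1, 2, 3."""
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    shares = {1: "68%", 2: "95%", 3: "99.7%"}
    return [(x_bar - k * s, x_bar + k * s, shares[k]) for k in (1, 2, 3)]

sample = [25, 30, 35]  # hypothetical sample: mean 30, standard deviation 5
for low, high, share in empirical_rule_ranges(sample):
    print(f"{low} to {high}: about {share} of the data")
```

A real analysis would use far more observations. With only three values, the normality assumption cannot be checked meaningfully.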

Example of Using the Empirical Rule

Let’s work through an example problem for the empirical rule using the percentages in its alternate name, the 68 95 99 rule. Assume that a pizza restaurant has a mean delivery time of 30 minutes and a standard deviation of 5 minutes, and the data follow the normal distribution.

Using the empirical rule, we can estimate the range in which 68% of delivery times occur by taking the mean and adding and subtracting the standard deviation (30 +/- 5), producing a range of 25-35 minutes.

Use the same process for the other empirical rule percentages by using the 2X and 3X multiples of the standard deviation. 95% are between 20-40 minutes (30 +/- 2*5), and 99.7% are between 15-45 minutes (30 +/-3*5). The chart below illustrates this property graphically.


Using the 68 95 99 Rule to Calculate Other Percentages

Even though the empirical rule is also known as the 68 95 99 rule, it isn’t limited to only the percentages of 68%, 95%, and 99.7%. Using it creatively, you can figure out other properties. To do that, you’ll need to factor in the properties of the normal distribution. Of particular value are the facts that the normal distribution is symmetrical and centers on the mean.

For example, because the empirical rule states that 95% of the delivery times will be inside the 2X standard deviation range, we know 5% will be outside. Further, the distribution is symmetrical, meaning that 2.5% will be less than 20 minutes and 2.5% will be more. Because we can predict that 2.5% of delivery times will be longer than 40 minutes, we also know that 97.5% will be less than 40 minutes.

Additionally, use the empirical rule to calculate percentiles for particular values. Suppose we want to determine the probability of a delivery taking less than 35 minutes. Using the empirical rule, we know that 68% of delivery times fall between 25 and 35 minutes. Because the normal distribution is symmetrical, half of this range (34%) falls above the mean, between 30 and 35 minutes. Additionally, half of the entire distribution (50%) falls below the mean (under 30 minutes). Consequently, we add those percentages (50% + 34% = 84%) to determine that 84% of deliveries will take less than 35 minutes. Equivalently, 35 minutes is the 84th percentile.
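The symmetry arguments in the last two paragraphs share one pattern. The share below µ + kσ is 50% plus or minus half of the central band. A minimal Python sketch follows, assuming the values of interest sit exactly on whole multiples of the standard deviation. `share_below` is a hypothetical helper.

```python
# Empirical-rule share of a normal distribution inside µ ± kσ.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def share_below(k):
    """Empirical-rule share of values below µ + kσ, for integer k from -3 to 3."""
    if k == 0:
        return 0.5  # the mean splits a symmetric distribution in half
    half_band = WITHIN[abs(k)] / 2  # half of the central band lies on each side of the mean
    return 0.5 + half_band if k > 0 else 0.5 - half_band

mean, sd = 30, 5  # delivery-time example: minutes
for minutes in (20, 35, 40):
    k = (minutes - mean) // sd  # these values lie exactly on µ + kσ
    print(f"below {minutes} min: {share_below(k):.1%}")
```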

Using the empirical rule, can you figure out how to determine the probability of a delivery taking between 35-40 minutes? Think of a creative way to use the 68 95 99 rule!

Alternatively, you can use z-scores to calculate probabilities and percentiles for data that follow the normal distribution. For more information, read my post, Z-score: Definition, Formula, and Uses.
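As a comparison with the empirical-rule estimate of 84%, here is a short Python sketch. It computes the z-score for a 35-minute delivery and the exact normal percentile using the standard library's `statistics.NormalDist` (Python 3.8+). It assumes, as the example does, that delivery times are normal with mean 30 and standard deviation 5. `z_score` is an illustrative helper.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

delivery = NormalDist(mu=30, sigma=5)  # assumed normal delivery times, in minutes
z = z_score(35, 30, 5)
exact = delivery.cdf(35)  # exact share of deliveries under 35 minutes
print(f"z = {z}, exact P(time < 35) = {exact:.2%} (empirical rule: 84%)")
```

The exact value is about 84.13%, so the empirical rule's 84% is a close, rounded approximation.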

When your data follow a normal distribution, the empirical rule is a valuable statistical tool. But what do you do when your data are not normally distributed? In that case, use Chebyshev’s Theorem! That method provides similar types of results as the empirical rule but for non-normal data.
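To see how the two approaches differ, recall that Chebyshev's inequality guarantees at least 1 − 1/k² of any distribution (with finite variance) lies within k standard deviations of the mean. The minimal Python sketch below compares that worst-case bound with the empirical rule's normal-only percentages. `chebyshev_min_within` is a hypothetical helper name.

```python
# Empirical-rule shares, valid only for (approximately) normal data.
WITHIN_NORMAL = {1: 0.68, 2: 0.95, 3: 0.997}

def chebyshev_min_within(k):
    """Chebyshev lower bound on the share within µ ± kσ, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is trivial (0 or negative) for k <= 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"±{k}σ: Chebyshev at least {chebyshev_min_within(k):.1%}, "
          f"empirical rule {WITHIN_NORMAL[k]:.1%}")
```

Chebyshev's guarantees (at least 75% within two and about 88.9% within three standard deviations) are weaker because they must hold for every distribution, not just the normal one.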