What is the probability of rolling a 6 on a 6-sided die

I found BruceET's answer interesting, relating to the number of events. An alternative way to approach this problem is to use the correspondence between waiting time and number of events. The use of that would be that the problem will be able to be generalized in some ways more easily.

Viewing the problem as a waiting time problem

This correspondence, as for instance explained/used here and here, is

For the number of dice rolls $m$ and number of hits/events $k$ you get: $$\begin{array}{ccc} \overbrace{P(K \geq k| m)}^{\text{this is what you are looking for}} &=& \overbrace{P(M \leq m|k)}^{\text{we will express this instead}} \\ {\small\text{$\mathbb{P}$ $k$ or more events in $m$ dice rolls}} &=& {\small\text{$\mathbb{P}$ dice rolls below $m$ given $k$ events}} \end{array} $$

In words: the probability to get more than $K \geq k$ events (e.g. $\geq 1$ times rolling 6) within a number of dice rolls $m$ equals the probability to need $m$ or less dice rolls to get $k$ such events.

This approach relates many distributions.

Distribution of Distribution of Waiting time between events number of events Exponential Poisson Erlang/Gamma over/under-dispersed Poisson Geometric Binomial Negative Binomial over/under-dispersed Binomial

So in our situation the waiting time is a geometric distribution. The probability that the number of dice rolls $M$ before you roll the first $n$ is less than or equal to $m$ (and given a probability to roll $n$ equals $1/n$) is the following CDF for the geometric distribution:

$$P(M \leq m) = 1-\left(1-\frac{1}{n}\right)^m$$

and we are looking for the situation $m=n$ so you get:

$$P(\text{there will be a $n$ rolled within $n$ rolls}) = P(M \leq n) = 1-\left(1-\frac{1}{n}\right)^n$$

Generalizations, when $n \to \infty$

The first generalization is that for $n \to \infty$ the distribution of the number of events becomes Poisson with factor $\lambda$ and the waiting time becomes an exponential distribution with factor $\lambda$. So the waiting time for rolling an event in the Poisson dice rolling process becomes $(1-e^{-\lambda \times t})$ and with $t=1$ we get the same $\approx 0.632$ result as the other answers. This generalization is not yet so special as it only reproduces the other results, but for the next one I do not see so directly how the generalization could work without thinking about waiting times.

Generalizations, when dices are not fair

You might consider the situation where the dice are not fair. For instance one time you will roll with a dice that has 0.17 probability to roll a 6, and another time you roll a dice that has 0.16 probability to roll a 6. This will mean that the 6's get more clustered around the dice with positive bias, and that the probability to roll a 6 in 6 turns will be less than the $1-1/e$ figure. (it means that based on the average probability of a single roll, say you determined it from a sample of many rolls, you can not determine the probability in many rolls with the same dice, because you need to take into account the correlation of the dice)

So say a dice does not have a constant probability $p = 1/n$, but instead it is drawn from a beta distribution with a mean $\bar{p} = 1/n$ and some shape parameter $\nu$

$$p \sim Beta \left( \alpha = \nu \frac{1}{n}, \beta = \nu \frac{n-1}{n} \right)$$

Then the number of events for a particular dice being rolled $n$ time will be beta binomial distributed. And the probability for 1 or more events will be:

$$P(k \geq 1) = 1 - \frac{B(\alpha, n + \beta)}{B(\alpha, \beta)} = 1 - \frac{B(\nu \frac{1}{n}, n +\nu \frac{n-1}{n})}{B(\nu \frac{1}{n}, n +\nu \frac{n-1}{n})} $$

I can verify computationally that this works...

What is the probability of rolling a 6 on a 6-sided die

### compute outcome for rolling a n-sided dice n times rolldice <- function(n,nu) { p <- rbeta(1,nu*1/n,nu*(n-1)/n) k <- rbinom(1,n,p) out <- (k>0) out } ### compute the average for a sample of dice meandice <- function(n,nu,reps = 10^4) { sum(replicate(reps,rolldice(n,nu)))/reps } meandice <- Vectorize((meandice)) ### simulate and compute for variance n set.seed(1) n <- 6 nu <- 10^seq(-1,3,0.1) y <- meandice(n,nu) plot(nu,1-beta(nu*1/n,n+nu*(n-1)/n)/beta(nu*1/n,nu*(n-1)/n), log = "x", xlab = expression(nu), ylab = "fraction of dices", main ="comparing simulation (dots) \n with formula based on beta (line)", main.cex = 1, type = "l") points(nu,y, lty =1, pch = 21, col = "black", bg = "white")

.... But I have no good way to analytically solve the expression for $n \to \infty$.

With the waiting time However, with waiting times, then I can express the the limit of the beta binomial distribution (which is now more like a beta Poisson distribution) with a variance in the exponential factor of the waiting times.

So instead of $1-e^{-1}$ we are looking for $$1- \int e^{-\lambda} p(\lambda) \, \text{d}\, \lambda$$.

Now that integral term is related to the moment generating function (with $t=-1$). So if $\lambda$ is normal distributed with $\mu = 1$ and variance $\sigma^2$ then we should use:

$$ 1-e^{-(1-\sigma^2/2)} \quad \text{instead of} \quad 1-e^{-1}$$

Application

These dice rolls are a toy model. Many real-life problems will have variation and not completely fair dice situations.

For instance, say you wish to study probability that a person might get sick from a virus given some contact time. One could base calculations for this based on some experiments that verify the probability of a transmission (e.g. either some theoretical work, or some lab experiments measuring/determining the number/frequency of transmissions in an entire population over a short duration), and then extrapolate this transmission to an entire month. Say, you find that the transmission is 1 transmission per month per person, then you could conclude that $1-1/e \approx 0.63 \%$ of the population will get sick. However, this might be an overestimation because not everybody might get sick/transmission with the same rate. The percentage will probably lower.

However, this is only true if the variance is very large. For this the distribution of $\lambda$ must be very skewed. Because, although we expressed it as a normal distribution before, negative values are not possible and distributions without negative distributions will typically not have large ratios $\sigma/\mu$, unless they are highly skewed. A situation with high skew is modeled below:

Now we use the MGF for a Bernoulli distribution (the exponent of it), because we modeled the distribution as either $\lambda = 0$ with probability $1-p$ or $\lambda = 1/p$ with probability $p$.

What is the probability of rolling a 6 on a 6-sided die

set.seed(1) rate = 1 time = 1 CV = 1 ### compute outcome for getting sick with variable rate getsick <- function(rate,CV=0.1,time=1) { ### truncating changes sd and mean but not co much if CV is small p <- 1/(CV^2+1) lambda <- rbinom(1,1,p)/(p)*rate k <- rpois(1,lambda*time) out <- (k>0) out } CV <- seq(0,2,0.1) plot(-1,-1, xlim = c(0,2), ylim = c(0,1), xlab = "coefficient of variance", ylab = "fraction", cex.main = 1, main = "if rates are bernouilli distributed \n fraction p with lambda/p and 1-p with 0") for (cv in CV) { points(cv,sum(replicate(10^4,getsick(rate=1,cv, time = 1)))/10^4) } p <- 1/(CV^2+1) lines(CV,1-(1-p)-p*exp(-1/p),col=1) lines(CV,p, col = 2, lty = 2) legend(2,1, c("simulation", "computed", "percent of subsceptible population"), col = c(1,1,2), lty = c(NA,1,2), pch = c(1,NA,NA),xjust =1, cex = 0.7)

The consequence is. Say you have high $n$ and have no possibilities to observe $n$ dice rolls (e.g. it takes to long), and instead you screen the number of $n$ rolls only for a short time for many different dice. Then you could compute the number of dices that did roll a number $n$ during this short time and based on that compute what would happen for $n$ rolls. But you would not be knowing how much the events correlate within the dice. It could be that you are dealing with a high probability in a small group of dice, instead of an evenly distributed probability among all dice.

This 'error' (or you could say simplification) relates to the situation with COVID-19 where the idea goes around that we need 60% of the people immune in order to reach herd immunity. However, that may not be the case. The current infection rate is determined for only a small group of people, it can be that this is only an indication for the infectiousness among a small group of people.