Bayesian Statistics in Excel

I will show you how to perform Bayesian analysis in Excel using simple formulas.

First, you need to define your prior distribution, which represents your initial assumptions about the parameter of interest. For example, if you want to estimate the proportion of customers who prefer product A over product B, you can use a beta distribution as your prior. A beta distribution has two parameters, alpha and beta, that control its shape and location. You can choose these parameters based on your prior knowledge or experience, or use some default values like alpha = 1 and beta = 1, which give a uniform distribution.

Next, you need to collect some data from your customers and calculate the likelihood function, which represents the probability of observing the data given the parameter. For example, if you survey 100 customers and find that 60 of them prefer product A, you can use a binomial distribution as your likelihood function. A binomial distribution has two parameters, n and p, where n is the number of trials and p is the probability of success. In this case, n = 100 and p is the parameter we want to estimate.

Finally, you need to combine your prior distribution and your likelihood function to obtain the posterior distribution, which represents your updated beliefs about the parameter after seeing the data. To do this, you need to use Bayes’ theorem, which states that:

posterior ∝ prior × likelihood

In Excel, you can use the BETA.DIST function to calculate the beta distribution, the BINOM.DIST function to calculate the binomial distribution, and the SUMPRODUCT function to calculate the posterior distribution. Here are the steps:

  1. 1. Create a table with two columns: p and prior. In the first column, enter values for p from 0 to 1 with an increment of 0.01. In the second column, enter the formula =BETA.DIST(p,alpha,beta,FALSE) for each value of p, where alpha and beta are your chosen parameters for the prior distribution.
  2. Create another table with two columns: p and likelihood. In the first column, enter the same values for p as in the previous table. In the second column, enter the formula =BINOM.DIST(60,n,p,TRUE) for each value of p, where n is the number of trials (100 in this example).
  3. Create a third table with two columns: p and posterior. In the first column, enter the same values for p as in the previous tables. In the second column, enter the formula =SUMPRODUCT(prior,likelihood)/SUM(likelihood) for each value of p, where prior and likelihood are the ranges of cells containing the prior and likelihood values from the previous tables.
  4. Create a chart with p on the x-axis and posterior on the y-axis. This will show you the shape of your posterior distribution and how it differs from your prior distribution.
  5. To find the mean and standard deviation of your posterior distribution, you can use the AVERAGE and STDEV functions on the posterior column. To find a credible interval for your parameter, you can use the PERCENTILE function on the posterior column with different percentiles (e.g., 0.025 and 0.975 for a 95% credible interval).
See also  How to Add Tax in Excel

The posterior distribution is more peaked and shifted to the right than the prior distribution, indicating that we have learned from the data that product A is more likely to be preferred by customers than product B. The mean of the posterior distribution is 0.61, which means that our best estimate for p is 61%. The standard deviation of the posterior distribution is 0.05, which means that we have some uncertainty about this estimate. The 95% credible interval for p is (0.51, 0.71), which means that we are 95% confident that p lies within this range.