Doing Statistical Analysis with MS Excel

MS Excel can be used for doing statistical analysis. That much is common knowledge, yet few people seem to know how to do it. If the very thought of using MS Excel in this way seems challenging to you, keep reading. This article will walk you through performing two kinds of important statistical analysis tasks with this software.

Doing statistical analysis using MS Excel seems to be difficult for a beginner and even an intermediate user of MS Excel. The primary reason is that it is not taught in most MS Excel classes or even in the in-house seminars. Also, doing statistical analysis requires you to learn the associated Excel functions in order to perform the specific statistical analysis task.

In this article I will illustrate how to perform the most important statistical analysis using MS Excel. Below are the important statistical analysis tasks which scientists, engineers and even those in the financial sectors should be performing:

  • Making a histogram

  • Performing a correlation plot and regression analysis

Some of the MS Office packages do not include the magical Data Analysis tool pack. This makes it even harder for an analyst to do some basic or advanced statistical analysis. Data Analysis is an add-on to MS Excel, but some installations might not include it. If yours doesn’t, obviously, attempting to use the Data Analysis tool pack will not work. 

Fortunately, the examples illustrated in this article do not depend on the Data Analysis tool pack. You will not need it to get them to work; any analysis can be done without the tool pack. This tutorial has been tested to work in MS Excel 2002.

Let us start with making a histogram.

{mospagebreak title=How to make a histogram using MS Excel}

A histogram is a plot that shows the frequency of data sets. This is an important tool used to analyze the distribution of data. By graphically analyzing the histogram, you can tell where the mean may be located. You can even tell the variation of the samples, or if the data are skewed to the right or left instead of following the normal curve (the mean at the center of the distribution).

To do this using MS Excel, say for example that you have the following data:

37.2, 37.8, 37.1, 37.8, 36.2, 36.5, 36.6, 36.3, 36.5, 36.6, 36.2, 36.9, 37.1, 37.6, 37.3, 36.1, 36.5, 36.3, 35.8

Step 1. First, open your MS Excel, and then position/copy the data above vertically, starting on cell A1 with data = 37.2 and continuing to cell A20 with data = 35.8

Step 2. Compute the minimum and maximum values of the data set. To do this, in cell A21, paste this formula: =MIN(A1:A20), then on cell A22, paste this formula: =MAX(A1:A20)

The minimum and maximum values of the data set are needed to make a data bin for the histogram.

Step 3. Compute the bin statistics. The bin statistics are the values required to make the graph.

Increment value = (Max value – Min value) / Number of samples

Starting value = (Min value – Increment value)

Ending value = (Max value + Increment value)

In this example, you can use cell A23 for the increment value, cell A24 for the starting value and finally cell A25 for the maximum value.

The starting value should be less than the minimum value of the data, so that the minimum value can be seen in the plot. The same logic applies to ending value. In this example, the following are the computed values:

Minimum value = 35.8

Maximum value = 37.8

Increment value = 0.1

Starting value = 35.7

Ending value = 37.9

{mospagebreak title=Making a histogram continued}

Step 4. Make the bin data table using the bin statistics. Since the actual data table has been placed in Cell A1 to Cell A20, the bin table should be placed starting on cell B1 to B23. See below:

35.7

35.8

35.9

36

36.1

36.2

36.3

36.4

36.5

36.6

36.7

36.8

36.9

37

37.2

37.3

37.4

37.5

37.6

37.7

37.8

37.9

The bin table should be arranged like the one above.

Step 5. If you are done with the bin table, you can compute the FREQUENCY. This will count the number of data occurrences within the specific bin data range. In our example, use cell C1 to place this frequency formula:

=frequency(A1:A20,B1:B23)

Then select (selecting the cells will highlight the cells) starting with cell C1 and going down to cell 23. Then follow the steps below:

  • Press F2. 
  • Press Ctrl – Shift (together). 
  • Press Enter.

The procedure above will produce the frequency data table in column C. Look at the screen shot below; your analysis in your Excel spreadsheet should be similar (particularly in column C where the frequency data table is computed).

We’ve almost finished; we still have to plot the frequency vs. the actual data bin table.

Step 6. To generate the histogram plot, follow the simple steps below:

  • Go to Insert -> Chart -> Column (this is the chart type), and then click NEXT.   
  • Click the "SERIES" tab, and remove Series 1 and Series 2, because they are redundant data in the histogram.   
  • Click on the "Category (X-axis)," and then select B1 to B23, which is the bin range data.   
  • Finally, for the name, you can change it from "Series3" to "histogram."

Now you have the histogram plot generated.

{mospagebreak title=How to make a correlation plot and regression analysis}

A correlation plot/regression analysis is another important data analysis tool that can determine the degree of relationship between two variables. For example, say you have the following data:

We would like analyze the relationship between Data1 and Data2.

Below are the steps you need to take to make the correlation plot:

Step 1: Paste/Construct the data table above, starting in cell A1 to cell C11.

Step 2: Click "Insert" -> "Chart" -> "XY Scatter," and then click Next.

Step 3: In the Data range, select cells B2 to C11, and then click "Finish."

The scatter plot will now be generated. To find out the model of the regression analysis plot, click on any dot in the scatter plot, right click on it, and then click "Add Trend line."

Step 4: Finally, select any regression type of your choice. For example, we will select the "linear" type regression. To display the model/equation in the graph, click the "Options" tab, and then click "Display Equation on Chart" and also "Display R-squared value on chart."

To interpret the analysis plot, R squared (in statistics this is called coefficient of determination) of 0.989 means that Data 1 and Data 2 are 98.9% related (this percentage indicates how well they fit each other). 

Conclusions

MS Excel is a highly powerful spreadsheet that offers statistical analysis tools to make histograms and perform correlation/regression analysis. The above tutorial shows that it can be done without installing the Data Analysis tool pack.

The data used in this article was purely for example purposes; you can have the data on your own and make your own statistical analysis. In upcoming articles I will discuss another set of statistical analysis such as comparing the means of two data sets as well the variation.

If you are in the engineering/manufacturing sector, or a beginner in the field of statistics, you will find this tutorial very helpful.

One thought on “Doing Statistical Analysis with MS Excel

  1. Require update for Excel 2010, however with a some difficulty I was able to perform it on Excel 2010

Join the conversation
about this article

View thread