Comparing Data Sets Using Statistical Analysis in Excel

Are you ready to try something really challenging with Excel? Can you handle basic statistical analysis, but find you want or need to dig more deeply into the data set? Keep reading; this three-part series will show you how to extract even more information from those numbers by using the principles of inferential statistics.

We have previously discussed doing statistical analysis with MS Excel. It explains how to use Excel to analyze for descriptive statistics. If you read it, you learned how to create a histogram, do some scatter plots and minor regression analysis.

This tutorial advances the use of MS Excel to work with inferential statistics. This branch of statistics deals with the comparison of quantitative data sets and making sound judgment with the use analysis tools.

Inferential statistics is often ignored in the business and engineering world. The primary reason is the complexity of the analysis involved; it requires both a basic and an advanced knowledge of statistical analysis. It is true that not all firms can hire or afford a major statistics graduate. Thanks to Microsoft Excel, with the built-in tools available and by following this tutorial, you can analyze complex sets of data and draw inferences from them just like any professional statistician.

But first, before we go deeper, you can expect to achieve the following from reading this tutorial: 

  • Learn the data analysis tools available in MS Excel that can be used for data comparison and doing inferential analysis.
  • Learn the basic statistical techniques used to compare the means and standard deviation of any set of data.
  • Learn how to implement these statistical techniques in MS Excel.

The Requirements

First, you need MS Excel installed on your computer. This program comes with Office applications such as MS Word, MS PowerPoint etc. One of the things you need to install is the Data Analysis ToolPak. This is an add-on for Microsoft’s Excel package.

Before installing this add-on, you can check to see if it is installed already. You can check the menu under Tools, where you should find Data Analysis. If you cannot find it there, then you need to install it.

You need the Office Installer CD or any disk that you use when you install MS Excel for the first time. If you cannot find one, then you can contact Microsoft for support.

Place the installer CD on the CD drive or connect it to your computer. Exit if any auto play message appears, and then go to Tools -> Add-in. Depending on the Excel versions installed on your computer, you should check either "Analysis toolpak" or "Analysis toolpak (VBA)." Then click OK. If installation message occurs, browse to the disk and continue the installation.

You may need to restart your Excel application after this. However, this tutorial has been written in such a way that even if you cannot install Data Analysis Toolpak (which will be covered in part two of this tutorial), you can still perform the statistical analysis discussed in this article using built-in functions.

{mospagebreak title=A Basic Statistics Lesson}

In order to maximize what you learn from this tutorial, I will provide a short background regarding statistical analysis.

Inferential statistics is the most underestimated (and underutilized) statistical tool in the engineering and business industry. Some managers or even engineers resort to hunches or guesses based on the numerical data available without using valid statistical techniques to test it.

Commonly, there are two major ways to compare data:

  • Compare the means of the two or more data sets. For example, you can run a paired experiment. This is an experiment where the subject is the same but is subjected to different test conditions. A classic example is a single rat subjected to two different medications in an experiment designed to find out the effect of each medication.
  • Compare the variation of two or more data sets. In most comparative studies, the mean is the most important piece of data. But there are also times when the variation is also studied. This is particularly useful in studying the stability of manufacturing processes, for example.

In statistical literature, a T-test is used to compare the means of two datasets. ANOVA (Analysis of variance) is used to compare the means of two or more data sets. An F-test is used to compare the variation of two data sets.

In MS Excel, the following are the available tools for both built-in and data analysis toolpak features:

  • T-test
  • ANOVA
  • F-test

WARNING: In any scientific research, the accuracy of the measurement system is extremely important. So make sure you have validated the accuracy of your measurement system before proceeding with the gathering of data and doing Excel analysis.

So if you are ready, let us proceed to case studies (T-test and F-test). Advanced statistical techniques for comparison like ANOVA will be covered in a separate article because they are broad and have a variety of applications.

{mospagebreak title=How to conduct a T-test in MS Excel}

A T-test is used to compare two data sets. In statistical literature, below are the requirements with which you should comply before you can conduct a T-test:

  • You obtain the data using a random sampling method. For example, if you want to study the whole population of the US, you should be gathering random samples of people in all provinces, not just one state.
  • The data should be quantitative. It is not acceptable to do a T-test if you cannot quantify your data. If you can get a much higher resolution of your data (for example, by using decimal numbers), it is much better.
  • The sample size should be at least 20 for better experimental accuracy.

There are two ways you can analyze a T-test. Let’s discuss the first one, which does not use the Data Analysis add-on.

To illustrate, let’s use realistic data. Below are the data taken from a watt-hour meter of a certain house, subjected to two conditions. The first column is the energy consumption with the use of an energy-conserving device, while the second column shows energy consumption without the energy-conserving device.

The data were taken on 20 random days in a month using the power supplier’s watt-hour meter. As with all scientific experiments, we will start with formulating a hypothesis or two, which are as follows:

Null Hypothesis: There is no effect on the power consumption measurements even if the energy-conserving device has been added to the system. In other words, the means of the two samples are the same and the energy conserving device is not effective.

Alternative Hypothesis: There is a significant difference in the mean. The mean of energy consumption data using the energy conservation device is significantly lower than without it.

Rules for Rejecting the Null Hypothesis: If P value is less than 0.05 (95% confidence level of the results), then the difference is significant; otherwise, accept the null hypothesis.

This is an example of a paired T-test. To do this in Excel, below is the T-test syntax:

TTEST(array1,array2,tails,type)

Array 1 and array 2 are respectively the tables to be analyzed. Tails are the distribution type for analysis: set it to 2 for default and simplicity in the analysis. The type can be:

1- For Paired t-test

2- Two-sample equal variance

3 – Two-sample unequal variance

To use type 2 and 3, you should compare the variance first (using an F-test) before doing the T-test.

So the final formula is: TTEST(B2:B21,C2:C21,1,1)

This gives the results of 0.004; this is called the P-value in statistics.

Since it is less than 0.05, we will reject the null hypothesis and conclude that:

Mean with Energy Conserving Device: 12.189 Kilowatt hour <<< Mean without Energy Conserving Device: 13.644 Kilowatt hour

“<<<” means significantly smaller than.

In other words, we can say that the energy conserving device saves electricity; therefore, we can recommend that it be implemented to cut electricity costs.

{mospagebreak title=How to conduct an F-test in MS Excel}

An F-test will check whether or not the two data sets have the same variation. It is also necessary to do an F-test when comparing the means of two samples not coming from a paired experiment.

Let us use the data in the screen shot on the previous page and assume they are not paired, but coming from two houses. The first house uses the energy- conserving device, while the other does not.

Null Hypothesis: The variation of the two samples coming from the first and second house is the same.

Alternative Hypothesis: The variation of the two samples is not equal; they have different standard deviations.

In statistics, standard deviation is the measurement of variation. In an F-test it compares the standard deviation of two samples instead of the mean (which is done in a T-test). Actually, what is being analyzed is the “variance,” which is the square of standard deviation.

Rules for rejecting the null hypothesis: If P value is less than 0.05, then the variance or the standard deviations of the two sample sets are not equal. It is significantly different.

In Excel, the F-test syntax is: FTEST(B2:B21,C2:C21) , based on the earlier screen shot of the Excel sheet.

Therefore, the P-value is 0.925, which is greater than 0.05. Hence, we cannot reject the null hypothesis and conclude that the variations of the two samples are the same.

If you want to conduct the comparison of means, then use the “Two Sample Equal Variance” T-test.

3 thoughts on “Comparing Data Sets Using Statistical Analysis in Excel

  1. The SAT is a standardized exam which measure’s a student’s readiness to take on college-level coursework. Unlike the ACT, SAT Reasoning Tests are not based on a student’s learned concepts from their previous high school education.

  2. Fruit flies are commonly associated with fermenting fruits and vegetables. They easily develop near the over-ripe fruits or in the bottom of garbage cans, in dirty mop, or on rotting potato or onion in the vegetable bin. You can use a fly machine and aerosols to get rid of fruit flies.

  3. The SAT is a standardized exam which measure’s a student’s readiness to take on college-level coursework. Unlike the ACT, SAT Reasoning Tests are not based on a student’s learned concepts from their previous high school education.

[gp-comments width="770" linklove="off" ]