Chi Square Test of Independence with MS Excel

One of the most important aspects of any type of research that includes surveys and scientific experiment is having an “unbiased” form of decision. Only when you arrive at an unbiased decision can your research be considered to have yielded "facts." Yet lots of managers, engineers and marketers fail to produce facts even with numerical data presented and analyzed by MS Excel. Fortunately, the Chi Square statistical tool can help.

In this article, you will learn how to use the Chi Square Test of Independence (CSTI) in MS Excel. This is one of the most important statistical tools that can guide any researcher to an unbiased or factual decision.

Facts vs. Inferences

If you are presented with a data table like the one below:

(Data source: http://davidmlane.com/hyperstat/B150300.html)

Say someone asks you whether the drug has some effect on chess players’ performance. What you will do is make some educated guess based on the results.

Example:

Calculating the %win under the condition of drug use:

% win = 12/40 = 30%

Compute the %win for those normal players (not using the drug):

% win = 13/30 = 43%

And respectively for the % Loss:

Using drug = 18/40= 45%

Without drug = 7/30= 23%

Finally for % Draw:

Using drug = 10/40= 25%

Not using drug= 10/30= 33%

Then you might say there is no positive effect because the %win is not good compared to the control group; however, what if someone asked if 43% is really a significant difference when compared to 30%, or is this just random chance? This is where analysis could get confusing due to lack of a systematic approach to problem solving.

This is where the Chi Square test of independence is very helpful.

{mospagebreak title=Setting up Contingency Tables and the Requirements}

Aside from having MS Excel on your desktop, you need to comply with the following requirements to conduct a Chi Square Test of Independence (CSTI) analysis:

1. This test can only be applied to two categorical variables. Our example screen shot in the previous section has two categorical variables, namely the “condition” and the “game results.”

Condition has the following qualitative values: Experimental Group and Control Group.

Game results have these qualitative values: Win, Loss and Draw.

This is entirely different from a “quantitative variable” which has quantitative values. For example, “population” is a quantitative variable, as is “ the weight of the person.”

2. The total sample size should be more than 50. This is a safe rule of thumb: http://faculty.chass.ncsu.edu/garson/PA765/chisq.htm

3. Transfer the results and tabulate them in a contingency table format (like the previous screen shot). The last rows and columns are the totals of the contingency table values.

4. You can have as many columns or rows as you want, but you are limited to only two categorical variables.

Before doing any sort of analysis or data gathering, it is wise to formulate the hypothesis. You can either state it like the one shown below:

Null hypothesis: There is no difference between the Experimental and Control Groups. The drug is not contributing any significant effect.

Alternative hypothesis: There is substantial difference between the Experimental and Control groups. The drug is significantly contributing to the effect.

You can then state the rules for rejecting the null hypothesis. Below is the standard:

Rules: Reject the null hypothesis if the p-value is less than 0.05, that is, a 95% confidence level.

P-value is the objective of the analysis to be shown later.

{mospagebreak title=Computation of Expected Values and Setting in Excel}

Once you have data in Excel, you can start the analysis by computing the “expected values.” These values are a requirement for computation. The expected value in any cell is the:

EV = Product of the row total and the column total/grand total or the total sample size of the experiment

An easier way to visualize the computation is by using the table below. The expected value formulas are placed on gray shaded cells:

Using the data table in the first screen shot, compute the expected value of “Experimental Group: Applied in Drugs” under the “Win” column. The formula based on the above screen shot will be:

Expected value = (totalcol1 x totalrow1)/ grandtotal

E.V = (25 x 40)/70 = 14.286

You can perform the same computation on the rest of the cells using the guided formulas for the expected value. MS Excel suggested that the data tables for the actual and expected values be presented like the one below:

The actual table shows the actual data summary gathered by the experiment, and the expected values data table is computed based on the formulas provided.

So, based on the screen shot, the actual data covers this range: B3:D4, while the expected data table covers this range: B8:D9. You will need these cell range values to use the MS Excel stat function CHITEST: http://office.microsoft.com/en-us/excel/HP052090121033.aspx

{mospagebreak title=Computing the P-value in Excel and Drawing Conclusions}

With all of the data set up, it is now the time to compute the p-value, since you need this value in decision making.

For example, you will place the p-value in cell E9, type this command:

=CHITEST(B3:D4,B8:D9)

The result of this function is the p-value of the chi square test of independence. The resulting value is 0.172.

Since this result is NOT less than 0.05, we cannot reject the hypothesis and conclude that:

“There is no difference between the Experimental and Control Group. The drug is not contributing any significant effect.”

Other Examples of Chi Square Test of Independence

Below is a good exercise to test your skills in doing CSTI analysis in MS Excel:

Source: http://stattrek.com/AP-Statistics-4/Independence.aspx?Tutorial=AP

The objective of the study is to determine if there is a significant relationship between gender and voting preferences. Take note that they are the categorical variables.

The following are the hypothesis:

Null: Gender and voting preferences are not related or they are independent.

Alternative hypothesis: Voting preferences and gender are not independent.

Rules: Reject null hypothesis if p-value is computed to be less than 0.05 using 95% confidence level.

Plugging in the data and analyzing it using MS Excel, the actual and expected values table are shown below:

Using the CHITEST function, the p-value is computed as 0.0003, which is less 0.05. Therefore, the null hypothesis will be rejected, and we’ll accept the alternative hypothesis, which is: “Voting preferences and gender are not independent.”

In other words, a person’s voting preferences are influenced by their gender.

You can download the sample Excel spreadsheet used to conduct this analysis here: http://cid-c3bc6a3c5463e218.skydrive.live.com/self.aspx/.Public/Chi%20Square%20Test%20of%20Independence%20Sample%20Worksheet.xls

Other important references pertaining to this test

You can read the following references for further information pertaining to the Chi Square test of independence:

http://udel.edu/~mcdonald/statchiind.html (more theory)

http://www.ithaca.edu/faculty/alynn/Chi%20square%20independence.pdf (more practice examples)

http://udel.edu/~mcdonald/statchiind.xls (excel template on CSTI)

They provide some examples, theories and Excel templates to further enhance your skill with this very important statistical research tool.

5 thoughts on “Chi Square Test of Independence with MS Excel

  1. While practising, one must know that there are two broad classifications in aptitude tests. On the one hand, there are the speed tests where relatively straightforward questions need to be answered within a short time span.

  2. While practising, one must know that there are two broad classifications in aptitude tests. On the one hand, there are the speed tests where relatively straightforward questions need to be answered within a short time span.

  3. While practising, one must know that there are two broad classifications in aptitude tests. On the one hand, there are the speed tests where relatively straightforward questions need to be answered within a short time span.

  4. While practising, one must know that there are two broad classifications in aptitude tests. On the one hand, there are the speed tests where relatively straightforward questions need to be answered within a short time span.

[gp-comments width="770" linklove="off" ]