HomeBrainDump Using MS Excel for One-way Analysis of Var...
Using MS Excel for One-way Analysis of Variance
In the first part of this tutorial you learned both basic and advanced techniques for doing statistical analysis using MS Excel. One of the most important techniques is the comparison of two data sets using MS Excel, which is a common application in engineering, IT and business sector. In this article, we'll continue our study and delve more deeply into doing statistical analysis.
Contributed by Codex-M Rating: / 4 September 28, 2009
You can review the first part of this three-part series at the link. In this article, we will use MS Excel to do advanced comparison of numerical data sets involving two or more groups of data belonging to one factor. In statistical literature, this is commonly known as ANOVA (analysis of variance). Before the emergence of computers to do engineering calculations, doing manual computations using analysis of variance was often a lengthy and difficult process.
Using Excel, there are built-in functions and even Excel add-ons which we can use to make those calculations straightforward and quick. Statistical comparison of data is often ignored in engineering, research, IT and business sectors. This is due to lack of education and formal training to accomplish this type of analysis.
Only by doing "effective data-driven analysis" can you really be sure your project is aimed in the right direction. A lot of money has been wasted because of managerial dependence on "hunches," "educated guesses" and "based on experience" types of judgment.
Since MS Excel is very popular, available on almost every engineer's, scientist's or even manager's laptop, this article will provide some helpful guidance in how to do advanced statistical analysis, such as using ANOVA techniques. It is also important that the "Analysis Toolpak" is installed as an MS Excel add-on. You can confirm whether it is installed via Tools -> Add-ins; make sure that "Analysis Toolpak" is checked. If not, you need to install it before you can follow the rest of the procedures stated in this tutorial. Please refer to this article for the details.
Analysis of variance is the method of comparing whether a group of two or more data groups comes from the same population (with a statistically similar mean or average) by analyzing the variance of those data groups. In short, it is using "variance analysis" to "compare means."
In the earlier tutorial (link provided on the previous page), a T-test was used to compare data, but this can only be effectively used when comparing two sets of data belonging to a single factor. If you are comparing more than two groups of data for a single factor, a T-test is inefficient.
ANOVA is the foundation of more complicated statistical improvement tools such as DOE (Design of Experiment). ANOVA proves highly useful when testing means coming from an experimental study. This is an important test to determine whether samples that come from different groups are either statistical the same or different. ANOVA comes in two types:
One way ANOVA involves comparison of data groups the belong to one factor. This topic is illustrated in this article.
Two way ANOVA involves comparison of data groups with each data group belonging to a different factor. Since this is a more complicated analysis than "One way ANOVA," a separate tutorial will cover implementing this analysis in MS Excel.
Factor identification is important to determine whether you will implement one way or two way ANOVA.
Let's have an example of one way analysis of variance. Say you are an engineer in charge of five machines manufacturing a specific type of resistor with a standard of 100 ohms. To maintain quality, you are conducting a study to see whether each of those machines are capable of producing its standard specifications with all other factors held constant (same machine model, same machine settings, same machine input parameters like pressure, voltage, etc). If there is a substantial difference in the quality of output of those machines, a machine replacement will be recommended.
Of course, hunches and rough analysis are definitely not recommended for this type of improvement project, as it involves buying another machine as a replacement. Carelessness and shallow analysis can lead to the wrong judgment on the part of the management, which leads to wasting company resources.
An experimental study following the classic scientific method implementing "one way analysis of variance" is one of the best methods for coming up with the right decision for a project like this. Okay, once the problem has been clearly defined, you will then formulate a sampling plan for your experiment. A typical sampling plan could look like this:
There is only one factor involved (the machines) and one experimental response to be measured (resistance output of those resistor products), for which the groups are the "machines." Based on the above sampling plan, the engineer needs to gather 20 random pieces of resistors (10 pieces from the day shift and 10 pieces from the night shift). After the samples have been gathered, measurement will then be initiated. A example of measurement data laid out in MS Excel is shown below:
This table is ready for "One way ANOVA analysis" using Excel. You will need to formulate your hypothesis:
Null Hypothesis: The means of resistance measurements from all machines are statistically the same.
Alternative Hypothesis: The means of resistance measurements from all machines are NOT the same.
Rejection Criteria: If P value < 0.05, reject null hypothesis.
The value of 0.05 is the alpha, which is based on a percentage confidence level = 1- alpha. Industry standard confidence level is 95 percent, meaning you risk a five percent chance of accepting the null hypothesis when in reality it is false.
Deciding on an acceptable number for this value calls for consultation with the management; if the project involves a higher cost (involving millions of dollars for investment), then a higher percentage confidence level will be desired, such as a 99.5% confidence level (an alpha of 100%-99.5%= 0.005).
If you need to increase the percentage of confidence level of the study, you'll have to increase the sample sizes to be declared in the sampling plan. For example, instead of gathering 20 samples, this can be increased to 50 samples per machine for greater experimental accuracy.
The next thing you will do is go to Tools -> Data Analysis -> ANOVA: Single Factor and fill in the data as shown in the screen shot below:
Output range is also important; these are the cells where you would like Excel to provide the analysis results. In the above screen shot I am planning to place the results besides the Excel table.
Note that in the Input range, I have included the Column Headers (e.g. "Machine1 to Machine 5"), so I should also check "Labels in first row." If you do not include the column name headers, do not check this item.
You can even decide to get the results in a new worksheet if you click "New Worksheet Ply:" in the output options instead of providing "Output range."
You can then see the results:
Based on the results, the P value is 0.775931, so we cannot reject the null hypothesis; therefore, we can conclude that the mean of the resistance measurements from Machine1 to Machine5 is statistically "similar" or the same. In short, the quality is consistent from every machine, and there is no need for machine replacement.