How to Monitor Website Traffic using Statistical Process Control Charts
In this article, you will learn how to construct statistical process control charts. These charts serve as a tool to monitor your website traffic's stability. The objective is accurate problem detection without the risk of facing "out of control" situations common among web masters, SEOs and online business owners. This article is the first part of a two-part series.
Contributed by Codex-M Rating: / 4 December 14, 2009
The main benefit is to let a webmaster or SEO professional actually spot problematic scenarios which are not included in web analytic software packages (such as Google Analytics and Stat Counter). A control chart lets you systematically detect out-of-control situations and let the website in charge correct the problem as early as it happens.
This is a two-part tutorial. The first part discusses the manual computation methods to help you grasp key concepts for constructing control charts. The second part illustrates the shortcut process of constructing the chart and establishes standard reporting methods using MS Excel that are usable in the day-to-day monitoring of website traffic done either by the webmaster, SEO professional or online business owner.
What is an SPC Control Chart?
To easily understand a control chart, let’s compare it to some standard web analytic charts, like this one from Google Analytics:
As you can see, all it shows is the basic plot of daily website traffic. It does not tell you whether the traffic is stable or if your web site is facing an out-of-control situation. This is where SPC control charts can be very helpful. With problems immediately and accurately detected, traffic can be restored to its normal (stable) level, thus NOT affecting the sales and profitability of the website.
Below is a sample control chart using an Upper Control Limit (UCL), Lower Control Limit (LCL) and the average line:
The pink line is the UCL, the green line is the average line and below that is the LCL (violet line). The blue line depicts the actual daily fluctuations of website traffic. Using the rules for detecting out-of-control points in the control chart, any webmaster or SEO professional can accurately detect the presence of a real problem.
As you can see, every control chart should have the actual line chart of the website traffic, the UCL line, LCL line and the center (average) line. How to construct this chart is the subject and objective of this tutorial.
To understand the principles of computation, let’s start with the manual and basic method before attempting to automate (shorten) the process.
Step 1: Log- n to your Google Analytics account: http://www.google.com/analytics
Step 2: In the Dashboard, click “View Report” for the site profile for which you need to construct a control chart.
Step 3: By default, Google Analytics show the current one month period of data. So if you are logging in to your GA account on December 7, 2009, the reporting period shown will be from November 7, 2009 to December 7, 2009.
In this step you need to compute the control limits (UCL, LCL and the average line). These are computed using the previous one month of historical data. Since we cannot do the computation right in the GA dashboard, you need to export the historical data in CSV Excel format.
For the mean time, change the GA date period to one month of previous historical data. Since the current one month period is November 7, 2009 to December 7, 2009; one month of previous historical data covers October 7, 2009 to November 7, 2009.
Step 4: Click “Visitors” shown in the left pane under Dashboard.
Step 5: Under “Export,” click CSV for Excel; please see the screen shot below:
Encircled in red is the CSV for Excel export link. Within the red rectangle is the one month historical period (October 7, 2009 to November 7, 2009). After clicking the link, a pop-up/dialog box appears; depending on your browser, you need to save the file to a convenient location in your computer.
Step 6: Open the file; you should see the visitors’ data table (the “day” and “visitors” column). Save it as “SPC_Control_Chart(manual).xls” so that it will be in Excel format.
Step 7: Delete the unneeded sections in the data table, except the “Day and Visitors Table.” If you are confused, download this file SPC_Control_Chart (manual).xls to see which rows of data will be left (see the download link at the end of this article).
Step 8: Compute the “Moving Range Average” values of each piece of historical data. The formula is:
Moving range = absolute value of (Traffic on day(x+1) – Traffic X).
“(X+1)” is just a notation that says one day after day “X.” The absolute value means even if the difference of the two values is negative, the final value should be positive.
So if the number of visitors on October 7, 2009 is 43 and on October 8, 2009 it is 55, the moving range computed for October 8, 2009 will be 55-43 = 12. In this example, October 7 is the X and X+1 (one day after) is October 8. There is no moving range value for October 7 because it is the first day of historical data collection. The moving range data starts on the second day.
Further example computation is shown below:
October 7, 2009 -> 43
October 8, 2009 -> 55 (Moving range 1= 55-43 = 12)
October 9, 2009 -> 38 (Moving range 2= 38- 55 = -17 = 17)
October 10, 2009 -> 40(Moving range 3= 40-38 = 2)
October 11, 2009 -> 42(Moving range 4= 42-40 = 2)
October 12, 2009 -> 43(Moving range 5= 43-42 = 1)
The moving range should be computed based on a sequential date arrangement (e.g. October 7, October 8, October 9 …). The moving range is a measure of website traffic variation.
In the Excel spreadsheet, continue the computation until the last data which is November 7, 2009 (refer to download link of this demo Excel workbook showing this example).
Step 9: Compute the “Average Daily Web Traffic” based on historical data. This is the average of the whole visitors’ data sample. In Excel this will be:
average daily web traffic =average(43,55,38,40,42,43…)
Step 10: Compute the average of the moving range values computed in Step 8.
Step 11: Compute the Upper Control limit and Lower Control Limit:
UCL = Average Daily Web traffic + 2.66* average moving range
LCL = Average Daily Web traffic – 2.66 * average moving range
(To learn the reason for using “2.66” as a constant, go here: http://elsmar.com/Forums/showthread.php?s=a5db5769e5063804393c9d0780505e37&t=18735&page=2 )
Since you are monitoring traffic data in terms of visitors, you will not be using decimal values for those limits; instead, you will round it off. Name this analysis worksheet “computation of control limits.”
Step 12: Now we need to implement the control limits on the current time period (November 7 to December 7, 2009). You need to log back into your Google Analytics account, set the time period to that date and export the CSV to MS Excel.
Step 13: Open the exported CSV file with the current period and remove any unnecessary rows except the “day” and “visitors” column.
Step 14: Move that sheet to the SPC_Control_Chart(manual).xls. Name this worksheet “control chart.” After moving this sheet, there are now two worksheets in this workbook: “computation of limits” and “control chart.”
In the “Control Chart” worksheet, add three columns next to “Visitors” and name it “UCL,” “Average” and “LCL” respectively.
Copy and paste the values of UCL, LCL and Average determined from Step 9 and Step 11 to all cells under their column starting from November 7 to December 7, 2009. (Please refer to the SPC_Control_Chart(manual).xls download link below).
So for the next day, data can be updated (for example, on December 8, 2009) in the Excel workbook without re-computing the limits. This is done by adding a new row with the data and the control limits as well as updating the control charts. However, since this method is manual, you still need to log in to Google Analytics and fetch some data (but there is no need to export CSV since we are interested in checking the traffic for a single day only -- December 8, 2009, for example).
In part two, we will be implementing a solution to shorten and systematize this whole process. The improved method discussed in part two can easily be used in any website to implement statistical process control charts.
You can download the Excel workbook “SPC_Control_chart(manual).xls” here: