All Factor Analysis Worksheet Help
The various parts of this worksheet are described below.
Design Table
The design table is at the top of the All Factors Analysis worksheet. This classic design table shows the standard runs and the results. The design table for the etch rate example is shown below. The significant effects are in bold in the row labeled ‘Effects.’
The standard run number is given in Column A, followed by columns for the mean and each factor and interaction in the design. The last two columns are the average and range. The average is the average result for the runs at each standard condition. For example, standard run 1 was run twice during the design (actual runs 9 and 14). The two results were 550 and 604, so the average is (550 + 604)/2 = 577. The range is the maximum minus the minimum for the standard run. For standard run 1, the range is 604 – 550 = 54.
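For illustration, the same average and range calculations can be sketched in a few lines of Python (the two results for standard run 1 come from the example above):

```python
# Average and range for one standard run (replicate results for standard run 1).
replicates = [550, 604]                        # actual runs 9 and 14

average = sum(replicates) / len(replicates)    # (550 + 604)/2 = 577
spread = max(replicates) - min(replicates)     # 604 - 550 = 54

print(average, spread)                         # 577.0 54
```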
The information below the design table, starting with the row labeled “Sum +”, is the table analysis using Yates’s algorithm. This will be demonstrated using the Factor A results; a short numerical sketch follows the list. The rows perform the following functions:
- Sum +: sums the results for a factor at its high level (+)
- Factor A: Sum + = 659.5 + 638.5 + 808.5 + 794.5 = 2901
- Sum -: sums the results for a factor at its low level (-)
- Factor A: Sum – = 577 + 617 + 1044.5 + 1069 = 3307.5
- Overall: The sum of the sum + and sum – values (this was used as a check on calculations and is the same for all factors)
- Factor A: Overall = (Sum +) + (Sum -) = 2901 + 3307.5 = 6208.5
- Difference: The difference between the sum + and the sum – values; this represents the difference between the sum of the results at the factor’s high level and the sum of the results at the factor’s low level.
- Factor A: Difference = (Sum +) – (Sum -) = 2901 – 3307.5 = -406.5
- Effect: This is the effect of the factor. It is determined by dividing the difference by the number of plus signs in the column. The effect is the difference between the average of the results at the factor’s high level and the average of the results at its low level. (Note: the effect under the mean column is the average of all the factorial runs.)
- Factor A: Effect = Difference/4 = -406.5/4 = -101.625
- SS: This is the sum of squares for the factor. It is found by multiplying the difference by the number of replications, squaring the result, and dividing by the total number of factorial runs (N).
- Factor A: SS = (NReps*Difference)^2/N = (2*-406.5)^2/16 = 41,310.56
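As a rough sketch, the row calculations for Factor A can be reproduced in Python using the standard-run averages listed above (2 replications, 16 factorial runs):

```python
# Column calculations for Factor A from the design table averages.
high = [659.5, 638.5, 808.5, 794.5]    # averages where A is at its high (+) level
low = [577.0, 617.0, 1044.5, 1069.0]   # averages where A is at its low (-) level
n_reps, n_runs = 2, 16                 # replications and total factorial runs (N)

sum_plus = sum(high)                   # 2901.0
sum_minus = sum(low)                   # 3307.5
overall = sum_plus + sum_minus         # 6208.5 (same for every column - a check)
difference = sum_plus - sum_minus      # -406.5
effect = difference / len(high)        # -406.5/4 = -101.625 (4 plus signs per column)
ss = (n_reps * difference) ** 2 / n_runs   # (2 * -406.5)^2 / 16 = 41310.5625
```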
The last row in this section is labeled MSE. This stands for Minimum Significant Effect. It is one way of determining which effects are significant. The equation for MSE is:
MSE = tσ√(4/N)
where t is the t value for 95% confidence with the degrees of freedom = number of factorial observations – number of cells, σ is the estimated standard deviation obtained from the range values, and N is the total number of factorial runs. In this example, MSE = 47.7028. If there is only one replication, a different method is used to determine which factors are significant.
This value of MSE is compared to the effects in the design table. Any absolute value of an effect that is greater than MSE is considered significant. The program changes these effects to bold (in the effects row). As can be seen from the design table above, A, C and AC are significant effects.
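A minimal sketch of this test, assuming the t value and estimated standard deviation are supplied as described above (the function name is illustrative, not part of the worksheet):

```python
from math import sqrt

def minimum_significant_effect(t_value, sigma, n_runs):
    """MSE = t * sigma * sqrt(4/N): the smallest effect treated as significant."""
    return t_value * sigma * sqrt(4 / n_runs)

# The worksheet reports MSE = 47.7028 for this example; factor A's effect of
# -101.625 exceeds it in absolute value, so A is flagged (bolded) as significant.
print(abs(-101.625) > 47.7028)   # True
```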
Range Chart
When multiple replications are run, the program uses the range values to estimate the variability in the process and to check for out of control situations. The output for the range control chart on the All Factors Analysis Worksheet is shown below as well as the calculations.
The average range is calculated along with the upper control limit (UCLr) and the lower control limit (LCLr). The range values are compared to the control limits. If none of the range values are beyond these limits, the ranges are in statistical control. If any range is beyond these limits, there is evidence of a special cause of variation that may make the results suspect.
The average range and the control limits are calculated using the following equations:
R̄ = ΣRi/k
UCLr = D4R̄
LCLr = D3R̄
where Ri is the range of standard run i, k is the number of range values, and D4 and D3 are control chart constants that depend on the number of replications (the subgroup size).
There is no lower control limit on a range for 2 replications. The values for D4 and D3 for various subgroup sizes are available in many publications and on our website. Since no range value is above 158.8579, we conclude that the ranges are in statistical control and there were no special causes present when the experimental design was run. The residuals analysis will also check to determine if there were any issues present when the design was run.
Note that the average range is used to estimate the value of the standard deviation used in the MSE equation above (where d2 is a control chart constant that depends on the number of replications):
σ = R̄/d2
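A sketch of these range chart calculations, assuming the commonly tabulated control chart constants for a subgroup size of 2 (D4 ≈ 3.267, D3 = 0, d2 ≈ 1.128):

```python
# Average range, control limits, and the range-based sigma estimate (2 replications).
D4, D3, d2 = 3.267, 0.0, 1.128    # control chart constants for a subgroup size of 2

def range_chart(ranges):
    r_bar = sum(ranges) / len(ranges)            # R-bar = sum(Ri)/k
    ucl_r, lcl_r = D4 * r_bar, D3 * r_bar        # control limits (LCL is 0, i.e. none)
    sigma = r_bar / d2                           # sigma used in the MSE calculation
    out_of_control = [r for r in ranges if r > ucl_r]
    return r_bar, ucl_r, lcl_r, sigma, out_of_control
```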
ANOVA Table for Factors and Interactions
The next portion of the All Factors Analysis worksheet is the ANOVA table for the factors and interactions. The output for this example is shown below. The significant effects are those with a p-value ≤ 0.05. A p-value is shown in red if it is less than or equal to 0.05. If the p-value is between 0.05 and 0.20, the p-value is shown in blue. This effect may or may not be significant; it is borderline and probably should be considered for inclusion in the model. In the example, A, C and AC are significant. This agrees with the result from the design table.
The columns in the ANOVA table are:
Source: the source of variation which includes the factors and interactions in the model as well as the error and the total
SS: sum of squares for each source of variation
The sum of squares for the factors and interactions are given in the design table. The model sum of squares (SSModel) is the sum of the factors’ and interactions’ sums of squares. The total sum of squares (SSTotal) is given by the equation below, where yi represents an experimental result and N is the total number of experimental runs:
SSTotal = Σyi² – (Σyi)²/N
The error sum of squares is determined by subtracting the factor and interaction sum of squares from the total sum of squares:
SSError = SSTotal – SSModel
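The sum-of-squares bookkeeping can be sketched as follows, assuming y holds all N experimental results and factor_ss holds the factor and interaction sums of squares from the design table (the names are illustrative):

```python
# SSTotal, SSModel and SSError for the ANOVA table.
def sums_of_squares(y, factor_ss):
    n = len(y)
    ss_total = sum(v * v for v in y) - sum(y) ** 2 / n   # SSTotal
    ss_model = sum(factor_ss.values())                   # SSModel: factors + interactions
    ss_error = ss_total - ss_model                       # SSError = SSTotal - SSModel
    return ss_total, ss_model, ss_error
```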
df: degrees of freedom
The degrees of freedom for each factor and interaction is 1 since these are two level designs. The model degrees of freedom (dfModel) is equal to the number of factors and interactions in the model. The total degrees of freedom (dfTotal) is equal to the total number of runs minus 1:
dfTotal= N – 1
The error degrees of freedom (dfError) is given by:
dfError = dfTotal – dfModel
MS: mean square
The mean square for a source is the variance associated with that source and is determined by dividing the source sum of squares by the degrees of freedom for that source:
MS = SS/df
F: value from the F distribution
The F Value for a source of variation is used to compare the variance associated with that source with the error variance: F = MS/MSE, where MS is the mean square for the source and MSE is the mean square error.
p-Value: the probability value that is associated with the F Value for a source of variation
It represents the probability of getting a given F Value if the source does not have an effect on the response. If the p-value is ≤ 0.05, the source is considered to have a significant effect on the response. A p-value above 0.20 indicates the source does not have a significant effect. If the p-value is between 0.05 and 0.20, the source may or may not have a significant effect.
% Cont: the % of the total sum of squares the source of variation accounts for
Smaller p-values generally correspond to larger % contributions. % Cont = 100(SS/SSTotal) where SS is the sum of squares for a given source of variation.
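Putting these columns together, a sketch of one ANOVA row (mean square, F, p-value and % contribution for a single source, given its sum of squares and degrees of freedom) might look like the following; it assumes SciPy for the F distribution and df = 1 for each factor or interaction:

```python
from scipy import stats

# One ANOVA row for a factor or interaction in a two-level design (df = 1).
def anova_row(ss_source, ss_error, df_error, ss_total, df_source=1):
    ms_source = ss_source / df_source                   # MS = SS/df
    ms_error = ss_error / df_error                      # mean square error
    f_value = ms_source / ms_error                      # F = MS/MSE
    p_value = stats.f.sf(f_value, df_source, df_error)  # upper-tail probability
    pct_cont = 100 * ss_source / ss_total               # % contribution
    return ms_source, f_value, p_value, pct_cont
```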
ANOVA Table for the Model
The model’s ANOVA table is listed next on the All Factors Analysis worksheet. The output for this example is shown below.
The columns in the ANOVA table have been explained above. There are several pieces of information below the ANOVA table.
Average: the average of the experimental runs
Standard deviation: the square root of the mean square error
Coefficient of variation: the error expressed as a % of the average, 100(Standard Deviation/Average)
R Squared: measures the proportion of the total variability that is explained by the model
Adjusted R Squared: the value of R squared adjusted for the size of the model (the number of factors in the model)
PRESS: the Predicted Error Sum of Squares is a measure of how well the model will predict new values. It is given below, where ei is the ith residual and hii is the ith diagonal element of the hat matrix (H = X(X’X)⁻¹X’)
R Squared Prediction: indication of the predictive capability of the model; the percent of the variability the model would be expected to explain with new data
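These summary statistics follow the usual definitions; a sketch, assuming the residuals and hat matrix diagonals are available and that the worksheet uses these standard formulas, is:

```python
# Summary statistics reported below the model ANOVA table.
def model_summary(y, ss_model, ss_error, ss_total, df_error, residuals, hat_diagonals):
    n = len(y)
    average = sum(y) / n
    std_dev = (ss_error / df_error) ** 0.5               # sqrt of the mean square error
    cv = 100 * std_dev / average                         # coefficient of variation (%)
    r_sq = ss_model / ss_total                           # R squared
    adj_r_sq = 1 - (ss_error / df_error) / (ss_total / (n - 1))   # adjusted R squared
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, hat_diagonals))
    r_sq_pred = 1 - press / ss_total                     # R squared prediction
    return average, std_dev, cv, r_sq, adj_r_sq, press, r_sq_pred
```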
Factor Information
Information on the factors is listed next on the All Factors Analysis worksheet. This information includes the factor, the degrees of freedom, the standard error, and the 95% confidence limits. The output for this example is shown below.
Factor: the intercept, factors and interactions included in the model
Coeff: the regression coefficients for the factors
Degrees of Freedom: the degrees of freedom associated with the factor (always 1 for two level designs)
Standard Error: the standard error of the factor’s coefficient, which is defined as the following for n = the number of replications and k = the number of factors:
95% Upper and Lower Confidence Limits: The upper and lower 95% confidence limit around the coefficient; if it contains zero, the factor is usually not significant
Confidence Limits = b ± t(0.05, dfError)*se
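A sketch of the confidence limit calculation for a single coefficient, assuming SciPy and a two-sided 95% t value (written t(0.05, dfError) above):

```python
from scipy import stats

# 95% confidence limits for a coefficient b with standard error se.
def confidence_limits(b, se, df_error):
    t_val = stats.t.ppf(0.975, df_error)      # two-sided 95% t value
    lower, upper = b - t_val * se, b + t_val * se
    contains_zero = lower <= 0 <= upper       # if so, the factor is usually not significant
    return lower, upper, contains_zero
```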
Model Containing All Factors
The last part of the All Factors Analysis worksheet contains the model for all the factors, based on the coefficients given above. The coded model is based on the coded levels (-1 to +1) for the factors and interactions. The uncoded model is based on the actual factor levels. The output for the example is shown below.
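To illustrate how such a model is used (the coefficients, factor names and levels below are placeholders, not the example’s actual numbers), actual settings are first converted to coded values and then plugged into the coded model:

```python
# Convert an actual factor setting to its coded (-1 to +1) value, then evaluate
# a coded model. Coefficients and levels here are placeholders for illustration;
# factor names are assumed to be single letters (A, B, C, ...).
def to_coded(actual, low, high):
    center, half_range = (low + high) / 2, (high - low) / 2
    return (actual - center) / half_range

def predict(coeffs, x):
    """coeffs maps term names ('intercept', 'A', 'C', 'AC', ...) to coefficients;
    x maps factor names to their coded values."""
    y = coeffs["intercept"]
    for term, b in coeffs.items():
        if term == "intercept":
            continue
        value = 1.0
        for factor in term:            # e.g. 'AC' -> x['A'] * x['C']
            value *= x[factor]
        y += b * value
    return y
```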