Categorical Predictors in SPC for Excel
When categorical predictors are used in SPC for Excel regression routines, a dummy coding system is used to make dummy variables out the categorical predictors.   The dummy coding system used is 1, 0. An example of this is given below.
Suppose you have two factors you think might impact a response. Factor 1 is a categorical predictor. Factor 2 is a numerical predictor. The data are shown below in Table 1.
Table 1: Regression Data with Categorical Predictors
| Factor 1 | Factor 2 | Response | 
|---|---|---|
| A | 17 | 62 | 
| A | 18 | 68 | 
| A | 13 | 66 | 
| B | 17 | 60 | 
| B | 15 | 66 | 
| B | 15 | 57 | 
| C | 19 | 62 | 
| C | 20 | 54 | 
| C | 19 | 59 | 
Factor 1 has three possible levels:  A, B and C. With the 1,0 dummy coding system, the coefficients represent the distance between the factor levels and the reference level.   The reference level is the first level when the levels for the categorical factor are placed in alphabetical order. For this example, the levels under Factor 1 in alphabetic order are:
- A
- B
- C
A is the reference level in this example. In creating the dummy variables, there is no column created for A. There is one column for B and on column  for C. The settings are shown in Table 2.
 Table 2: Level Settings for Categorical Factor 1
| If the Factor 1 level is: | B | C | 
|---|---|---|
| A | 0 | 0 | 
| B | 1 | 0 | 
| C | 0 | 1 | 
The table with the dummy variables added is shown below.  The first column in Table 1 is converted to the two columns: Factor 1_B and Factor 1_C.  SPC for Excel uses the nomenclature of listing the categorical factor name first followed by the level that the column responds to. The first column is for level B in Factor 1.
Table 3: Dummy Variables for Categorial Predictors
| Factor 1_B | Factor 1_C | Factor 2 | Response | 
|---|---|---|---|
| 0 | 0 | 17 | 62 | 
| 0 | 0 | 18 | 68 | 
| 0 | 0 | 13 | 66 | 
| 1 | 0 | 17 | 60 | 
| 1 | 0 | 15 | 66 | 
| 1 | 0 | 15 | 57 | 
| 0 | 1 | 19 | 62 | 
| 0 | 1 | 20 | 54 | 
| 0 | 1 | 19 | 59 | 
The coefficients determined by the software are used to build the model. The coefficients are given in the table below. Note that there is the intercept then the three predictors in Table 3: Factor 1_B, Factor 1_C and Factor 2.
Table 4: Predictor Table
| Predictor | Coefficient | 
|---|---|
| Intercept | 71.18 | 
| Factor 1_B | -4.455 | 
| Factor 1_C | -5.782 | 
| Factor 2 | -0.365 | 
Not there is nothing for level A in Factor 1. This is because the coefficient for Factor 1 when Level A is used is 0.Â
A model exists for each level of a categorial predictor. In this example, the model with the various levels of Factor 1 used are:
Level A (coefficient is zero):
Response = 71.18 Â + -0.365(Factor 2)
Level B:
Response = 71.18 + -4.455(Factor 1_B) + -0.365(Factor 2)
Response = 71.18 + -4.455(1) + -0.365(Factor 2)
Response = 66.725 – 0.365(Factor 2)
Level C:
Response = 71.18 + -7.582(Factor 1_C) + -0.365(Factor 2)
Response = 71.18 + 5.782( (1) + -0.365(Factor 2)
Response = 65.97 – 0.365(Factor 2)
