Failure Mode and Effects Analysis

May 2008

When we are creating, working with, or just looking at a process, it helps to ask what can go wrong. Failure mode and effects analysis, or FMEA for short, is a way to prioritize potential trouble spots. There are two kinds of FMEAs. The first is a design FMEA, which is used to anticipate potential failures and to calculate projected risk priority numbers. The second is a process FMEA, which is used to monitor failures as they occur and to record the actual reduction in risk priority numbers. This month we take a look at the failure mode and effects analysis process.

Introduction

FMEAs are an analytical way of identifying how a process can fail and the consequences of that failure. FMEAs are best used prior to implementing a new process, or prior to modifying an existing process. This type of analysis is an organized way of discovering what could go wrong, and planning what we can do to eliminate such potential problems.

FMEAs evaluate three key dimensions of process failure: severity, occurrence frequency, and detectability. Each is rated on a scale of 1 to 10 (1 lowest, 10 highest). Severity is a measure of how damaging a failure would be. Occurrence frequency tells us how often this type of failure might occur. A problem that occurs several times a day rates high on the occurrence scale, while something that happens once every few years rates very low.

Detectability is an estimate of how likely it is that you will detect the failure before it has a harmful effect. A failure mode that would be obvious long before any harmful effect occurs rates low on the detectability scale. Failure modes that are essentially undetectable before the effects take place rate high on this scale.

Failure Mode and Effects Analysis Form

A blank FMEA form is shown below. This form is too small to do much with. You can download a free FMEA template from our website by clicking this link: FMEA Download. This form is also included in the SPC for Excel software.

FMEA Picture

The steps in constructing an FMEA are given below.

Steps in Making an FMEA

Step 1: What Is the Process?

A Failure Mode and Effects Analysis (FMEA) begins by specifying the process to be studied. You should have a process flow diagram (PFD) of the process. Record what the process is in the first column on the top half of the FMEA sheet.

Step 2: What is the Potential Failure Mode?

As a group, brainstorm all possible ways that this process could fail. These are the “potential failure modes.” Write these ideas on a flip chart in a fishbone format. Look at the process using 4Ms, P and E – Methods, Materials, Measurement, Machinery, People, and Environment – in the “fishbone” format. From the brainstormed list, choose the most significant failure mode and write it in Column 2 on the FMEA form. You can repeat this process for different failure modes.

Step 3: What are the Potential Effects of the Failure Mode?

Write down the effects that would happen if the potential failure were a real failure. How would the failure impact your company? Your suppliers? Your customers? What would be the worst possible outcomes?

Step 4: What is the Severity of the Failure Mode to the Customer, Product, or Service?

The Severity Rate (S) is the “best guess” of how serious it would be to the customers, the product, or the service if the failure really occurred. A rating of 1 would mean the effect of the failure is considered minor; a rating of 10 would indicate that the effect of the failure would be very severe. Using the effects listed in Step 3, your “best guess,” and the scale below, assign one overall severity rate to the failure mode identified in Step 2.

Severity Scale:

  • 9 – 10: With potential safety risk or legal problems – potential loss of life or major dissatisfaction
  • 7 – 8: High potential customer dissatisfaction – serious injury or significant mission disruption
  • 5 – 6: Medium potential customer dissatisfaction – potential small injury, mission inconvenience, or delay
  • 3 – 4: The customer may notice the potential failure and may be a little dissatisfied – annoyance
  • 1 – 2: The customer will probably not detect the failure – undetectable

Step 5: What Are the Possible Causes of the Failure? Why Does It Happen?

For the failure mode listed in Step 2, write down all factors the team can think of that could cause the failure to occur. Number the causes. Be creative in trying to determine why the failure would occur. Talk to your internal and external suppliers of the process, your internal and external customers, and the “natural team” involved. A fishbone is, again, a helpful SPC tool to use with this step. Failure modes often have more than one cause.

Step 6: How Often Does the Cause of Each Failure Occur?

The Occurrence Rate (O) is an estimate of how often the failure happens due to each specific cause listed in Step 5. A rating of 1 or 2 indicates the failure very rarely happens; a rating of 10 indicates it happens very frequently. (For example, a specific stock item may require a “buy out” virtually every week; thus, the occurrence rate for this cause of a failure mode could be a 10.) For each of the numbered causes in Step 5, assign an occurrence rate from the scale below.

Occurrence Scale:

  • 9 – 10: Very high probability of occurrence
  • 7 – 8: High probability of occurrence
  • 5 – 6: Moderate probability of occurrence
  • 3 – 4: Low probability of occurrence
  • 1 – 2: Remote probability of occurrence

Step 7: How Do We Currently Prevent Each Listed Cause of Failure from Happening?

For each of the failure mode causes in Step 5, write down current ways you prevent the situation from occurring. Number each of these current ways to correspond with the cause number in Step 5. If no current way exists, state “none.” Each potential “cause” should have at least one current prevention method or a response of “none.” Take time to identify current ways of prevention. Decide if you need a control chart to monitor how the current way of preventing the failure is working.

Step 8: How Easy Is It to Detect the Failure Before the Customer Sees It?

The Detection Rate (D) is an estimate of how difficult it is to detect the failure before the customer sees it. A rating of 1 would indicate that it is obvious right away to anyone that the failure is occurring; a rating of 10 would indicate that the failure will go undetected until the effect is felt by the customer. (For example, over-billing a customer’s account might not be detected until the customer receives the bill. The detection rate for this failure mode would be a 10.) For each of the current ways (Step 7) being used to prevent the cause of a failure mode (Step 5), assign a Detection Rate (D) using the scale below.

Detection Scale:

  • 9 – 10: Zero probability of detecting the potential failure cause
  • 7 – 8: Close to zero probability of detecting potential failure cause
  • 5 – 6: Not likely to detect potential failure cause
  • 3 – 4: Good chance of detecting potential failure cause
  • 1 – 2: Almost certain to identify potential failure cause
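
The three scales above share the same five bands, so they can be kept in one lookup table. The sketch below is only an illustration: the descriptions are copied from the scales in this article (the severity entries are shortened to the text before the dash), while the dictionary layout and the function name describe are our own assumptions.

```python
# Rating bands copied from the Severity, Occurrence, and Detection scales.
# The structure and the name `describe` are illustrative, not a standard.
SCALES = {
    "severity": {
        (9, 10): "With potential safety risk or legal problems",
        (7, 8): "High potential customer dissatisfaction",
        (5, 6): "Medium potential customer dissatisfaction",
        (3, 4): "The customer may notice the potential failure and may be a little dissatisfied",
        (1, 2): "The customer will probably not detect the failure",
    },
    "occurrence": {
        (9, 10): "Very high probability of occurrence",
        (7, 8): "High probability of occurrence",
        (5, 6): "Moderate probability of occurrence",
        (3, 4): "Low probability of occurrence",
        (1, 2): "Remote probability of occurrence",
    },
    "detection": {
        (9, 10): "Zero probability of detecting the potential failure cause",
        (7, 8): "Close to zero probability of detecting potential failure cause",
        (5, 6): "Not likely to detect potential failure cause",
        (3, 4): "Good chance of detecting potential failure cause",
        (1, 2): "Almost certain to identify potential failure cause",
    },
}


def describe(scale: str, rating: int) -> str:
    """Return the band description for a 1-10 rating on the named scale."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    for (low, high), text in SCALES[scale].items():
        if low <= rating <= high:
            return text
    raise ValueError(f"no band found for rating {rating}")


print(describe("detection", 10))
```

A table like this can help a team keep its ratings consistent from one failure mode to the next.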

Step 9: Scoring Summary

Enter the Severity rate from Step 4 for each cause from Step 5. There is only one severity rating, so it will be the same number for each cause. Enter the Occurrence rate from Step 6 for each cause from Step 5. Enter the Detection rate from Step 8 for each prevention method listed in Step 7. Multiply the “S” times the “O” times the “D” for each cause to get an initial Risk Priority Number (RPN).

S x O x D = RPN

The highest RPN possible is 1000. Such a high RPN would indicate a mode of failure that is very severe (S = 10), occurs frequently (O = 10), and is almost impossible to detect with the current systems (D = 10). The lowest possible RPN is 1; such a mode of failure would not have a severe impact (S = 1), would occur very infrequently (O = 1), and would be very easy to detect (D = 1).
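
The calculation in Step 9 can be sketched in a few lines. This is a minimal illustration, not part of the FMEA form: the function name rpn, the cause labels, and the ratings are all hypothetical.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: S x O x D, with each rating on the 1-10 scale."""
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# One failure mode has a single severity rating (Step 4).
# Each numbered cause has its own occurrence and detection ratings.
severity = 8  # hypothetical rating
causes = {
    "cause 1": (6, 4),  # hypothetical (O, D)
    "cause 2": (2, 9),  # hypothetical (O, D)
}

initial_rpns = {name: rpn(severity, o, d) for name, (o, d) in causes.items()}
for name, value in initial_rpns.items():
    print(name, value)
```

The bounds the function enforces match the extremes described above: ratings of 10, 10, and 10 give 1000, and ratings of 1, 1, and 1 give 1.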

Step 10: Review if Management Involvement Is Needed.

At this point, before time is spent determining how to revise the process, teams can review their Failure Mode and Effects Analysis with management if necessary. It is probably good to do this if it appears that great costs will be incurred should a major change in process be made. If management is not in agreement with the team’s calculations, the two groups should discuss the rating system until they reach a consensus. Then, the team can return to Step 4 and revise its calculations.

If management is in agreement with the team’s calculations, then management and the team need to decide whether or not the Risk Priority Numbers are high enough to warrant doing something about the process right away.
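
One way to support that decision is to rank the causes by RPN and compare them against a cutoff. The sketch below assumes a cutoff of 100 purely for illustration; this article does not prescribe a threshold, and the team and management would need to agree on their own. The cause labels and RPN values are hypothetical.

```python
# Hypothetical cutoff: not from the article, to be agreed by team and management.
ACTION_THRESHOLD = 100

# Hypothetical initial RPNs from the scoring summary.
rpns = {"cause 1": 192, "cause 2": 144, "cause 3": 24}

# Sort from highest to lowest RPN so the biggest risks come first.
ranked = sorted(rpns.items(), key=lambda item: item[1], reverse=True)

# Causes at or above the cutoff are candidates for action steps.
needs_action = [name for name, value in ranked if value >= ACTION_THRESHOLD]

for name, value in ranked:
    flag = "act now" if value >= ACTION_THRESHOLD else "monitor"
    print(f"{name}: RPN {value} ({flag})")
```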

Step 11: Action Steps: What Needs to be Done?

For each cause with a high RPN, brainstorm ways that the RPN can be reduced. Remember, RPNs can only be reduced by revising the process. Normally, either the Occurrence Rate (O) must be lowered or the ability to detect the failure must be improved, which lowers the Detection Rate (D). The Severity of the failure (S) usually cannot be reduced.

To lower the RPN, brainstorm ideas that will address the occurrence rate or the detection rate. Discuss these ideas as a team and reach a team consensus for best ideas for process improvement or process revisions. The team will find that some recommendations are general for all potential causes of a failure mode; other recommendations should be assigned to a specific potential cause of failure.

Take the best ideas for improvement and recommend action steps. State what needs to be done, who needs to be responsible for doing it or seeing that it gets done, and when it should be started and finished.
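
An action step therefore has four parts: what, who, start, and finish. A minimal sketch of such a record is shown below; the class name ActionStep, its field names, and the sample values are hypothetical and are not taken from the FMEA form.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionStep:
    cause: str    # the numbered cause from Step 5 this action addresses
    action: str   # what needs to be done
    owner: str    # who is responsible for doing it or seeing that it gets done
    start: date   # when it should be started
    finish: date  # when it should be finished

    def __post_init__(self) -> None:
        # A finish date earlier than the start date is a recording error.
        if self.finish < self.start:
            raise ValueError("finish date is before start date")


# Hypothetical example entry.
step = ActionStep(
    cause="cause 1",
    action="describe the process revision here",
    owner="name of responsible person",
    start=date(2008, 5, 1),
    finish=date(2008, 6, 1),
)
print(step)
```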

Step 12: Implement Action Steps.

Once agreement is reached on the specific recommended action steps, the Action Plan should be implemented. This involves reviewing the steps in Step 11, including who is responsible for each step, and when each step will be completed. Input from management may be required.

Step 13: Recalculate and Adjust New RPNs After Strategy Intervention

Once the action steps have been implemented and new control data accurately reflect the impact of the changes, the team should recalculate the Risk Priority Number for each cause listed in Step 5. Determine the Severity Rate (S), Occurrence Rate (O), and Detection Rate (D) for the revised process, then calculate the RPN as in Step 9. Projected RPNs in Step 12 should be adjusted accordingly. Results should be shared.
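
The before-and-after comparison can be sketched as follows. All ratings here are hypothetical; in practice they would come from the original FMEA and from the control data collected after the changes. Severity is left unchanged, since it usually cannot be reduced.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: S x O x D."""
    return severity * occurrence * detection


# Hypothetical (S, O, D) ratings for each cause, before and after the action steps.
before = {"cause 1": (8, 6, 4), "cause 2": (8, 2, 9)}
after = {"cause 1": (8, 3, 4), "cause 2": (8, 2, 3)}

results = {}
for cause in before:
    old = rpn(*before[cause])
    new = rpn(*after[cause])
    results[cause] = (old, new)
    percent = 100 * (old - new) / old
    print(f"{cause}: RPN {old} -> {new} ({percent:.0f}% reduction)")
```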

Conclusions

The failure mode and effects analysis tool provides a method of looking at potential ways a process can fail and then implementing plans to prevent the most likely causes of failure. A few general comments are given below on FMEAs.

  • High severity, which often indicates a potential catastrophe, can capture total attention and take up too much time when occurrence (O) and detection (D) are both rated “1.”
  • Occurrence is not usually subjective; it can be measured in terms of frequency.
  • Detection is also typically data-based in terms of how often that failure cause has been detected before the customer reports it.
  • People can spend a lot of time on a failure with a high O but low D and S. Such a failure is easy to see, and it can be tempting to spend too much time on a problem that is not important (low S) and is easy to detect (low D).
  • When the problem occurs frequently, but is not severe, and the detection is easy, we are usually dealing with common cause variation. Management needs to consider making systemic modifications and/or changes.
  • Look for root causes of high RPNs before deciding on recommended action steps.
  • Review the draft FMEA with those closest to the job before deciding on recommended action steps; they may have additional input.
  • Use FMEAs prior to implementing a new process (Design FMEA) and prior to modifying an existing process (Process FMEA).
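
The first and fourth comments can be checked with a little arithmetic. The sketch below uses three hypothetical rating profiles, not taken from any real FMEA, to show that a single high rating does not by itself produce a high RPN.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: S x O x D."""
    return severity * occurrence * detection


# Hypothetical (S, O, D) profiles.
profiles = {
    "high S, O and D rated 1": (10, 1, 1),
    "high O, low S and D": (2, 10, 2),
    "moderate on all three": (5, 5, 5),
}

scores = {name: rpn(*ratings) for name, ratings in profiles.items()}

# List the profiles from highest to lowest RPN.
for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: RPN {value}")
```

In this sketch the profile that is moderate on all three dimensions has the highest RPN, which is why the RPN is a useful check on where a team is spending its time.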

Summary

This issue introduced failure mode and effects analysis. FMEAs are an analytical way of identifying how a process can fail and the consequences of that failure. FMEAs evaluate three key dimensions of process failure: severity, occurrence frequency, and detectability. A risk priority number is calculated to help determine which potential causes of a failure need to be addressed.

Thanks so much for reading our SPC Knowledge Base. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC
