
The title already says what this discussion is about: what is the real difference between performing an FMEA and an FMECA?

In the plant where I work, the maintenance concept has to be reviewed. Until now, many tasks have been defined on the basis of experience and the like, but my aim is to design an approach that structures the development of the maintenance concept.

I am currently researching the possibility of implementing an FMEA methodology. After hours of research on this topic, I am still not sure about the real difference between FMEA and FMECA. I hope someone can share their view on this topic or the knowledge they have about it.

Until now, Google has given me several views on FMEA and FMECA:
  • Most common statement: "FMECA is FMEA with a criticality analysis added." Right...
  • "FMEA (Failure Mode and Effects Analysis) is a process used for analyzing the failure modes of a product. This information is then used to determine the impact each failure would have on the product, thereby leading to an improved product design. The analysis can go a step further by assigning a severity level to each of the failure modes in which case it would be called a FMECA (Failure Mode, Effects and Criticality Analysis)." (source) (As I read it: if you add a column called "severity level" to a standard FMEA sheet... tadaaah! The FMECA approach was born.)
  • This ppt mentions that the criticality analysis can be quantitative or qualitative. As I have no accurate failure data, the only option is a qualitative approach. Does this mean that I just rate every failure mode from A to E and that's it? Impressive criticality 'analysis'!
  • Most websites show FMEA sheets with an RPN approach added to the FMEA. That does not make it an FMECA approach, right? The RPN rates a failure mode by Severity, Occurrence and Detection.

    All this conflicting information does not really make it easy for me. I hope I made my problem clear and that someone can help me with this problem.
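    For the qualitative route, MIL-STD-1629A (the standard cited at the end of this thread) places each failure mode on a criticality matrix of severity category (I Catastrophic, II Critical, III Marginal, IV Minor) against probability level (A Frequent to E Extremely unlikely). The sketch below turns that matrix into a ranked list. Sorting by severity first and probability second is an assumption made for illustration, not a rule taken from the standard, and the failure modes are hypothetical.

```python
# Severity categories and probability levels as named in MIL-STD-1629A.
SEVERITY = ["I", "II", "III", "IV"]      # I = Catastrophic ... IV = Minor
PROBABILITY = ["A", "B", "C", "D", "E"]  # A = Frequent ... E = Extremely unlikely


def rank_failure_modes(modes):
    """Rank (name, severity, probability_level) tuples, most critical first.

    Assumed ordering: most severe category first, then most probable level.
    """
    for name, severity, level in modes:
        if severity not in SEVERITY or level not in PROBABILITY:
            raise ValueError(f"{name}: unknown rating {severity}/{level}")
    return sorted(
        modes,
        key=lambda m: (SEVERITY.index(m[1]), PROBABILITY.index(m[2])),
    )


# Hypothetical failure modes for illustration only.
modes = [
    ("pump seal leak", "III", "B"),
    ("gearbox bearing seizure", "II", "D"),
    ("motor winding short", "II", "C"),
]
for name, severity, level in rank_failure_modes(modes):
    print(f"{severity}/{level}  {name}")
```

    The matrix keeps the two dimensions visible; collapsing them into one sort order is only a convenience for building a work list.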

    parle,
    The criticality part of FMECA consists of assigning a number from 1-10 each for Probability and Consequence and a third factor relating to Confidence. Multiplying these 3 gives a criticality number of 1-1000. Thus each failure mode gets a rank based on these 3 numbers. Subjectivity is reduced by using a team approach.
    Confidence is largely based on our ability to 'catch' each failure mode in time. Thus a hidden failure will have a high number, while an easily detectable failure gets a low number.
    The ranking number, or RPN, weights each risk rank (probability number x consequence number) by the confidence number.
    All the other steps that precede the RPN analysis are common to both FMEA and FMECA.
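    A minimal sketch of the calculation described above, assuming integer ratings on a 1-10 scale; the function name `rpn` is illustrative. Rejecting 0 keeps a single factor from wiping out the whole score, which is why the scales start at 1.

```python
def rpn(probability: int, consequence: int, confidence: int) -> int:
    """Criticality number / RPN as described above: product of three 1-10 ratings.

    probability: 1 (negligible) .. 10 (extremely high)
    consequence: 1 (negligible) .. 10 (catastrophic)
    confidence:  1 (easily detected) .. 10 (hidden failure)
    """
    ratings = {"probability": probability, "consequence": consequence, "confidence": confidence}
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be rated 1-10, got {value}")
    return probability * consequence * confidence


print(rpn(1, 1, 1), rpn(10, 10, 10))  # prints 1 1000 (ends of the scale)
print(rpn(4, 7, 8))                   # prints 224 (a hypothetical failure mode)
```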
    RM
    Vee,

    Thanks for taking the time to explain the topic. But still it is not clear to me.

    You say: criticality number = Probability * consequence * confidence.
    But what is the difference between criticality number and RPN (Risk Priority Number)?
    As I understand it, the RPN is the product of the Severity, Occurrence and Detection ratings.
    Reading your explanation of the three factors that form the criticality number, I cannot see much difference from the RPN. So does Probability * Consequence * Confidence equal Occurrence * Severity * Detection?

    Correct me if I'm wrong.
    RM
    parle,
    The risk level, viz. prob*cons or freq*severity corresponds to criticality.
    By adding a Detection probability factor or confidence factor, we add a third variable.
    We are not using the actual probability; we use a number from 1 to 10 to show probability, from negligible to extremely high.
    Similarly, we assign 1-10 for negligible to catastrophic on the consequence scale.
    Let's say these form the X and Y axes.
    We now add a Z axis to represent detectability on a 1-10 scale.
    So we take the risk number on a 1-100 scale and multiply it by the 1-10 detectability number. The RPN is thus on a 1-1000 scale.
    So a low risk or criticality item, if it is a hidden failure can have a relatively high RPN. Equally a critical or high risk failure mode may be very easy to detect, and end up with a medium RPN.
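    To make the last point concrete, the sketch below rates two hypothetical failure modes and ranks them twice: by two-factor risk (probability x consequence) and by RPN. The ratings are invented to show the effect, not taken from any plant.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    probability: int    # 1 (negligible) .. 10 (extremely high)
    consequence: int    # 1 (negligible) .. 10 (catastrophic)
    detectability: int  # 1 (easily detected) .. 10 (hidden)

    @property
    def risk(self) -> int:
        """Two-factor risk ('criticality' outside FMECA): 1-100."""
        return self.probability * self.consequence

    @property
    def rpn(self) -> int:
        """Risk weighted by detectability: 1-1000."""
        return self.risk * self.detectability


# Hypothetical ratings chosen to show the effect described above.
modes = [
    FailureMode("relief valve stuck shut (hidden)", 3, 5, 10),
    FailureMode("drive belt breaks (evident)", 6, 8, 2),
]
by_risk = sorted(modes, key=lambda m: m.risk, reverse=True)
by_rpn = sorted(modes, key=lambda m: m.rpn, reverse=True)
print([m.name for m in by_risk])  # belt first (risk 48 vs 15)
print([m.name for m in by_rpn])   # relief valve first (RPN 150 vs 96)
```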
    Using RCM, we apply this concept qualitatively, by using a decision diagram for hidden failures which is both more stringent and more selective in the type of acceptable tasks. In this respect it is superior to FMECA. But FMECA is useful when man-machine interfaces matter a lot, e.g. in analyzing communication systems.
    I hope this explanation makes it clearer.
    RM
    Vee,

    Summarizing statements:
  • FMEA can apply an RPNumber, which consists of a probability, consequence and detection scale (all 0-10). This does not form an FMECA.

  • The RPNumber can be applied to rate the status quo of a failure mode (what is the score when no maintenance is performed on this part) and to predict what the influence of the proposed (maintenance) action will be by rating the three aspects again.

  • FMECA is a method using a criticality rating formed by probability * consequence (both scale 0-10) to rate the 'criticality' of a failure mode.

  • Clearly, the criticality rating provides a more precise view on the criticality of a failure mode, because only the two most important factors are used for the failure mode rating.


    This means that if I only use the factors probability and consequence to make the 'criticality rating', I have an FMECA. And if I use an extra factor to rate a failure mode, named detection, I have an FMEA?

    Is this the only large difference?
    And what reasons are there for choosing one of the two methods?
    RM
    parle,
    I think there is still some clarity required, so I will comment on each of your statements:
    quote:
    FMEA can apply an RPNumber, which consists of a probability, consequence and detection scale (all 0-10). This does not form an FMECA.
    Each scale is 1-10, not 0-10. In an FMEA, we have to identify the failure mode AND its consequence. If we are doing RCM, we add our estimate of probability. That gives us the risk of failure. An FMEA does NOT need an RPN.

    quote:
    The RPNumber can be applied to rate the status quo of a failure mode (what is the score when no maintenance is performed on this part) and to predict what the influence of the proposed (maintenance) action will be by rating the three aspects again.
    The RPN is an estimate of the risk posed by each failure mode AND of how easy it is to recognize its existence. That determines what maintenance action is required. It says nothing about the status and is not a measure of performance. The RPN will remain what it is irrespective of whether we act on the required maintenance task or not, as long as our estimate of the 3 factors remains the same.

    quote:
    FMECA is a method using a criticality rating formed by probability * consequence (both scale 0-10) to rate the 'criticality' of a failure mode.
    No, FMECA = FMEA + RPN, and the scale is 1-10 for each of the 3 elements.

    quote:
    Clearly, the criticality rating provides a more precise view on the criticality of a failure mode, because only the two most important factors are used for the failure mode rating.
    There is a problem of nomenclature and definition. In FMECA, the RPN is termed "criticality", while elsewhere we understand risk, which has only two elements, as "criticality". No doubt this causes confusion.

    quote:
    This means that if I only use the factors probability and consequence to make the 'criticality rating', I have an FMECA. And if I use an extra factor to rate a failure mode, named detection, I have an FMEA?
    No, you need only the failure mode and its consequence defined to have an FMEA. If you add the RPN, you have an FMECA.

    quote:
    Is this the only large difference?
    You have reversed the two, but yes, that is the main difference.
    quote:
    And what reasons are there for choosing one of the two methods?
    It all depends on your end objective; do you want to improve your design, identify maintenance requirements? A version of FMEA can help the designer, another (Functional) FMEA can help identify maintenance requirements, when using the RCM logic charts, and FMECA can help find maintenance requirements for certain kinds of systems where there are lots of man-machine interfaces.
    RM
    Vee,

    Thanks. This is what I was looking for.

    quote:
    Each scale is 1-10, not 0-10.

    My mistake, I meant that the rating scale is 1-10, as you said. That's logical, because if you multiply three factors and one of them is zero, the result will always be zero.

    quote:
    The RPN is an estimate of the risk posed by each failure mode AND of how easy it is to recognize its existence. That determines what maintenance action is required. It says nothing about the status and is not a measure of performance.

    What I meant to say was that the RPN says something about what will happen if no maintenance is done on the part analysed.
    As I see it, the RPN can be recalculated if one of the three factors decreases as a result of maintenance on a part. This is a way to check whether the proposed maintenance is effective.
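    A sketch of the re-rating idea described here, assuming the RPN is the product of three 1-10 ratings. Note that Vee, earlier in the thread, does not regard the RPN as a performance measure; this only shows the arithmetic of comparing two sets of ratings. The ratings and the assumed effect of the task are hypothetical.

```python
def rpn(occurrence: int, severity: int, detection: int) -> int:
    """RPN as the product of three 1-10 ratings."""
    return occurrence * severity * detection


def compare_ratings(before, after):
    """Return (RPN before, RPN after, fractional reduction) for two
    (occurrence, severity, detection) rating tuples."""
    rpn_before, rpn_after = rpn(*before), rpn(*after)
    return rpn_before, rpn_after, (rpn_before - rpn_after) / rpn_before


# Hypothetical: a condition-monitoring task is expected to improve detection
# from 7 to 3; occurrence and severity are left unchanged.
before, after, reduction = compare_ratings((6, 8, 7), (6, 8, 3))
print(before, after, f"{reduction:.0%}")  # prints 336 144 57%
```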

    If you had said this earlier, a large part of this discussion would not have been necessary: FMEA + RPN = FMECA

    The purpose of the FME(C)A is to set up a maintenance plan for an already installed machine.

    By the way: are you the only one with knowledge and experience with FMEA/FMECA on this forum??
    RM
    quote:
    In the plant where I am working, the maintenance concept has to be reviewed. Until now lots of tasks have been generated by experience etc, but my purpose is to design an approach to structure the development of the maintenance concept.

    Parle,
    Let's go back to your original question. If this is your goal, then I don't think either FMEA or FMECA is the answer. You need not use RCM because you already have a maintenance program. I would suggest you use Planned Maintenance Optimization. This is a process that starts with your current maintenance program (formal or informal), rationalizes and reviews it, then fills in the gaps. Compared to RCM, FMEA or FMECA, it is like starting from the 80 yd line. You don't need to scrap your current program; make it better.
    You can find out about PM Optimization at www.reliabilityassurance.com or send me an email at steve@omcsinternational.com.
    Also, be careful with PM Optimization programs. Some do not fill the gaps for you. It is important not to review only your current program; you must also fill the gaps.
    Regards Steve
    RM
    Parle,
    Also on the subject of criticality: despite intuitive reasoning supporting its importance, the intervals of condition-based maintenance are primarily set by the PF interval (the rate of equipment deterioration), and hard time replacement is determined by safe life. Asset criticality is a second order consideration for evident failure modes. It is important for hidden failure modes, but if one chooses the right algorithm for maintenance analysis, one avoids the RPN system, which to me adds an unnecessary complication to the analysis and can drive the wrong thinking.
    Regards
    Steve
    RM
    Steve,
    You say,
    quote:
    Asset criticality is a second order consideration for evident failure modes

    Risk is what we want to manage. Most Regulators would require us to demonstrate that our Maintenance Program reduces Risk to ALARP levels. I do not think they will be too impressed if our Program focuses on Cost alone, however marketable that concept is to many companies who do realize that it is a slippery road to take.
    Risk evaluation requires us to look at both Consequence and Probability of failures. Risk equates to Criticality, so unless you think risk is unimportant, I find your above comment strange and misleading. The fact that some PMO systems do not consider probability in a systematic way, to save time and money is a lacuna, not a strength. You have argued in other threads that good quality data is hard to find in CMMS, which is a fair observation. But it is certainly possible to find reasonable failure frequency data, given a mindset that does not dispense with probability as too difficult or costly to handle.
    Your argument about P-F curves is spot on. We need good data about P-F intervals too, but I do not know of many companies that plot P-F curves either. The concept is however important and we can get that information from operators and maintainers close to the scene of action.
    The important point about failure rate data is that the outcomes are not as sensitive to their accuracy as some people think. The relationship is logarithmic; errors of 50% or more can be tolerated, i.e., the maintenance interval will not change a great deal due to such errors. In most cases ball-park estimates are good enough. If you can find the data for
    quote:
    .. hard time replacement is determined by safe life
    you can find other (failure) data just as easily.
    Don't dismiss FMECA, it has its uses, just as RCA, IPF or RCM do; every toolbox has a variety of tools to suit each situation. We need more than a hammer, unless every object we see is a nail.
    RM
    Vee - you have cut half the paragraph and then blasted away. Please leave the whole paragraph, because it sets the context.
    quote:
    Also on the subject of Criticality, despite intuitive reasoning supporting its importance, the intervals of condition based maintenance are primarily set by the PF interval (the rate of equipment deterioration) and hard time replacement is determined by safe life. Asset criticality is a second order consideration for evident failure modes

    When we determine a task - for evident failure modes there are three options - CBM, Fixed Time Replacement or No Scheduled Maintenance.
    The interval of condition monitoring is set by the PF curve primarily... end of story. If the condition monitoring task is not robust, then more inspection can improve the odds of finding a failure and the formula for this involves the consequence of failure which is why I say it is a second order consideration. The same applies in principle for Hard Time maintenance.
    Rgds
    Steve
    RM
    quote:
    The important point about failure rate data is that the outcomes are not as sensitive to their accuracy as some people think. The relationship is logarithmic; errors of 50% or more can be tolerated, i.e., the maintenance interval will not change a great deal due to such errors. In most cases ball-park estimates are good enough. If you can find the data for

    Vee - this is intriguing... can you tell me how many data points you recommend to be confident of your answer?
    rgds
    Steve
    RM
    Steve,
    quote:
    The interval of condition monitoring is set by the PF curve primarily... end of story ... I say it is a second order consideration

    I agree entirely with the first part of the above, but you have ignored my comment, "I have not seen many people producing P-F curves". Have you? Secondly, the P-F interval can vary quite a bit; for a single failure mode, there can be considerable uncertainty about the droop. If we suspect the PdM technique, we may use a second method, not more of the same. Thus if bearing vibration readings are not definitive, we might cross-check the oil condition or temperature trends. So I cannot agree with your view that 'more frequent' inspection of doubtful validity will help. Where the degradation rate, and hence the droop of the P-F curve, can vary a lot, more inspections will help, but the frequency should not be based on asset criticality at all. After all, the droop is a physical phenomenon; it has nothing to do with criticality, which is totally unrelated to the physical process of degradation. I disagree that criticality has any place in the inspection frequency decision, whether at second order or any other.
    quote:
    hard time replacement is determined by safe life
    On the one hand you don't want to believe failure data sources; where do you get the safe life from? To those who believe in statistical analysis - and I know you are not in that group, getting safe life is easy, e.g. L10 of ball bearings. Maybe you can explain your process of determining safe or useful life.
    quote:
    how many data points you recommend to be confident of your answer

    I have just explained that errors of 50% in the actual value of the failure rate (or MTBF) will not make a big difference to the maintenance interval. So the answer should be self-evident: 'many' is good, but we can live with a 'few', sometimes just 2 or 3. Now, if you are doing a Weibull analysis, we need a lot more, say 7 or 8, since 2 or 3 may get censored, and about 5 (clean) points are necessary to get 90-95% confidence.
    RM
    Vee,
    First answer.
    We are so far apart on this discussion it is hard to know where to start.
    The PF interval needs to be assessed on an experiential basis. It can be calculated in fracture mechanics and some other applications, but in most cases no one calculates it in reality. The lack of data means we cannot use mathematical models (in the majority of cases); we need to use empirical information. In the practical world we assess orders of magnitude: hours, shifts, days, weeks, etc. On this basis we determine inspection intervals that fall within the PF interval: hourly inspections, inspections each shift, each day, and so on.
    If the inspection is not robust, then it should not be used. Mathematically, though, if the inspection is 80% effective, then doing it twice increases the chance of detection. This is a mathematical fact, but in reality maintainers are far better off improving the detection method; hence there is no value in the formula that increases inspection frequency because of poor inspection methods.
    If you are aware of the formula for probability of detection when an inspection has less than 100% success, you will know that it factors in cost... hence the assessment of the consequences comes into being. Then, and only then, is the consequence a consideration in setting the interval of inspection.
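    The '80% effective, done twice' example can be checked with the complement rule, assuming each inspection detects the developing failure independently and with the same probability. That independence is a strong assumption (repeating a doubtful technique may miss the same defect each time, a point Vee raises later in the thread), and the cost-weighted formula mentioned above is not reproduced here.

```python
def detection_probability(p_single: float, inspections: int) -> float:
    """Probability that at least one of `inspections` checks finds the potential
    failure, assuming each check detects it independently with probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if inspections < 1:
        raise ValueError("need at least one inspection")
    # P(at least one detection) = 1 - P(every inspection misses it)
    return 1.0 - (1.0 - p_single) ** inspections


for n in (1, 2, 3):
    print(n, round(detection_probability(0.8, n), 3))  # 0.8, 0.96, 0.992
```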
    RM
    quote:
    On the one hand you don't want to believe failure data sources; where do you get the safe life from? To those who believe in statistical analysis - and I know you are not in that group, getting safe life is easy, e.g. L10 of ball bearings. Maybe you can explain your process of determining safe or useful life


    Vee
    L10 life only tells you when 10% of the bearings will fail. It does not tell you the failure pattern. It could be random in which case there is no safe life.
    RM
    quote:
    Now, if you are doing a Weibull analysis, we need a lot more, say 7 or 8, since 2 or 3 may get censored, and about 5 (clean) points are necessary to get 90-95% confidence.


    Vee,
    Have you done the maths? You need a lot more than 7 or 8 data points to get a 95% confidence interval, and confidence depends on the correlation of the points more than on how many there are.
    And in reality, 7 or 8 points for analysis are hard to get, given the changes in systems and operating conditions of equipment.
    If you are talking 10 or 15 points you may start getting close to 95% confidence but that all depends on what distribution you select and how well the points fit that distribution.
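    For readers following the Weibull argument, here is a sketch of median-rank regression, one common way to fit a Weibull plot to failure times. It assumes complete data from a single failure mode with no suspensions (censored items need adjusted ranks, which are not shown), uses Bernard's approximation for the median ranks, and reports r² as a simple indicator of how well the points fall on a straight line. It does not settle how many points are enough.

```python
import math


def weibull_mrr(failure_times):
    """Two-parameter Weibull fit by median-rank regression (y on x).

    Assumes complete data: every time is a failure of the same failure mode,
    with no suspensions. Returns (beta, eta, r_squared).
    """
    t = sorted(failure_times)
    n = len(t)
    if n < 3:
        raise ValueError("need at least three failure times")
    # Bernard's approximation of the median rank of the i-th failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    x = [math.log(ti) for ti in t]
    y = [math.log(-math.log(1.0 - f)) for f in ranks]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx                    # slope of the Weibull plot (shape)
    eta = math.exp(mx - my / beta)      # t where the fitted line crosses y = 0 (scale)
    r_squared = sxy ** 2 / (sxx * syy)  # how straight the plotted points are
    return beta, eta, r_squared


# Hypothetical failure times in operating hours.
beta, eta, r2 = weibull_mrr([410, 620, 790, 950, 1120, 1310, 1560])
print(f"beta={beta:.2f}  eta={eta:.0f} h  r^2={r2:.3f}")
```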
    RM
    Steve,
    quote:
    confidence depends on the correlation of the points more so than how many there

    I am glad we can agree on some things. You are spot on, the confidence depends on the correlation of the points, but it is also necessary to have a set of points to get a Weibull plot. My point is the latter, and IF the correlation is good, you will get the confidence you want. I knew your question was phrased incorrectly, but preferred to be restrained.
    quote:
    L10 life only tells you when 10% of the bearings will fail

    In other words, the survival probability is 90%; the bearing example was simply to illustrate the value of statistical analysis as against 'practical guess-work'. If I know the failure is age-related, i.e. patterns A, B or C, then I look at the survival probability to determine the timing of maintenance. This may be by mathematical analysis or, more often, by listening to experienced operators and maintainers on site. This value (survival probability) is directly related to the risk that is to be managed: high risk requires a high survival probability. If the pattern is not age-related (or, as you incorrectly call it, random), then the time of failure is unknown. In this case I agree with you: we will do condition-based maintenance if it is an evident failure.
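    The link between risk and survival probability can be sketched for a wear-out failure mode whose Weibull parameters are assumed to be known (estimating them reliably is exactly what Steve questions). With `survival=0.9` the function returns the L10 (B10) life; a higher target survival gives an earlier replacement age. The parameter values are hypothetical.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Survival probability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def age_for_survival(beta: float, eta: float, survival: float) -> float:
    """Age at which the survival probability falls to `survival`.

    survival=0.9 gives the L10 (B10) life. Only meaningful as a replacement age
    for wear-out (beta > 1); for beta <= 1 the failure rate does not rise with age.
    """
    if not 0.0 < survival < 1.0:
        raise ValueError("survival must be strictly between 0 and 1")
    return eta * (-math.log(survival)) ** (1.0 / beta)


# Hypothetical wear-out mode: beta = 3, eta = 20 000 h.
# A higher target survival (for a higher-risk failure) gives an earlier age.
for target in (0.90, 0.99):
    print(target, round(age_for_survival(3.0, 20_000, target)))
```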
    quote:
    ...when an inspection has less than 100% success....

    There are two reasons for this inspection failure, namely
    a. The inspection technique, instrument or skill is inadequate. That must be corrected if possible. If not, there is no point in doing more of the same, whatever be the consequence. Your cost-focus is thus misplaced.
    b. The variability of the droop is high, i.e. the degradation rate is variable. In this case, more inspections will help, but this has nothing to do with the cost of the failure.
    In both cases, there is no relationship between cause and effect, so your cost-based approach will not achieve anything, except to create unproductive maintenance work.
    RM
    quote:
    In other words, the survival probability is 90%; the bearing example was simply to illustrate the value of statistical analysis as against 'practical guess-work'. If I know the failure is age-related i.e., patterns A, B or C, then I look for the survival probability to determine the timing of maintenance.

    This is my whole point: you don't know the pattern, and you are unlikely to get sufficient reliable and representative data to get a confident estimate of the Weibull parameters, if (and IF with capital letters) the pattern even fits the Weibull plot.
    In my book, about 10 valid points is the minimum for a correlation in maintenance... and these points must share the same shape if you are using Weibull. For example, you can't mix infant mortality failures with wear-out failures in the one plot. A single Weibull can't handle two beta parameters.
    RM
    Indeed, this is going way off-topic. I have read some interesting points of view, but none of them has much to do with the topic this all started with.

    I'm not trying to focus too much on "I must do FME(C)A and there's nothing else that will fit my needs". Still, I will provide a little more information about my situation.
    The proposed FME(C)A will form part of a method that will structure the development of a maintenance concept based on criticality, not only for machines already installed but also for new machines that will arrive on site shortly. The machines are similar in functionality, no matter how old or new they are.
    A fresh look on the development of the maintenance concept is required, because every Maintenance Engineer does what he wants and how he feels about a certain maintenance task.

    I know you want to help me, but if you give advice, please make sure you have enough information to judge my situation and my needs.

    About the original topic:
    I have spoken with some people who are also familiar with FMEA and FMECA and as already was said: the main difference is the presence of a Criticality Analysis.

    Why should I add this if I already have an RPN? Well, the CA is intended to rate a maintenance task by its criticality. In my opinion, the factor 'detection' does not have much to do with the criticality of a maintenance task.
    RM
    For your new machines, you say you have like equipment already in service. In my view, equipment service history is sort of a living FMEA; or FMEA by experience.

    In the beginning, your maintenance strategy for this equipment was probably adjusted to account for real failures, maintenance caused failures and/or lack of maintenance caused failures.

    What is right and wrong with your current maintenance strategy?

    Perhaps since you have several of the same machine models you would like to formally document actual failures as well as perceived possible failures (which hopefully your current maintenance strategy is preventing). That is what the FMEA is for. If you can get the right people in the room and have a good moderator, perhaps it would prove useful.

    We haven’t had great success with FMEAs, but we were trying to develop them after 10 years operating experience with the plant. It turned out that we already adjusted for many real failures and maintenance practices.

    The real payoff in future savings for us is looking for better maintenance indicators or triggers and eliminating needless maintenance (PM Optimization). If equipment runs 24/7, as utilities do, or has a very constant run time, time-based maintenance may work fine, although it is only an estimate of equipment usage over time. For changing run rates, especially those caused by the current economic slowdown, usage-based maintenance seems like the best strategy.

    IMHO
    RM
    Hello Parle,
    I am a reliability engineer at a chemical facility in the UK.
    We are using FMECA on existing plant and on newly designed plant.
    On the existing plant, we carry out a high-level criticality analysis, then a full RCM approach on the top 20% (ish) and a Maintenance Task Review on the remaining 80%. We are using a program called Isograph Availability Workbench to capture all our maintenance decisions on the plant items, to help optimise the strategies based on minimising cost and safety risk.
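    A sketch of the two-tier split described here, assuming each asset already has a single numerical criticality score from the high-level analysis and that the 20% cut-off is applied by count. The asset tags and scores are hypothetical, and this does not model Isograph Availability Workbench or any other tool.

```python
import math


def split_for_analysis(assets, top_fraction=0.2):
    """Sort (tag, criticality_score) pairs, highest score first, and split them
    into (full RCM group, task review group). The top group is rounded up so it
    is never empty when there are assets."""
    ranked = sorted(assets, key=lambda a: a[1], reverse=True)
    cut = math.ceil(len(ranked) * top_fraction)
    return ranked[:cut], ranked[cut:]


# Hypothetical asset tags and criticality scores.
assets = [("P-101", 72), ("C-201", 95), ("V-310", 12), ("E-102", 40), ("K-401", 88),
          ("T-105", 20), ("M-220", 55), ("F-330", 8), ("H-140", 30), ("R-250", 64)]
rcm_group, review_group = split_for_analysis(assets)
print("Full RCM:", [tag for tag, _ in rcm_group])     # ['C-201', 'K-401']
print("Task review:", len(review_group), "assets")    # 8 assets
```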
    On new design plant we carry out FMECA on all items.
    I have used (indirectly) the PMO approach that Steve Turner continues to promote. It is well documented in this forum and others that Mr Turner does not like RCM or statistical methods for improving maintenance. There are many shortfalls in his claims on PMO. I have used FMEA/FMECA approaches for over ten years now with great success. I have also used it on systems that have already gone through PMO, so I am aware of its shortfalls and the criticisms of others who have used PMO.
    I think the comments and advice from Vee were useful information regarding FMEA/FMECA for all readers of the forum.
    Cheers Gary
    RM
    Parle,

    Back to your original question. You may find that MIL-STD-1629 "Procedures for Performing a Failure Mode, Effects and Criticality Analysis" gives the answers you are seeking.
    A copy of this standard is available at http://www.barringer1.com/mil.htm
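    For the quantitative route mentioned at the start of the thread, MIL-STD-1629A defines a failure mode criticality number Cm = β·α·λp·t (β: conditional probability that the failure effect occurs; α: failure mode ratio; λp: part failure rate; t: operating time) and an item criticality number Cr as the sum of Cm over the item's failure modes in one severity category. A sketch with hypothetical values follows; with λp in failures per hour and t in hours, Cm is the expected number of failures in that mode with that effect over t. Here β is unrelated to the Weibull shape parameter discussed above.

```python
from dataclasses import dataclass


@dataclass
class ModeRating:
    name: str
    beta: float    # conditional probability the failure effect occurs (not Weibull beta)
    alpha: float   # failure mode ratio: share of the item's failures in this mode
    severity: str  # severity category, "I" (catastrophic) .. "IV" (minor)


def mode_criticality(m: ModeRating, part_failure_rate: float, operating_time: float) -> float:
    """Cm = beta * alpha * lambda_p * t."""
    return m.beta * m.alpha * part_failure_rate * operating_time


def item_criticality(modes, part_failure_rate, operating_time, severity):
    """Cr for one severity category: sum of Cm over the item's modes in that category."""
    if sum(m.alpha for m in modes) > 1.0 + 1e-9:
        raise ValueError("failure mode ratios should not sum to more than 1")
    return sum(mode_criticality(m, part_failure_rate, operating_time)
               for m in modes if m.severity == severity)


# Hypothetical pump: 20 failures per million hours, one year of operation.
lam, t = 20e-6, 8760.0
modes = [
    ModeRating("fails to start", beta=1.0, alpha=0.6, severity="II"),
    ModeRating("external leak", beta=0.5, alpha=0.4, severity="III"),
]
for m in modes:
    print(m.name, round(mode_criticality(m, lam, t), 4))
print("Cr (category II):", round(item_criticality(modes, lam, t, "II"), 4))
```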
    RM