Hi everyone

I am a reliability engineer with a question about PdM and PM findings. Our company uses SAP and classifies maintenance requests as M1 and M2. M2 is the request to repair an asset when an unreliability event occurs, such as a trip, a failure, or the asset not performing its intended function, and all M2 requests must be investigated. My question: if a problem that needs to be fixed is found during a PM, is that considered an unreliability event? In my mind, any corrective action resulting from PM or PdM is proactive work; it should not be considered an unreliability event and should not be counted in the MTBF calculation.

Please advise.

Reliability Engineer

Big question with lots of opinions I'm sure.  Here is mine.

1. MTBF is designed for non-repairable assets to determine their expected life. It applies to a population of assets, so you can mathematically derive when best to intervene to ensure a specific overall reliability across the population. The calculation can be used at the component level, but only for a single failure mode of a single component type. For example, you can use MTBF for pump seals. This will give you some expectation of life given your current installation practices, the medium being pumped, and the duty cycle. Everything in the calculation should be the same (same component, same operating context, etc.).

2. MTTR is designed for repairable assets. This is the calculation you are looking for if the asset can be restored to base condition. When using this calculation, the issue in your example should be included. You want to know how often you must visit the asset to intervene outside of standard PM inspection. All corrective interventions should be included, regardless of whether they are found prior to catastrophic failure.

3. Yes, this is a result of doing proactive inspections. However, the functional failure has occurred, or you would not need to do the work. In your work categorization in SAP PM, you should have a task type that identifies the work as found during a PM or condition monitoring work order. This will show you the effectiveness of your PM / condition monitoring program. If you never find issues, your PMs are too frequent. If you always find issues, your PMs are too far apart. The rule of thumb I have read is a ratio of 4 or 5 to 1, meaning you should find one follow-up corrective task for every 4 or 5 PMs executed.
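To make the rule of thumb in point 3 concrete, here is a minimal sketch of how the ratio could be checked from work order counts. The function names and the 4-to-5 band limits as parameters are my own illustration, not anything SAP provides; the counts would come from whatever export you have of PM orders and their follow-up corrective orders.

```python
def pm_to_finding_ratio(pms_executed, follow_up_correctives):
    """PMs executed per follow-up corrective task (hypothetical helper)."""
    if follow_up_correctives == 0:
        return None  # no findings at all in the period
    return pms_executed / follow_up_correctives


def assess_pm_interval(ratio, low=4.0, high=5.0):
    """Compare the ratio with the 4-to-5 rule-of-thumb band from the post."""
    if ratio is None or ratio > high:
        # Findings are rare or absent: PMs may be done too often.
        return "PMs may be too frequent"
    if ratio < low:
        # Findings on most PMs: PMs may be too far apart.
        return "PMs may be too far apart"
    return "within rule-of-thumb band"


ratio = pm_to_finding_ratio(pms_executed=50, follow_up_correctives=10)
print(ratio, assess_pm_interval(ratio))
```

The band is only a rule of thumb, so treat the result as a prompt to look closer at a PM, not as a verdict.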

4. As for your requirement for investigation, I would say it is your criteria that need to be looked at. If the event is recurring, perhaps it deserves to be investigated, but if it is a wear component you expect to fail over time, perhaps not, unless it is not reaching its life expectancy. A blanket statement that all events must be investigated seems like overkill. Maybe look at the trigger criteria and adjust them if needed.

Again, opinions vary, as does the context of your asset base, the risk you can take, etc. This is just my input.


Thank you very much, George, for your detailed answer. I have the same thought, and I have had big arguments with the corporate maintenance department that not all reported M2 events can be investigated.

Let me explain the real scenario. Operations issues every problem (big or small) as an unreliability event (M2) because that makes it easy for them to order material, but SAP automatically treats each M2 as a failure and includes it in the MTBF calculation. At the same time, a request is generated for reliability to investigate the event through a 5-whys analysis in the FRACAS system. This has ended up at around 500-600 notifications in 6 months, which is impossible to investigate, and many of them are not even failures, such as "small oil leak" or "fix pump packing". On top of that, corporate maintenance has set a KPI target that 85% must be investigated to be in compliance.

There are a few categories for these notifications:

1A = found during PM

1B = found during PdM

1C = unplanned failure, drip, or defect

My point is to investigate only "1C"; the other categories should not be counted as unreliability events. I wrote the procedure as follows:

If anything happens, issue a minor maintenance ticket first, just to figure out whether it is a real failure or a small issue that can be fixed easily within the limited budget of that ticket. If, based on judgement, it needs investigation, then issue an M2 with category 1C so that it can be investigated.

But if something is found during PM or PdM, it should not be considered a failure, since it is part of the PM program, in line with the 4 or 5 to 1 ratio you mentioned. It should not be an M2.

I have never seen a company investigate 500-600 failures in 6 months.

What are your thoughts on this?
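The proposal above (only 1C is investigated and counted in MTBF) could be sketched roughly as follows. The function names are hypothetical, the category codes are the ones listed in this post, and MTBF is taken in its simplest form as operating hours divided by the number of counted failures; how SAP actually computes it is not shown here.

```python
# Category codes from the post:
# 1A = found during PM, 1B = found during PdM, 1C = unplanned failure or defect.

def is_unreliability_event(category):
    """Under the proposed procedure, only 1C counts as an unreliability event."""
    return category == "1C"


def mtbf_hours(operating_hours, categories):
    """Operating hours divided by the number of 1C notifications."""
    failures = sum(1 for c in categories if is_unreliability_event(c))
    if failures == 0:
        return None  # no counted failures in the period
    return operating_hours / failures


# Illustrative data only: five notifications on one asset over 4000 running hours.
notifications = ["1A", "1C", "1B", "1C", "1A"]
print(mtbf_hours(4000, notifications))  # 2000.0 (two 1C events)
```

If all five notifications were counted as failures, the same data would give 800 hours, which is the gap being argued about.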

I could not imagine trying to investigate that volume of issues, even using a simple 5 whys. There should be very detailed criteria to trigger an RCA. These criteria may change over time as you reduce the number of larger, more significant events. I would look at the 500-600 you have collected and see what commonalities exist that can bring that down to a more manageable number, like 50. Then develop criteria that would have selected only those 50. Present this to management as a way to both prioritize the investigations and bring them to a manageable level, until you can loosen the criteria and allow slightly smaller incidents onto the list. Over time, if the RCFA process is robust, you should see far fewer than 50 on the list. At that point, open the criteria a bit. Keep the list manageable and continue on.
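One way to look for those commonalities is a simple count of repeats per asset and problem. This is only a sketch: the field names "asset" and "problem", the asset tags, and the thresholds are made up for illustration, and your real trigger criteria would probably also weigh cost, safety, and downtime.

```python
from collections import Counter


def shortlist(notifications, min_repeats=3, max_items=50):
    """Group notifications by (asset, problem) and keep the most repeated groups."""
    counts = Counter((n["asset"], n["problem"]) for n in notifications)
    # most_common() sorts by count, highest first.
    repeated = [(key, c) for key, c in counts.most_common() if c >= min_repeats]
    return repeated[:max_items]


# Illustrative data only.
notes = (
    [{"asset": "P-101", "problem": "seal leak"}] * 4
    + [{"asset": "P-102", "problem": "packing"}] * 1
    + [{"asset": "C-201", "problem": "trip"}] * 3
)
for (asset, problem), count in shortlist(notes):
    print(asset, problem, count)
```

Raising or lowering min_repeats is the "tighten, then loosen the criteria" step: start strict enough to get a workable list, then relax it as the list shrinks.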
