All Member Open Forum

Risk Management for Maintenance Tasks

  • 1.  Risk Management for Maintenance Tasks

    Posted 06-02-2023 12:20 PM

    I have a couple of questions for the group, but will provide some context first.

    In the plant that I am working in, we somewhat regularly suffer from rework/infant mortality shortly after starting up from planned maintenance outages. These stem from defects introduced during intrusive corrective maintenance work. Due to the amount of time we typically get for an outage and our staffing level, corrective maintenance is largely done by 3rd party contractors while our technicians focus on PM work. A "commissioning" form was developed in the past for our technicians to use before equipment is returned to service, but it is generic and not consistently implemented. My questions are:

    • Has anyone had success with a process or tool to identify/manage, before the work is done, the risks to equipment of performing a particular maintenance task?
    • Is anyone using equipment criticality or other criteria to control who (3rd party, internal, OEM, etc) is/is not authorized to work on certain equipment?

    The most recent incident bit us hard, to the tune of 23.5 hrs of rework, but it is part of a pattern of less serious incidents. Any insight is appreciated.



    ------------------------------
    Haydn Scott
    Maintenance & Reliability Engineer
    CertainTeed
    Peachtree City, GA
    ------------------------------


  • 2.  RE: Risk Management for Maintenance Tasks

    Posted 06-05-2023 08:54 AM
    Haydn - 
     
    Two points:
     
    1.  Most equipment fails randomly, after the initial infant period.  So, you should only work on equipment that has been demonstrated to have a defect that creates an impending failure.  Meaning that you should have  a decent condition monitoring program - operators, inspections, and traditional PdM.  
     
    2.  The greatest risk of failure and defect induction is the infant mortality period, something you're clearly demonstrating.  So, you need standards for the work, and a commissioning process that validates the quality of the work.  This appears to be lacking in your contractor. 
     
    We've known this since WWII.  Google the Waddington Effect regarding his findings on B-24 bombers - they did major maintenance every 50 flight hours.  Following this they would have some 50 repairs in the next 10 hours, decreasing by 10 every 10 flight hours until it dropped to 10.  Then they'd start over - repeating the same pattern.  They were disturbing a relatively stable system.  Though limited by the technology of the time, they moved to a more condition-based approach.  You're not limited by the technology.
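
    The arithmetic of that pattern can be sketched in a few lines of Python. The figures are taken at face value from the description above (they are not Waddington's original data), and the constant and function names are made up for illustration. The point is simply how front-loaded the repairs are after each intervention.

```python
# Illustrative only: figures are taken at face value from the post above,
# not from Waddington's original data.
SERVICE_INTERVAL_HOURS = 50   # major maintenance every 50 flight hours
BLOCK_HOURS = 10              # repairs are tallied per 10 flight hours
FIRST_BLOCK_REPAIRS = 50      # repairs in the first 10 hours after maintenance
DROP_PER_BLOCK = 10           # decrease for each following 10-hour block


def repairs_per_block():
    """Return the repair count for each 10-hour block in one maintenance cycle."""
    blocks = SERVICE_INTERVAL_HOURS // BLOCK_HOURS
    return [FIRST_BLOCK_REPAIRS - DROP_PER_BLOCK * i for i in range(blocks)]


cycle = repairs_per_block()
total = sum(cycle)
first_block_share = cycle[0] / total

print(cycle)   # [50, 40, 30, 20, 10]
print(total)   # 150
print(f"{first_block_share:.0%} of repairs fall in the first 10 hours")
```

    Under these assumed numbers, a third of all repairs in a cycle land in the first fifth of the interval, right after the system was disturbed.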
     
    Happy to discuss this further if you like. 
     
    Ron Moore 





  • 3.  RE: Risk Management for Maintenance Tasks

    Posted 06-06-2023 11:31 PM

    Thanks Ron for responding.

    On your first point, I may have been unclear in describing the situation - it is not intrusive PM work (I am steadily reducing that) that is introducing defects, but the planned corrective work that is generated from prior inspections, PdM, etc. Work is being done (either repair or replace) to correct an identified defect, but we are ending up in some cases with a "worse new" condition.

    On the second point I completely agree - we have struggled with consistently implementing the commissioning process we currently have. However, we also struggle with having the internal resources to audit every contractor job for quality before returning equipment to service, which is why I was considering a form of risk management. I'm definitely open to suggestions on a better way to plan/execute commissioning.
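
    To make the idea concrete, here is a minimal Python sketch of a risk screen that picks which contractor jobs get an internal quality audit when there is not enough staff to audit all of them. Everything in it is an assumption for illustration: the 1-5 scales, the criticality x intrusiveness score, the `Job` fields, and the example jobs. A real scheme would need to be calibrated with the maintenance team.

```python
# Hypothetical scoring scheme: the scales and example jobs are assumptions.
from dataclasses import dataclass


@dataclass
class Job:
    description: str
    criticality: int     # 1 (low) to 5 (high): consequence if the equipment fails
    intrusiveness: int   # 1 (external adjustment) to 5 (full teardown)


def risk_score(job):
    """Simple product score; higher means more worth auditing."""
    return job.criticality * job.intrusiveness


def jobs_to_audit(jobs, audit_capacity):
    """Return the highest-scoring jobs, limited to the number of audits staff can cover."""
    ranked = sorted(jobs, key=risk_score, reverse=True)
    return ranked[:audit_capacity]


outage_jobs = [
    Job("Replace gearbox bearings", criticality=5, intrusiveness=5),
    Job("Re-pack valve", criticality=2, intrusiveness=3),
    Job("Realign pump coupling", criticality=4, intrusiveness=3),
    Job("Replace guard", criticality=1, intrusiveness=1),
]

for job in jobs_to_audit(outage_jobs, audit_capacity=2):
    print(risk_score(job), job.description)
```

    With a capacity of two audits, this selects the gearbox and coupling jobs and leaves the low-score jobs to the generic form.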



    ------------------------------
    Haydn Scott
    Maintenance & Reliability Engineer
    CertainTeed
    Peachtree City, GA
    ------------------------------



  • 4.  RE: Risk Management for Maintenance Tasks

    Posted 06-05-2023 04:31 PM
    Edited by Michael Smith 06-05-2023 04:31 PM

    Haydn,

    We have seen numerous occurrences of machine degradation after maintenance efforts have been performed (typically misalignment).  Having historical machine data could be valuable in order to formulate baselines and 'normal' operating data while also keeping track of what equipment is going awry.  I would be happy to have a conversation with you on the topic just to share experience. 
    Feel free to reach out. 



    ------------------------------
    Michael Smith

    michael.smith@sensirion.com
    Key Account Manager
    Sensirion Connected Solutions
    Chicago IL
    ------------------------------



  • 5.  RE: Risk Management for Maintenance Tasks

    Posted 06-06-2023 11:21 AM

    Haydn,

    I have worked with organizations that have restricted work on certain assets to qualified personnel only - whether staff or contract.  If your asset were an F1 race car, you wouldn't take it to a local mechanic shop for corrective repairs, no matter how good their reputation.  Sounds like this was a costly event.  Clearly your asset, in this story, has high potential or actual value.

    Step 1: asset management plan

    This scenario puts light on an organizational issue that should be addressed by a simple strategic asset management plan.  It doesn't require expensive consultants with endless stacks of post-it notes.  A simple document and consensus is all that's needed.  Also, no software needed other than what's needed to produce the document.

    Step 2:

    Compare your existing process to a generally accepted failure elimination work process to see if you're documenting this reliability event and asking yourself the right questions about the loss itself, the event data, RCA, failure mode review, strategy review, etc.  Again, no software tools needed other than Visio or equivalent.

    Step 3:

    Tool - there are some excellent solutions and libraries available to help execute the strategy (step 1), enable the work process (step 2), and operationalize the strategy that is meant to keep these types of events from happening.

    Hope this is helpful.  Please let me know what you think.

    For everyone's interest, ISO Technical Committee 251 is working on a proposed guidance standard called, "ISO 55012 - People involvement and competence" (which might address Haydn's question at a higher level).



    ------------------------------
    Marc Laplante
    Asset Management Principal
    Roanoke VA
    ------------------------------



  • 6.  RE: Risk Management for Maintenance Tasks

    Posted 06-07-2023 11:31 PM

    Thanks for responding Marc. Your initial analogy is correct, and the steps you've outlined are helpful. I've spent some time thinking on how to low-key pitch a SAMP where there is not organizational support for that (at least not as such). I agree having that would drive other activities, so I'd be interested to hear any thoughts you have on that.

    Regarding the third step you outlined, are there any specific tools that you have seen successfully deployed?

    I'd be interested to see the ISO guidance document when it is published - my initial thoughts on my approach were based on ISO 55001 8.3 and what I'd done in the past for ISO 9001 regarding outsourced activities.



    ------------------------------
    Haydn Scott
    Maintenance & Reliability Engineer
    CertainTeed
    Newnan GA
    ------------------------------



  • 7.  RE: Risk Management for Maintenance Tasks

    Posted 06-20-2023 05:41 PM

    Hi Haydn, sorry for leaving this unanswered for so long.  I apologize.

    It might be useful to have a 30 min discussion over a Teams meeting.  I could run a few things by you to see what makes sense.  I could help with some ideas on how to, as you say, "low key pitch".  The "seven questions" approach could be helpful for just that.

    As far as libraries are concerned, there are a few out there.  In the past I've worked with the APT library (which has been acquired - need to verify who acquired them).  We have one that I could show you so you can decide if it has the right asset types for what you're trying to accomplish.

    55001, 8.3 - spot on.  The "shall" statements in this clause are meant to point out protections that are needed to reduce the risk of the failures you originally described.

    Please let me know what you think.  Let me know if you have some time over the next couple of weeks for a 30 min Teams meeting.



    ------------------------------
    Marc Laplante
    Asset Management Principal
    Committee Member ISO TC251
    Itus Digital
    Charleston, South Carolina

    mlaplante@itusdigital.com
    www.itusdigital.com
    ------------------------------



  • 8.  RE: Risk Management for Maintenance Tasks

    Posted 06-07-2023 09:23 AM

    Hi Haydn,

    Regular rework after planned corrective maintenance suggests that several issues are combining to produce this situation, which extends equipment/plant unavailability and adds cost. You also mention that the corrective actions were identified beforehand, which means that improper inspection, planning, or maintenance workforce (or all of them) can jeopardize the outcome of the corrective tasks.


    In short, rework risk management should include correctly identifying the defect/wear, spares, tools, tasks, duration, timing, workforce skills, and workforce quantity, while paying attention to the manufacturer's manual and recommendations.

    Equipment criticality should trigger a review of the skills of people authorized to work on specific equipment. Infant failures occur in brand-new equipment in the early years of operation due to hidden defects in parts, improper installation, incorrect operation, etc.

    Definitely, job identification, planning and execution should be reviewed, including the skills/training of those authorized to work on critical equipment.

    Let me know if I can help further.

    Regards



    ------------------------------
    Marcel Cocosatu
    ADNOC SOUR GAS
    Abu Dhabi
    ------------------------------



  • 9.  RE: Risk Management for Maintenance Tasks

    Posted 06-09-2023 12:50 AM
    Hi Haydn,
    Some great points have already been provided, a couple of additional thoughts:
    - in my own experience generic commissioning forms are typically not very useful. They often give the impression that things are being done rigorously, but if the details are lacking in the form it really doesn't add much. I would strongly recommend getting specific return-to-service / commissioning forms in place (start with the most risky / troublesome equipment)
    - you mention that this work is done during an outage and that the majority of corrective work is done by 3rd party contractors, while your own technicians focus on PM work. I think there is an opportunity here. You typically can't avoid having to bring in a lot of additional external labour for outages, and they are therefore often of unknown 'quality' (unless you can ensure specific people return for each campaign, which is worth pursuing if possible in your environment). However, I would look carefully at a different model where your own technicians take on the role of supervisor for the outage and supervise small crews of external contractors who execute the work. In my experience this helps with improving safety, quality and is a good development opportunity for your technicians.
    - rework as you describe is typically the result of the wrong task being done or the task being done wrong. In the case of these correctives it is likely that it is not the wrong task, but that the work is not being done right. That could be because the external technicians don't have the right competency, or because the work instructions that are being provided are not detailed / clear enough and rely on 'tribal' knowledge that your own technicians may have gained over years but the external contractors would not have.
    - I have in the past worked with an approach where only certain technicians were allowed to work on certain systems because they had the right experience and the right skills. In most cases this was an informal system, and in one case we had this documented through a formal competence management system (but this was only done for our own staff and long-term contractors, not ad hoc contractors). Some CMMS platforms / add-ons allow you to record required competencies for the crew and will stop you from scheduling work to them if they do not have the right competencies. It can get very admin-heavy though.
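
    The competency gate in the last point can be sketched as follows. This is a generic Python illustration, not the API of any particular CMMS; the `Technician` and `WorkOrder` classes, the competency labels, and the example names are all hypothetical.

```python
# Generic sketch of a competency gate; not modeled on any specific CMMS product.
from dataclasses import dataclass, field


@dataclass
class Technician:
    name: str
    employer: str                                   # e.g. "internal", "contractor", "OEM"
    competencies: set = field(default_factory=set)  # hypothetical competency labels


@dataclass
class WorkOrder:
    asset: str
    required_competencies: set = field(default_factory=set)


def missing_competencies(tech, order):
    """Return the required competencies the technician does not hold."""
    return order.required_competencies - tech.competencies


def can_schedule(tech, order):
    """Allow scheduling only when nothing required is missing."""
    return not missing_competencies(tech, order)


order = WorkOrder("Pump P-101", {"laser_alignment", "mechanical_seal_install"})
internal = Technician("A. Internal", "internal",
                      {"laser_alignment", "mechanical_seal_install"})
contractor = Technician("B. Contractor", "contractor", {"mechanical_seal_install"})

for tech in (internal, contractor):
    if can_schedule(tech, order):
        print(f"{tech.name}: OK to schedule on {order.asset}")
    else:
        gaps = ", ".join(sorted(missing_competencies(tech, order)))
        print(f"{tech.name}: blocked, missing {gaps}")
```

    The admin burden mentioned above shows up in keeping the competency sets current, which is why it may be worth limiting this to the most critical equipment.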


    ------------------------------
    Erik Hupje
    http://www.roadtoreliability.com
    https://www.linkedin.com/in/erikhupje/
    ------------------------------



  • 10.  RE: Risk Management for Maintenance Tasks

    Posted 06-23-2023 08:02 AM

    Erik, you bring up some great points on how I have seen this managed in the past. 

    I have seen success in using experienced, equipment-knowledgeable internal technicians to assist in overseeing contract work. This is of course dependent on what is possible with your labor rules and workforce capabilities. This may not be practical on all jobs but for critical equipment this assignment can be very useful.

    Consider requiring training certification for contractors as proof of competency. This may raise the cost of labor but will most likely pay for itself through reduced downtime. I have worked with a company that partnered with the contract labor force to sponsor training so that required precision maintenance was completed correctly. This may not be practical for everyone but was utilized effectively.



    ------------------------------
    Jesse Day
    Sr. Reliability Engineer
    Wacker Polysilicon
    Charleston TN
    ------------------------------



  • 11.  RE: Risk Management for Maintenance Tasks

    Posted 06-23-2023 09:29 AM

    Our current process is as follows:

    • Company technicians are the task leads for any job being undertaken by the maintenance department. These techs ultimately supervise/oversee the execution by 3rd parties, contractors, etc. This is for both corrective and preventive maintenance work.
    • Company technicians are responsible for conducting a job safety review analysis (J.S.R.A.) and preparing a work procedure. The work procedure indicates special notes regarding flange gap dimensions, torque values for flanges, etc., while the JSRA covers the risks associated with the different steps to execute the task and the mitigations required.
    • Pre-Startup Safety Reviews (P.S.S.R.s) with a multidisciplinary team are used to review the work completed and any documentation (welding test results, etc.) submitted in support of the work completed prior to start-up.
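
    The three elements above can be read as a gate that a job has to clear before start-up. A minimal Python sketch of that gate follows; the `JobPackage` fields, the wording of the open items, and the example values are assumptions for illustration and do not describe any actual company form.

```python
# Hypothetical return-to-service gate based on the three elements listed above.
from dataclasses import dataclass, field


@dataclass
class JobPackage:
    task_lead: str = ""                    # company technician leading the job
    jsra_complete: bool = False            # job safety review analysis done
    work_procedure_complete: bool = False  # torque values, flange gaps, etc. recorded
    required_documents: list = field(default_factory=list)   # e.g. weld test results
    submitted_documents: list = field(default_factory=list)
    pssr_signed_off: bool = False          # pre-startup safety review held and signed


def open_items(pkg):
    """List everything still blocking start-up for this job."""
    items = []
    if not pkg.task_lead:
        items.append("no company task lead assigned")
    if not pkg.jsra_complete:
        items.append("JSRA not complete")
    if not pkg.work_procedure_complete:
        items.append("work procedure not complete")
    for doc in pkg.required_documents:
        if doc not in pkg.submitted_documents:
            items.append(f"missing document: {doc}")
    if not pkg.pssr_signed_off:
        items.append("PSSR not signed off")
    return items


def ready_for_startup(pkg):
    return not open_items(pkg)


pkg = JobPackage(
    task_lead="Technician A",
    jsra_complete=True,
    work_procedure_complete=True,
    required_documents=["weld test results"],
)
print(ready_for_startup(pkg))
print(open_items(pkg))
```

    In this example the job is held because the weld test results have not been submitted and the PSSR has not been signed off.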


    ------------------------------
    KAMARIA DUNCAN
    Mechanical Technician III
    P.P.G.P.L
    Point Fortin
    ------------------------------



  • 12.  RE: Risk Management for Maintenance Tasks

    Posted 06-25-2023 11:13 PM

    Thanks to everyone who's responded - I greatly appreciate it. Grateful for the opportunity to get insights on this from you all.

    Last week I started facilitating working sessions with the maintenance team to develop a solution. Will definitely be utilizing a lot of this feedback as we work through different options over the next several weeks.



    ------------------------------
    Haydn Scott
    Maintenance & Reliability Engineer
    CertainTeed
    Newnan GA
    ------------------------------