3 Ways Data Makes Life Predictable for Maintenance Teams
Imagine the frustration of running a newly developed building with chronic problems in a system as critical as its elevators. Tenant complaints grow louder as more and more tenants are trapped in elevators during emergency shutdowns. Worse, the only explanation they have received so far is that management is “working on it.” In reality, management doesn’t know why this keeps happening and is scrambling to find an engineer, vendor, or consultant who can get to the bottom of it.
In the meantime, the operations manager increases the cadence of elevator maintenance to try to head off the next shutdown. Unfortunately, only 20% of equipment failures are age related. The other 80% are random events caused by a defect introduced somewhere in the manufacturing, transportation, assembly, installation, or operation of the equipment.
Just because 80% of equipment failures occur at random does not mean that they happen without a reason. It means that you cannot predict when a failure will occur based only on the equipment’s age or its most recent maintenance. Nor does it mean that these failures are unavoidable: with the right strategies in place, they can be avoided.
The first step is to predict when random failures will occur based on when performance begins to drop. The next step is to put an effective maintenance plan in place to focus on value-add activities when there are competing priorities. The final step is to put a process in place for uncovering the root cause of equipment defects to address the vast majority of equipment failures.
Detecting when equipment is about to fail is getting easier and easier. Not long ago, operators had to use thermographic cameras to measure temperature, stethoscopes to listen for noise, bearing vibration detectors to note changes in vibration, and a range of other manual methods.
Despite being manual, tedious, and time-consuming, these methods were less expensive than deploying sensors on every piece of equipment. Even so, most of the time it made more economic sense to ignore the condition of equipment altogether, perform preventative maintenance on a time-based schedule, and live with the results.
Simply put, times have changed. The cost of Internet of Things (IoT) sensors has dropped by two-thirds in the last 10 years and is expected to keep falling.
This means that building operators can be creative about the types and amount of data they collect from their systems. That data can be captured continuously and affordably, stored in a cloud-based system, and fed into algorithms that become more accurate as the data set grows.
This makes detecting when performance begins to drop easy and affordable. But this is just the first step.
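As a rough illustration of what detecting a drop in performance can mean in practice, the sketch below uses a simple control-limit rule rather than the learning algorithms mentioned above. It treats an initial window of readings as a healthy baseline and flags the first sustained run of readings more than k standard deviations above the baseline mean. The function name, the default window, threshold, and run length, and the sample vibration readings are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def first_drift_index(readings, baseline_size=20, k=3.0, consecutive=3):
    """Hypothetical helper: return the index where a sustained drift begins.

    The first `baseline_size` readings are assumed to reflect healthy
    operation. A drift is a run of `consecutive` readings that all exceed
    mean + k * stdev of that baseline. Returns None if no such run exists.
    """
    if len(readings) <= baseline_size:
        return None
    baseline = readings[:baseline_size]
    limit = mean(baseline) + k * stdev(baseline)

    run_start, run_length = None, 0
    for i in range(baseline_size, len(readings)):
        if readings[i] > limit:
            if run_length == 0:
                run_start = i  # remember where this run of high readings began
            run_length += 1
            if run_length >= consecutive:
                return run_start
        else:
            run_length = 0  # a normal reading breaks the run
    return None


# Illustrative (made-up) vibration readings in mm/s from an imaginary elevator motor.
healthy = [2.0, 2.2] * 10
recent = [2.1, 2.0, 2.2, 3.0, 3.2, 3.4, 3.6]
print(first_drift_index(healthy + recent))  # → 23
```

Requiring several consecutive exceedances keeps a single noisy spike from triggering a work order. A production system would also need to refresh its baseline as equipment ages and to watch many sensors at once.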
If maintenance is not based on the condition of the equipment, only the 20% of failures that are age related will be avoided. In addition, components replaced on a schedule may still have plenty of useful life left, in which case the replacement wastes money, time, and effort.
On the other hand, preventative maintenance is at least predictable, and there are ways to make it easy. If the schedule can be followed without too much deferred or reactive maintenance, operators know what to expect in terms of their responsibilities and workload.
Relying solely on condition-based maintenance is not an effective strategy either. A company that fully commits to condition-based maintenance can end up reacting to every newly identified equipment problem. Performance problems rarely surface at an even pace over a work week, so if every case gets the same immediate response, the resulting maintenance backlog is likely to be as bad as if no monitoring had been implemented.
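A toy model makes the backlog argument concrete. In the sketch below, every condition alert is treated as urgent and handled before planned work, every task is assumed to take one equal unit of crew time, and unfinished work carries over to the next day. The function name and all numbers are hypothetical. When alerts arrive unevenly, planned work falls behind even in a week where total demand exactly equals total capacity, because idle time on a quiet day cannot be saved for a busy one.

```python
def simulate_week(alerts_per_day, planned_per_day, capacity_per_day):
    """Hypothetical toy model of a crew that reacts to every alert first.

    All tasks take one unit of crew time. Unfinished alerts and planned
    tasks carry over to the next day. Returns the (alert, planned) backlog
    left at the end of the period.
    """
    alert_backlog = planned_backlog = 0
    for alerts in alerts_per_day:
        alert_backlog += alerts
        planned_backlog += planned_per_day
        capacity = capacity_per_day

        # Alerts are handled first, regardless of their value.
        done = min(alert_backlog, capacity)
        alert_backlog -= done
        capacity -= done

        # Planned work only gets whatever capacity is left over.
        done = min(planned_backlog, capacity)
        planned_backlog -= done
    return alert_backlog, planned_backlog


# Both weeks contain 10 alerts + 20 planned tasks = 30 units of work,
# against 6 units of capacity per day = 30 units.
print(simulate_week([2, 2, 2, 2, 2], planned_per_day=4, capacity_per_day=6))  # → (0, 0)
print(simulate_week([0, 6, 0, 4, 0], planned_per_day=4, capacity_per_day=6))  # → (0, 2)
```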
The dilemma is that maintenance teams are already busy. Every newly detected issue that gets addressed carries an opportunity cost: time taken away from planned maintenance elsewhere.
The best way to handle this is to couple sensor data with maintenance work order data, which provides a historical benchmark of the true costs of maintenance, repairs, and third-party vendor calls. By trending the cost of maintaining each piece of equipment, teams can prioritize work dynamically. Sometimes this means knowing that a piece of equipment is on the verge of failing and deliberately letting it fail so the crew can perform a higher-value activity elsewhere.
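One way to sketch this kind of dynamic prioritization, assuming each asset has a failure probability estimated from its sensor data and a failure cost benchmarked from its work order history, is to rank candidate jobs by expected net benefit per labor hour and fill the crew’s available hours greedily. The `Candidate` fields, the `plan` function, and every number below are hypothetical, and the greedy ranking is a simple heuristic, not an optimal scheduler. Jobs with a negative expected benefit, or that do not fit in the remaining hours, are deliberately deferred, which corresponds to letting some failing equipment fail.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    asset: str                   # assumed unique per candidate
    failure_probability: float   # hypothetical estimate derived from sensor data
    failure_cost: float          # hypothetical historical cost of a failure (work orders)
    intervention_cost: float     # hypothetical cost of acting now (parts, vendor call)
    labor_hours: float           # crew time the intervention needs

    def net_benefit(self) -> float:
        # Expected failure cost avoided by acting now, minus the cost of acting.
        return self.failure_probability * self.failure_cost - self.intervention_cost


def plan(candidates, available_hours):
    """Greedy heuristic: rank by net benefit per labor hour, then schedule
    jobs while hours remain. Anything not scheduled is deferred."""
    ranked = sorted(
        (c for c in candidates if c.net_benefit() > 0),
        key=lambda c: c.net_benefit() / c.labor_hours,
        reverse=True,
    )
    scheduled, hours_left = [], available_hours
    for c in ranked:
        if c.labor_hours <= hours_left:
            scheduled.append(c.asset)
            hours_left -= c.labor_hours
    deferred = [c.asset for c in candidates if c.asset not in scheduled]
    return scheduled, deferred


candidates = [
    Candidate("elevator-2", 0.30, 12_000, 800, 6),
    Candidate("chiller-1", 0.10, 40_000, 1_500, 10),
    Candidate("exhaust-fan-4", 0.50, 900, 300, 3),
    Candidate("pump-3", 0.05, 5_000, 600, 4),
]
scheduled, deferred = plan(candidates, available_hours=12)
print(scheduled)  # → ['elevator-2', 'exhaust-fan-4']
print(deferred)   # → ['chiller-1', 'pump-3']
```

In this made-up example the chiller job is deferred even though its expected benefit is large, because it does not fit in the hours left after the higher-ratio elevator job. With only 12 hours available, the elevator and fan jobs together return more than the chiller job alone would.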
Condition-based maintenance is already being implemented by the best-run portfolios to streamline manual processes and avert unpredictable equipment failures. However, the strategy is only expected to save 8–12% in costs compared with a well-designed preventative maintenance program. Its main benefit is that it tells operators about a problem in time to deal with it at low cost. It does not stop the problem, or the degradation, from happening.
In addition to identifying the potential failure point, you also need a process that finds and removes the causes of random failures. Random failures are caused by stress induced by specific acts and events. Unless you remove the cause, the failure will happen again and again, which explains why maintenance crews spend so much time ‘fire-fighting.’ Maintenance does not fix problems; it can only rejuvenate equipment.
The process of identifying the cause of failures is called root cause analysis. As mentioned earlier, the underlying defects could have been introduced at any point in the equipment’s life, and they may surface early or only after years of operation.
Today, root cause analysis for complex equipment such as chiller or boiler plants must still be performed by engineers. However, with continuous and historical performance data, algorithms are beginning to identify patterns and help engineers focus their efforts on the most likely causes. As more data is collected and faults are mapped to specific failure patterns, software may eventually be able not only to identify that performance has dropped but also to pinpoint the root cause of the failure.
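The kind of pattern matching described above can be illustrated, very loosely, with a nearest-centroid classifier: each known fault is summarized by the average of its labeled historical signatures, and a new observation is ranked against those averages to give engineers a short list of likely causes to check first. This is a swapped-in textbook technique, not a description of any particular product. The fault labels, the three features, and all the numbers are invented for illustration and are not real diagnostic thresholds. In practice the features would also need to be scaled so that no single one dominates the distance.

```python
import math

# HYPOTHETICAL library of past faults. Each entry holds labeled signatures,
# expressed as made-up percent deviations from normal in three features:
# (condenser approach temperature, compressor current, vibration).
# These values are illustrative only, not real chiller diagnostics.
KNOWN_PATTERNS = {
    "fouled condenser tubes": [(18.0, 6.0, 1.0), (22.0, 8.0, 0.0)],
    "low refrigerant charge": [(-5.0, -12.0, 0.0), (-3.0, -15.0, 2.0)],
    "bearing wear": [(1.0, 3.0, 35.0), (0.0, 5.0, 42.0)],
}


def centroid(points):
    # Average each feature across the labeled examples for one fault.
    return tuple(sum(values) / len(values) for values in zip(*points))


CENTROIDS = {label: centroid(pts) for label, pts in KNOWN_PATTERNS.items()}


def rank_likely_causes(observation, top_n=2):
    """Return the top_n fault labels whose centroid is closest (Euclidean
    distance) to the observed deviation vector, nearest first."""
    distances = sorted(
        (math.dist(observation, c), label) for label, c in CENTROIDS.items()
    )
    return [label for _, label in distances[:top_n]]


print(rank_likely_causes((20.0, 7.0, 2.0)))
# → ['fouled condenser tubes', 'low refrigerant charge']
```

The output is a ranked shortlist for an engineer to investigate, not an automated diagnosis, which matches the supporting role the paragraph above describes for today’s algorithms.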