I was recently onsite for a data discovery exercise at a unit that has one of the largest single-location manufacturing capacities in its sector. Of the many data sets we looked at, one of the more interesting cases was the vibration of a fan that is critical to the entire process; lowering its operational RPM could result in significant production loss.
We took past data sets and wanted to understand how retrospective assessment is done by the team. As expected, a lot of time went into fact-finding, and the process depended on a lot of people. Besides taking time, no one could point out exactly when the issue started building up, or when would have been the right time to respond to it.
What did our algorithms (a series of logics, no ML really) find out?
1. A total of 1953 peaks occurred where the rise in vibration % was such that, if it continued, it would have meant an X% increase over 24 hours.
2. In 1606 of those cases the vibration increased by 50% of X% across consecutive points; we have called these Alerts. (In the current scheme of things, an alarm only goes off when things are already out of control.)
3. 823 out of the 1606 cases had consecutive alerts; in quite a few of them, 4-5 alerts came in succession. (Remember, these alerts are not a simple a > b "raise alarm" rule; they track the tendency and the past pattern.)
4. There were 7 occasions (exact date and time pointed out) when the plant had an unplanned shutdown (over 8 hours) during which the problem could have been addressed. (The next time that happens, the maintenance team will already have a ticket to address the issue.)
5. The algorithm automatically pointed out how maintenance activity in one of the cases normalized the increase in vibration %, while in the others it did not, or perhaps no action was taken. (So if a ticket is marked resolved but the problem technically persists, the algorithm points this out close to real time.)
6. Because the last two tickets went unattended, the unit lost 7% of production over a stretch of 5 weeks and had to wait for another unplanned shutdown to address the issue!
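To make the logic in points 1-3 concrete, here is a minimal sketch of how such peak and alert detection could be coded. Everything here is an assumption for illustration: the function names, the exact thresholds (the real X% is plant-specific), and the idea that an Alert fires when the per-point rise exceeds 50% of the peak threshold are all hypothetical, not the actual implementation used onsite.

```python
# Hypothetical sketch of the peak/alert logic described above.
# peak_rate  : the per-point rise that, if sustained, would mean an X%
#              increase over 24 hours (X% itself is plant-specific).
# alert_rate : assumed here to be 50% of peak_rate, per point 2.

def classify_vibration(readings, peak_rate, alert_rate):
    """Return indices where the point-to-point rise qualifies as a
    'peak' (point 1) or an 'alert' (point 2)."""
    peaks, alerts = [], []
    for i in range(1, len(readings)):
        rise = readings[i] - readings[i - 1]
        if rise >= peak_rate:
            peaks.append(i)
        if rise >= alert_rate:
            alerts.append(i)
    return peaks, alerts

def consecutive_runs(alerts):
    """Group alert indices into runs of successive points (point 3):
    several alerts in a row indicate a trend, not a one-off spike."""
    runs, run = [], []
    for idx in alerts:
        if run and idx == run[-1] + 1:
            run.append(idx)
        else:
            if len(run) > 1:
                runs.append(run)
            run = [idx]
    if len(run) > 1:
        runs.append(run)
    return runs
```

The point of separating `consecutive_runs` is exactly what the list stresses: a single threshold crossing is noise, while several successive alerts reveal a tendency building up.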
Points 1 to 6 happened even though people were looking at the screens 24x7! It is time to have people taking action, not just watching screens. Real-time monitoring is a thing of the past; to move to the future, the team needs adequate tools for retrospective assessment, and can eventually go on to work on systems/tools that predict an anomaly building up well in advance!
Well, that’s a real case study! Liked it? Would love to hear your views/thoughts!