How Machine Learning Will Tame the Explosion of Unstructured Data

 

I found the section about using machine learning to add structure to be the most impactful. Getting the system to sift through the data and find items of interest, rather than just being a database to house information, would allow businesses to be more efficient in identifying and acting on what is relevant.

Let's set the stage with the most successful system of this type I've set up. The plant was making cast pipe for municipal plants. Downtime was about 10-20% of production compared to industry averages of around 1%. Scrap rates were running around 5-6% compared to industry averages of around 3%. Something had to be done, but mostly the improvement meetings consisted of sitting around and pointing fingers.

QC was already tracking defects into a SQL database. Simply pulling that data into Pareto charts made a major difference in discussing defects. Next we cataloged all the different ways each production machine could be down by sifting through about a month of production and maintenance data by hand. Then we came up with a set of machine-driven criteria based on the sequence of operations a machine steps through. To run through a cycle, the machine had to do A, B, C, in that order. If we saw say A, B, A, C, then we knew somebody was working on it, calibrating it, or doing anything other than processing product. The key was that the sequence had to be something a PLC could detect, and it was preferred to use outputs and existing sensors necessary for production because we knew those would always be maintained, unlike some extra add-on counter.

Each time a cycle completed we logged a time stamp. It was a trivial effort to get the database to produce, say, pieces per hour, but more importantly we looked at the time interval from piece to piece. If it exceeded a threshold, we kicked it out as downtime on a report that everyone in the facility had access to. We had the foreman go in and click on a short list of downtime reasons, and we also had a comment field. This became the production downtime report. Because of the pick list, we could Pareto chart the downtime data just like the QC data. Patterns emerged. We started to concentrate on downtime causes, and we could actually measure them very easily. The finger pointing stopped because we were all working with data (information) instead of just arguing about whose fault it was.

In six months' time we went from 10-20% downtime to stretches where we would go a week with maybe an hour total (<1%). Defect rates dropped to 1-2%, the best in the industry, and were still trending down when I left. Efficiency by any measurement was the highest in the country. Maintenance and production were working from the same data, making decisions by consensus rather than the usual production vs. maintenance arguments. Once the maintenance work orders were on a similar system, we started tracking similar things there too, which helped maintenance tremendously. That lasted until corporate brought in a management inefficiency consultant who took us back to doing spreadsheets and pointing fingers again, and chained the foremen to their desks instead of having them out of the offices supporting the people they were supposed to be supervising, but that's another story.
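To make the piece-to-piece interval idea concrete, here's a minimal SQL sketch. The table and column names (cycle_events, machine_id, cycle_end_time) and the 15-minute threshold are hypothetical, and date/interval syntax varies between databases, but the shape is the point: one row per completed cycle, and any gap longer than the threshold falls out as downtime.

-- Hypothetical schema: one row logged per completed machine cycle.
-- CREATE TABLE cycle_events (machine_id INT, cycle_end_time DATETIME);

-- Pieces per hour falls out of a simple GROUP BY.
SELECT machine_id,
       CAST(cycle_end_time AS DATE)      AS run_date,
       EXTRACT(HOUR FROM cycle_end_time) AS run_hour,
       COUNT(*)                          AS pieces
  FROM cycle_events
 GROUP BY machine_id,
          CAST(cycle_end_time AS DATE),
          EXTRACT(HOUR FROM cycle_end_time);

-- Downtime: for each cycle, find the next cycle on the same machine and flag
-- any gap longer than the threshold (15 minutes here, purely for illustration).
-- A correlated subquery keeps this portable to older databases; newer ones
-- could use a LAG()/LEAD() window function instead.
SELECT c.machine_id,
       c.cycle_end_time AS gap_start,
       (SELECT MIN(n.cycle_end_time)
          FROM cycle_events n
         WHERE n.machine_id = c.machine_id
           AND n.cycle_end_time > c.cycle_end_time) AS gap_end
  FROM cycle_events c
 WHERE (SELECT MIN(n.cycle_end_time)
          FROM cycle_events n
         WHERE n.machine_id = c.machine_id
           AND n.cycle_end_time > c.cycle_end_time)
       > c.cycle_end_time + INTERVAL '15' MINUTE;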

That is the promise of all the software systems out there, and this was done with software from 15-20 years ago. Things I've learned:

1. Log everything into a general purpose database. I'll get into why the various MES, ERP, and historian software programs don't cut it in a minute.

2. Get rid of the darned spreadsheets. These things are like a disease. They show up everywhere. You can't effectively filter, sort, and process the data in them. Trying to "link" anything between them is an utter pain, and the links are always breaking down. They are OK for doing some what-ifs and as a sort of scratch pad tool but NOT as a reporting and database system, period. Not even as a front end for viewing your data, because as soon as you do that, the temptation is going to be to use the spreadsheets for everything again.

3. The data needs to be accessible to everyone. If it's locked up, it's useless.

4. Think about how to record the data you will need, and then do that. Getting the data prepared for logging is done at the lowest level (plant hardware) using EXISTING sensors and outputs. Record both...sensors fail. Continuous logging such as temperatures and flows is good for continuous processes but not discrete ones, where you need discrete records. The difference, by the way: a lollipop machine making thousands per hour is a continuous process, while an automotive assembly plant making dozens per hour is discrete, so you record the data per piece rather than per unit of time. Think again about how to turn the data into useful information with common metrics such as downtime. Automatically record what you can. If someone is hand entering data, keep the number of entries down to no more than about 10 per shift or you will have lots of problems getting it done, and the data might not be very good if someone tries to memorize it and enter it later to save themselves the arduous manual entry. Typically, if production foremen have to record anything, get the task down to under 5 minutes per shift.

5. EVERYONE top to bottom needs to be able to see the data. Once we let the cat out of the bag, the president of the company was going in and looking at our data once a week. The plant manager was checking it several times a day. Everyone in the plant knew when we were having a good day and when it was bad, and they knew why, in real time. You'd think that say department B doesn't need to see department A's data but that's not true. When department B runs out of material or suddenly sees a huge surge, they'll look at upstream department A to see what's going on, and department C down the line is also watching and maybe scheduling an early lunch based on what's going on. Everyone is working together and that's the point of these systems in the end.

6. If you code downtime the way I did, here's a quick lesson. Two categories you need on your pick list are "upstream down (no material)" and "downstream down (full)". We want to Pareto why process step X is having downtime, and technically upstream and downstream issues are NOT downtime for that machine. But if those hit the top of the Pareto chart, what it tells us is to quit looking at this machine and spend our time upstream or downstream instead. Second, the causes need to be specific. As an example, when we started the exercise in one area we didn't have a good list of causes. The initial list was almost literally "upstream, downstream, mechanical problem, electrical problem, production problem". Since production supervisors were coding the data, guess what rarely showed up as a cause? And what you saw in the planning meetings was the electrical department, mechanical department, and production department all pointing fingers and assigning blame for the downtime. As soon as we took this away and put down things like "pump not working", we stopped discussing whose fault it was and started discussing why the pump stopped working, which is after all the point...we moved on past the blame game. (A quick SQL sketch of what this coded structure looks like follows a couple of paragraphs down.)

For the same reason, when maintenance systems first get set up they often lack the "cause/code" type fields, or what goes in there simply isn't useful. It takes a little effort to sift through whatever is already being written down and what's in existing work orders to build a list of legitimate causes/codes, but it makes charting the issues far easier later. And sorry, but there are just too many words, terms, and general noise for any "machine learning" system to ever produce useful data out of free-text fields, even if you have something like a Google-style map-reduce search engine.
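Here's the promised sketch of what that coded structure can look like. All the table and column names (downtime_events, downtime_reasons, reason_code, and so on) are hypothetical, and the duration math is written SQL Server style (DATEDIFF), which will need adjusting for other databases, but it shows why the pick list matters: one GROUP BY and the Pareto falls out.

-- Hypothetical pick list and downtime log:
-- CREATE TABLE downtime_reasons (reason_code VARCHAR(20) PRIMARY KEY,
--                                description VARCHAR(80));
-- CREATE TABLE downtime_events  (machine_id  INT,
--                                start_time  DATETIME,
--                                end_time    DATETIME,
--                                reason_code VARCHAR(20),
--                                comment     VARCHAR(255));

-- Pareto of downtime by reason for one machine: event count and total minutes.
SELECT r.description,
       COUNT(*)                                         AS events,
       SUM(DATEDIFF(MINUTE, e.start_time, e.end_time))  AS downtime_minutes
  FROM downtime_events e
  JOIN downtime_reasons r ON r.reason_code = e.reason_code
 WHERE e.machine_id = 12      -- hypothetical machine number
 GROUP BY r.description
 ORDER BY downtime_minutes DESC;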

Speaking from experience setting these things up, most maintenance (work order) databases from a HISTORY point of view tend to resemble QC databases...you have a huge pile of data. Although you can sometimes get very creative with keyword searches, and search engines like Google are getting better all the time, there's only so much you can do with truly unstructured data. From your point of view, if you can somehow code or structure your data going in, that makes it absolutely trivial to do a search later. The big problem is that maintenance software programs generally stink at doing this. But if the data is actually stored in a nonproprietary database, or the software gives you access to a true SQL interface, you can quickly sift through the data to find what you are looking for, if it's there to be found. SQL is a bit intimidating both for newbies and for veterans who haven't worked with databases, because the language was designed back in the 1970s, so it doesn't "look" like the kind of language you're used to, and it also works backwards: instead of everything being procedural, the best way to use it is to think of it like a sieve...you describe what you want to find and let the database do the work. I've tried the "automatic feature finder" type stuff. Just like automatic vibration analysis software, it just doesn't quite get the job done.
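To show the difference between fishing in free text and sieving coded data, here's a small hypothetical example (the work_orders table, work_description, cause_code, and the 'SEIZED' code are all made up for illustration):

-- Fishing in free text: catches 'pump', 'recirc pump', random mentions of
-- pumps in comments, and misses the work orders where someone only wrote the
-- equipment number or misspelled the word.
SELECT wo_number, date_opened, work_description
  FROM work_orders
 WHERE work_description LIKE '%pump%';

-- The same question against coded data is a sieve, not a fishing trip:
-- which equipment generates the most seized-bearing work orders?
SELECT equipment_id, COUNT(*) AS failures
  FROM work_orders
 WHERE cause_code = 'SEIZED'
 GROUP BY equipment_id
 ORDER BY failures DESC;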

In a similar way, there is now a lot of "MES" or historian type software for the automation systems out there. Again, it's really powerful and can generate tons of useful data IF you set it up right. The software vendors will advertise things like the ability to collect and store thousands of data points per second. They will encourage you to simply "log everything". In fact that's exactly what historian software is, and why it is just about the most useless thing ever invented compared to general purpose databases. Historian software is really, really good at processing thousands of data points in a time series and storing them. It is generally pretty good at retrieving them too, and can automatically do things like interpolate missing data points or average/sum/max/min data over time intervals if you store data at, say, once per second but retrieve it at once-per-hour intervals. It does an absolutely awesome job in a process plant, where they live, sleep, eat, and breathe trend charts. It sucks in a discrete manufacturing plant, and it doesn't even do that well when trying to answer maintenance questions that are anything other than "did the level get too high in the last week?"

Historians have a huge disadvantage. Like the maintenance and QC databases, what you end up with is a huge pile of raw data but a total lack of information. It's actually worse than the QC and maintenance histories if you don't have a general purpose system behind it, because the historian can only handle time series. Ask it anything else and it becomes an utter bear to deal with. The data isn't stored in any way that could ever be useful to you for statistical purposes.

For example, let's say we simply record the outputs from a valve positioner so that we know every time the valve opens or closes. Fair enough. I can almost always pull up a history chart with this software and tell whether the valve was open or closed at a certain date and time. But let's say I want to know how MANY times it opened or closed over, say, a week's time. That might give me an indication of why a particular valve is wearing out so often. Fair enough. The trick is that we now have to extract from the database the time stamps of every time the valve changed states (open, closed), count them, and divide by two. What the common historian database gives us is a huge pile of data listing the valve state every second. So with, say, a SQL query or some other language, we first have to sort through it for only the time stamps where, going from time (t-1) to time (t), the state goes from a 0 to a 1. This is done with something called a correlated subquery in SQL, or can easily be done in various procedural languages (C, VBA, even creative functions in Excel). This function is NOT available in historian software, because the result is not a time series, so you have to process the data elsewhere. Once that's done, it's a simple matter of counting the events, or counting them over time intervals, which again is easily done with software other than historian software. Or, if someone thought about it and wrote a "counter" into the PLC software in the first place, you can trivially get the "stroke counts" at various time intervals, something the historian software can then deliver in under a second. But the raw data stream approach was an utter failure in this case.

Second example, sticking with valves: let's say what we really want to know is when the valve is sticking, and to make it even more complicated, we want to look at how long the valve sits between movements to see if it's simply seizing up due to material buildup, corrosion, or lubrication issues. Fair enough. Again, all we have is a time series of data. So in this case we have to employ some other software to first locate when the output commands go to the valve and then find the time interval before the valve indicates it has reached the new state. Another problem often arises here: in the "log everything" approach, the people who set up the software many times log only the system INPUTS and not the OUTPUTS, or vice versa, figuring for instance that we might want to know temperatures and flow rates but failing to log anything about the actuators, so we don't know what was happening at the time except indirectly. The historian software at this point is totally useless, because none of the ones I've worked with can correlate two different events. Again, it could get a whole lot better if someone thinks it out ahead of time, adds a tiny bit of code to the PLC (a couple of timers), and logs those values along with the valve stroke event.
At that point, instead of storing the valve position once a second, plus the valve actuator output, plus the timers (time to close, time to open), it's much easier to set up an event recorder into a general purpose database that simply records one record of the same data every time the valve changes state. If the valve strokes once a minute, the amount of data to be logged is now 1/60th of the "once per second" approach, and contrary to the advertising, general purpose databases, although not designed for this, can trivially keep up with the load and come with all kinds of already-built tools intended for sifting through things like accounting and sales records. And it's available today, not some time in the future. And everyone already has an army of IT people who know how to maintain these types of databases and pull the data out of them. There are free online courses (w3schools) for anyone who takes the time to learn it.
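To make the valve example concrete, here's a minimal sketch of both approaches. All the table and column names (valve_log, valve_state, valve_events, ms_to_travel) are hypothetical, and it assumes one valve per table with unique sample times; the point is the contrast between digging state changes out of a once-per-second dump and simply counting rows in an event table.

-- Approach 1: raw once-per-second dump pulled into a general purpose database
-- as valve_log(sample_time DATETIME, valve_state INT). The correlated subquery
-- keeps only the samples whose PREVIOUS sample was 0 and whose current state
-- is 1, i.e. an opening stroke.
SELECT COUNT(*) AS open_strokes
  FROM valve_log v
 WHERE v.valve_state = 1
   AND 0 = (SELECT w.valve_state
              FROM valve_log w
             WHERE w.sample_time = (SELECT MAX(x.sample_time)
                                      FROM valve_log x
                                     WHERE x.sample_time < v.sample_time));

-- Approach 2: the PLC logs one row per state change into
-- valve_events(event_time DATETIME, new_state INT, ms_to_travel INT), where
-- ms_to_travel comes from a timer between the output command and the position
-- feedback. Now the stroke count and the "is it getting slow / sticking"
-- question are both one-line queries.
SELECT COUNT(*)          AS open_strokes,
       AVG(ms_to_travel) AS avg_travel_ms,
       MAX(ms_to_travel) AS worst_travel_ms
  FROM valve_events
 WHERE new_state = 1;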

The counterargument, of course, is that if you didn't log XYZ then that data is gone forever, and that's true of ANY system. I've set these systems up for years and there is a good way and a bad way to do it.

 
