For years before I came to Drishti, I worked in finance and trading, a field where every decision is driven by numbers and data. In an industry built on data, data quality is critical, and early in my career I learned that in some data sources, errors and outliers are common.
When looking at a price series, for example, it is worth asking whether a 1% price move in a minute is real or a data entry error, or whether a trade of 100,000 shares in a stock whose typical trade size is 1,000 shares is the result of a fat-finger error. It is critical to analyze those outliers to determine not only whether they are real, but also whether they are worth keeping in the data. One may throw away a single oversized trade, but to discard the flash crash of 2010 would be a critical mistake.
Where you have data, you have outliers, and they can tell an important story. In manufacturing, outliers have traditionally been ignored. If your cycle time across a 30-day period falls within a range of 41-43 seconds, you’ll probably dismiss that one Wednesday when it was 58 seconds or the Friday morning when it was 37 seconds. Why? Because it isn’t indicative of the typical performance on your line.
Perhaps the operator was distracted, or highly motivated to get out of there early on a Friday, thus skewing your numbers from the norm. You may consider that odd cycle time to be an outlier – but the last time I checked, Fridays come every week.
So if you are dismissing most Fridays, you may be dismissing what those Fridays could teach you about increasing productivity and decreasing cycle times. This becomes even more likely when the norm is based on only a handful of samples. Incorporating that data into your measurements and your planning gives you a picture that better reflects your line, even if at first glance it appears to be an anomaly.
Outliers beyond the normal distribution
A lot of numerical data from real-world sources follows the normal, or Gaussian, distribution, also known as the bell curve. You may have some outlying data points, but you can reasonably anticipate that your outcomes will lie within a range based on the standard deviation (typically 3σ). The common default assumption when faced with a new data set is that it is normally distributed.
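To make the 3σ rule concrete, here is a minimal sketch in Python. The cycle times are made-up numbers for illustration: a baseline of "normal" cycles sets the mean and standard deviation, and any new observation more than three standard deviations from that mean gets flagged.

```python
import statistics

# Hypothetical baseline: cycle times (seconds) observed during normal operation.
baseline = [42.1, 41.8, 42.5, 41.9, 43.0, 42.2, 42.4, 41.7, 42.6, 42.0]
mu = statistics.mean(baseline)
sigma = statistics.stdev(baseline)

def is_outlier(t, k=3.0):
    """Flag a cycle time more than k standard deviations from the baseline mean."""
    return abs(t - mu) > k * sigma

print(is_outlier(58.0))  # True: the slow Wednesday is far outside 3 sigma
print(is_outlier(42.3))  # False: well within the normal band
```

Note that this rule is only as good as the normality assumption behind it, which is exactly the assumption the next section questions.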
However, the normal distribution is not all that “normal.” There are many instances where your data follows a fat-tailed distribution like a log-normal or a power law. In financial time series, the fat-tailed nature is seen in market crashes, which occur far more frequently than a normal distribution would predict. And we have all heard of the Pareto principle:
80% of the wealth is owned by 20% of the population.
Long tails can be found in online retail, hospital discharge times, lifetimes of electronic components and many other situations. The most important thing to note about fat-tailed distributions is that events much larger than the mean happen far more often than a normal world would lead us to expect.
In manufacturing, such long tails may arise from many sources of variance: different operators, different times of day, incidental actions, etc. What’s important is that if you don’t have enough data points to tell you what the distribution is, and you are therefore incorrectly assuming the normal curve, you may be throwing out data points that look like outliers under that assumption but aren’t under the true distribution.
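A quick simulation shows how misleading the normality assumption can be. Below is a sketch (using Python's standard library, with a log-normal standing in for a generic fat-tailed distribution) that counts how often samples land more than 3σ from their mean. Under a true normal distribution that should be roughly 0.27% of points; under the fat-tailed one it is markedly higher.

```python
import random
import statistics

random.seed(0)
N = 100_000

# Draw samples from a normal and from a fat-tailed (log-normal) distribution.
normal_data = [random.gauss(0.0, 1.0) for _ in range(N)]
fat_data = [random.lognormvariate(0.0, 1.0) for _ in range(N)]

def frac_beyond_3sigma(data):
    """Fraction of points lying more than 3 standard deviations from the mean."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return sum(abs(x - mu) > 3 * sigma for x in data) / len(data)

print(frac_beyond_3sigma(normal_data))  # roughly 0.003, the textbook ~0.27%
print(frac_beyond_3sigma(fat_data))     # noticeably higher
```

Every one of those extra exceedances is a point a 3σ filter would discard as an "outlier," even though the fat-tailed process produces them routinely.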
Continuous vs. episodic measurement
The only way to know whether your outlier truly is an outlier is with more data, which I spoke of in an earlier post. Traditionally, manufacturers have relied on time and motion studies, a form of episodic measurement that is also biased, incomplete and quickly outdated. With episodic measurement, you run an unavoidable risk of missing patterns entirely, simply because you weren’t taking measurements at the time an event occurred. That can leave you thinking you have an outlier on your hands, when in fact it is an indicator of a trend that, when plotted with thousands of other data points gathered from continuous measurement, demonstrates a clear pattern.
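The risk is easy to demonstrate with a toy simulation. In the sketch below (all numbers invented for illustration), cycle times average 42 seconds except on Fridays, when operators hurrying to finish run closer to 37 seconds. Continuous measurement of every cycle surfaces the Friday pattern immediately; a one-day time study held on a Tuesday never sees it.

```python
import random
import statistics

random.seed(1)

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Hypothetical model: cycles average 42 s, but Fridays run near 37 s
# because operators push to finish early.
def cycle_time(day):
    base = 37.0 if day == "Fri" else 42.0
    return random.gauss(base, 0.5)

# Continuous measurement: 100 cycles per day, every day, for 12 weeks.
continuous = [(day, cycle_time(day))
              for _ in range(12) for day in DAYS for _ in range(100)]

# Episodic measurement: a one-day time study held on a Tuesday.
episodic = [cycle_time("Tue") for _ in range(100)]

by_day = {d: statistics.mean(t for day, t in continuous if day == d)
          for d in DAYS}
print({d: round(m, 1) for d, m in by_day.items()})  # Friday stands out near 37 s
print(round(statistics.mean(episodic), 1))          # the study sees only ~42 s
```

With only the episodic sample, a stray 37-second Friday cycle looks like an outlier; with the continuous record, it is plainly a weekly pattern.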
Drishti enables companies to use continuous measurement, which carries a distinct advantage: you can look back at any period you want. So no matter what problem you’re trying to solve – cycle time, defect rate, etc. – you have the data to understand what’s happening on the floor at every second of the day, right at your fingertips.
Learn more about the drawbacks of episodic measurement techniques in this IndustryWeek article authored by Prasad Akella, Drishti founder and CEO.
Sameer Gupta is Drishti’s head of data.