Statistical Thinking – biopm, llc
Improving Knowledge Worker Productivity

The Practical Value of a Statistical Method

Shortly after I wrote my last blog, "On Statistics as a Method of Problem Solving," I received the latest issue of Quality Progress, the official publication of the American Society for Quality.  A Statistics article, "Making the Cut – Critical values for Pareto comparisons remove statistical subjectivity," caught my attention because Pareto analysis is one of my favorite tools in continuous improvement.

It was written by two professors “with more than 70 years of combined experience in the quality arena and the use of Pareto charts in various disciplines” and covers a brief history of Pareto analysis and its use in quality to differentiate the vital few causes from the trivial many.

The authors introduced a statistical method to address the issue of "practitioners who collect data, construct a Pareto chart and subjectively identify the vital few categories on which to focus."  The main point is that two adjacent categories, sorted by occurrence in descending order, may not be statistically different in terms of their underlying frequency (e.g., rate of failure) because of sampling error.

Based on hypothesis testing, the method includes two simple tools:

  1. Critical values below which the count in the lower-occurrence category is deemed significantly different from that in the higher one
  2. A p-value for each pair of adjacent-category counts to measure the significance of the difference

With a real data set (published by different authors) as an example, they showed that only some adjacent categories are significantly different and are therefore candidates for making the cut.
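To make the idea concrete, here is a minimal sketch of one way to test whether two adjacent Pareto categories differ beyond sampling error: given the combined count of the two categories, under the null hypothesis of equal underlying rates each occurrence is equally likely to fall in either category, so a simple binomial test applies.  This illustrates the general idea only; it is not the authors' exact critical-value procedure, and the counts are hypothetical.

```python
# Sketch: conditional binomial test for two adjacent Pareto categories.
# Under H0 (equal underlying rates), each of the x1 + x2 occurrences is
# equally likely to belong to either category (p = 0.5).
from scipy.stats import binomtest

def adjacent_category_pvalue(x1, x2):
    """Two-sided p-value for the difference between two adjacent category counts."""
    return binomtest(x1, n=x1 + x2, p=0.5).pvalue

# Hypothetical counts for two adjacent categories in a Pareto chart
print(adjacent_category_pvalue(120, 95))   # not significant at alpha = 0.05
print(adjacent_category_pvalue(120, 60))   # clearly significant
```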

I see the value in raising the awareness of statistical thinking in decision making (which is desperately needed in science and industry).  However, in practice, the method is far less useful than it appears and can lead to improper applications of statistical methods.

Here are but a few reasons.

  • The purpose of a Pareto chart is exploratory analysis, not binary decision making, i.e., deciding exactly which categories belong to the vital few.  As a data visualization tool, a Pareto chart shows whether there is a Pareto effect overall – an obvious 80/20 distribution in the data not only indicates an opportunity to apply the Pareto principle but also gives insight into the nature of the underlying cause system.  
  • Using a hypothesis test to answer an unnecessary question is waste.  If the overall Pareto effect is strong, the decision is obvious, and a hypothesis test to distinguish between categories is not needed.  If the overall effect is not strong enough to make the decision obvious, the categorization method is not effective for prioritization, and other approaches should be considered.  
  • Prioritization decisions depend on resources and other considerations, not category occurrence ranking alone.  This is true even if the Pareto effect is strong.  People making prioritization decisions based solely on Pareto analysis are making a management mistake that cannot be overcome by statistical methods. 
  • The result of the hypothesis test offers no incremental value – it does not change the decisions we would make without it.  For example, if the fourth-ranking category is found not statistically different from the third and there are only enough resources to work on three categories, what should the decision be?  How would the hypothesis test improve it?  Equally unhelpful, a significant test result merely confirms a decision already made. 
  • The claim of "removing subjectivity" by using the hypothesis test is misleading.  The decision in any hypothesis test depends on the risk tolerance of the decision maker: the alpha (or significance level) used to decide whether a given p-value is significant is chosen subjectively.  The choice of a categorization method also depends on subject matter expertise – another subjective factor.  For example, two categories could have been defined as one.  In addition, many decisions in a statistical analysis involve some degree of expert judgment and therefore introduce subjectivity, such as whether the data constitute a probability sample, whether the data can be modeled as binomial, and whether the process that generated the data was stable.  

Without sufficient understanding of statistical theory and practical knowledge in its applications, one can easily be overwhelmed by statistical methods presented by the “experts.”  Before considering a statistical method, ask the question “how much can it practically improve my decision?”  In addition, “One must never forget the importance of subject matter.” (Deming)

On Statistics as a Method of Problem Solving

If you have taken a class in statistics, whether in college or as part of professional training, how much has it helped you solve problems?

Based on my observation, the answer is mostly not much. 

The primary reason is that most people are never taught statistics properly.   Terms like null hypothesis and p-value just don’t make intuitive sense, and statistical concepts are rarely presented in the context of scientific problem solving. 

In the era of Big Data, machine learning, and artificial intelligence, one would expect improved statistical thinking and skills in science and industry.  However, the teaching and practice of statistical theory and methods remain poor – probably no better than when W. E. Deming wrote his 1975 article “On Probability As a Basis For Action.” 

I have witnessed many incorrect practices in the teaching and application of statistical concepts and tools.  There are mistakes unknowingly made by users inadequately trained in statistical methods, for example, failing to meet the assumptions of a method or not considering the impact of the sample size (or statistical power).  This lack of technical knowledge can be remedied by continued learning of the theory.

The bigger problem I see is that statistical tools are used for the wrong purpose or the wrong question by people who are supposed to know what they are doing — the professionals.  To the less sophisticated viewers, the statistical procedures used by those professionals look proper or even impressive.  To most viewers, if the method, logic, or conclusion doesn’t make sense, it must be due to their lack of understanding.  

An example of using statistics for the wrong purpose is p-hacking – the practice of manipulating the experiment or analysis until the p-value reaches the desired value and therefore supports the desired conclusion.

Not all bad practices are as easily detectable as p-hacking.  They often use statistical concepts and tools for the wrong question.  One category of such examples is the failure to differentiate enumerative and analytic problems, a distinction that Deming wrote about extensively in his work, including the article mentioned above.  I also touched on this in my blog Understanding Process Capability.

In my opinion, the underlying issue behind using statistics to answer the wrong questions is the gap between subject matter experts, who try to solve problems but lack an adequate understanding of probability theory, and statisticians, who understand the theory but lack experience solving real-world scientific or business problems.   

Here is an example.  A well-known statistical software company provides a "decision making with data" training course.  Its example of using a hypothesis test is to evaluate whether a process is on target after some improvement.  The null hypothesis is stated as the process mean being equal to the desired target.  

The instructors explain that “the null hypothesis is the default decision” and “the null is true unless our data tell us otherwise.” Why would anyone collect data and perform statistical analysis if they already believe that the process is on target?  If you are statistically savvy, you will recognize that you can reject any hypothesis by collecting a large enough sample. In this case, you will eventually conclude that the process is not on target.
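Here is a minimal simulation sketch of that point, with hypothetical numbers (this is not the training course's data): a process that misses the target by a practically irrelevant amount will eventually be declared "off target" once the sample is large enough.

```python
# With a large enough sample, a one-sample t-test rejects almost any point
# null hypothesis (here, "process mean equals the target"), even when the
# process is off target by a practically meaningless amount.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
target = 100.0
true_mean = 100.01      # off target by a trivial 0.01 units (hypothetical)
sigma = 1.0

for n in (100, 10_000, 1_000_000):
    sample = rng.normal(true_mean, sigma, size=n)
    t, p = ttest_1samp(sample, popmean=target)
    print(f"n = {n:>9,}   p-value = {p:.4f}")
# As n grows, the p-value shrinks toward zero: the test eventually declares
# the process "not on target" regardless of practical relevance.
```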

The instructors further explain “It might seem counterintuitive, but you conduct this analysis to test that the process is not on target. That is, you are testing that the changes are not sufficient to bring the process to target.” It is counterintuitive because the decision maker’s natural question after the improvement is “does the process hit the target” not “does the process not hit the target?”

The reason, I suppose, for choosing such a counterintuitive null hypothesis is that it is convenient: set the process mean to a known value and then calculate the probability of observing the data collected (i.e., the sample) from this hypothetical process.  

What’s really needed in this problem is not statistical methods, but scientific methods of knowledge acquisition. We have to help decision makers understand the right questions. 

The right question in this example is not “does the process hit the target?” which is another example of process improvement goal setting based on desirability, not a specific opportunity. [See my blog Achieving Improvement for more discussion.]  

The right question should be “do the observations fall where we expect them to be, based on our knowledge of the change made?”  This “where” is the range of values estimated based on our understanding of the change BEFORE we collect the data, which is part of the Plan of the Plan-Do-Study-Act or Plan-Do-Check-Act (PDSA or PDCA) cycle of scientific knowledge acquisition and continuous improvement.   

If we cannot estimate this range and its associated probability density, then we do not know enough about our change and its impact on the process.  In other words, we are just messing around without using a scientific method.  No application of statistical tools can help – they are just window dressing.

With the right question asked, a hypothesis test is unnecessary, and there is no false hope that the process will hit the desired target.  We will improve our knowledge based on how well the observations match our expected or predicted range (i.e. Study/Check).   We will continue to improve based on specific opportunities generated with our new knowledge.
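As a minimal sketch of that Plan/Do/Study loop (all numbers hypothetical): state the predicted range before collecting data, then check whether the observations fall inside it.

```python
# Plan: predict, from our knowledge of the change, where the observations
# should fall BEFORE collecting data. Study/Check: compare what we observed
# against that prediction. Hypothetical range and data, for illustration only.
import numpy as np

predicted_low, predicted_high = 98.5, 101.5                  # stated in the Plan phase
observations = np.array([99.2, 100.4, 99.8, 100.9, 100.1])   # collected in the Do phase

observed_mean = observations.mean()
within_prediction = predicted_low <= observed_mean <= predicted_high
print(f"observed mean = {observed_mean:.2f}, within predicted range: {within_prediction}")
# If the observations fall outside the predicted range, our theory of the
# change needs revision; either way, we gain knowledge to act on in the next cycle.
```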

What is your experience in scientific problem solving?

Understanding Process Capability

Process capability is a key concept in Quality and Continuous Improvement (CI).  For people not familiar with the concept, process capability is a process's ability to consistently produce product that meets the customer requirements.

Conceptually, process capability is simple.  If a process makes products that meet the customer requirements all the time (i.e. 100%), it has a high process capability.  If the process does it only 80% of the time, it is not very capable.

For quality attributes measured as continuous or variable data, many organizations use Process Capability Index (Cpk) or Process Performance Index (Ppk) as the metric for evaluation.  In my consulting work, I often observe confusion and mistakes applying the concept and associated tools, even by Quality and CI professionals.  For example,

  • Mix-up of Cpk and Ppk
  • Unclear whether or when process stability is a prerequisite
  • Using the wrong data (sampling) or calculation
  • Misinterpretation of process capability results
  • Difficulty evaluating processes with non-normal data, discrete data, or binary outcomes

The root cause of the gap between this simple concept and its effective application in the real world is, in my opinion, a lack of fundamental understanding of statistics among practitioners.

Statistics

First, a process capability metric, such as Cpk, is a statistic (which is, by definition, simply a function of data).  The function is typically given as a mathematical formula.  For example, the mean (or arithmetic average) is a statistic: the sum of all values divided by the number of values in the data set.   

The confusion between Cpk and Ppk often comes from their apparently identical formulas, with the only difference being the standard deviation used.  Cpk uses the within-subgroup variation, whereas Ppk uses the overall variation in the data.  Which index should one use in each situation?
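A short sketch of both calculations may help; the data, specification limits, and subgroup structure below are hypothetical.  The formula min((USL - mean)/(3*sd), (mean - LSL)/(3*sd)) is the same for both indices; Cpk plugs in the within-subgroup standard deviation (estimated here from the average subgroup range via the d2 constant), while Ppk plugs in the overall sample standard deviation.

```python
# Sketch: Cpk vs. Ppk on hypothetical subgrouped data. Same formula,
# different standard deviation (within-subgroup for Cpk, overall for Ppk).
import numpy as np

LSL, USL = 94.0, 106.0          # hypothetical specification limits
# Each row is a rational subgroup of 5 consecutive parts (hypothetical data).
subgroups = np.array([
    [100.1,  99.8, 100.4,  99.6, 100.2],
    [ 99.9, 100.3,  99.7, 100.0, 100.5],
    [100.6, 100.2,  99.9, 100.8, 100.1],
    [ 99.5,  99.9, 100.2,  99.7, 100.0],
])

mean = subgroups.mean()
sd_overall = subgroups.std(ddof=1)                   # overall variation -> Ppk
d2 = 2.326                                           # d2 constant for subgroups of size 5
sd_within = np.ptp(subgroups, axis=1).mean() / d2    # R-bar / d2 -> Cpk

def capability_index(mean, sd):
    return min((USL - mean) / (3 * sd), (mean - LSL) / (3 * sd))

print("Cpk =", round(capability_index(mean, sd_within), 2))
print("Ppk =", round(capability_index(mean, sd_overall), 2))
```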

It is important to understand that any function of data can be a statistic – whether it has any useful meaning is a different thing.  The formula itself of a statistic does not produce the meaning.  Plugging whatever existing data into a formula rarely gives the answer we want. 

To derive useful meaning from a statistic, we must first define our question or purpose and state assumptions and constraints.  Then we can identify the best statistic, gather suitable data, calculate and interpret the result. 

Enumerative and Analytic Studies

Enumerative and analytic studies1 have two distinct purposes. 

  • An enumerative (or descriptive) study is aimed to estimate some quantity in the population of interest, for example, how many defective parts are in this particular lot of product? 
  • An analytic (or comparative) study tries to understand the underlying cause-system or process that generates the result, for example, why does the process generate so many defective parts?

If the goal is to decide if a particular lot of product should be accepted or rejected based on the number of defective parts, then it is appropriate to conduct an enumerative study, e.g. estimating the proportion of defectives based on inspection of a sample from the lot.  A relevant consideration is sample size vs. economic cost – more precise estimates require larger samples and therefore cost more.  In fact, a 100% inspection will give us a definite answer.  In this case, we are not concerned with why there are so many defectives, just how many.

If the goal is to determine if a process is able to produce a new lot of product at a specified quality level, it is an analytic problem because we first have to understand why (i.e. under what conditions) it produces more or fewer defectives.  Methods used in enumerative studies are inadequate to answer this question even if we measured all parts produced so far.  In contrast, control charts are a powerful analytic method that uses carefully designed samples (rational subgroups) over time to isolate the sources of variation in the process, i.e. understanding the underlying causes of the process outcome.  This understanding allows us to determine if the process is capable or needs improvement.

Cpk versus Ppk

If our goal is to understand the performance of the process in a specific period (i.e. an enumerative study), we are only concerned with the products already made, not the inherent, potential capability of the process to produce quality products in the future.  In this case, demonstration of process stability (by using control charts) is not required, and Ppk using a standard deviation that represents the overall variability from the period is appropriate.  

If our goal is to plan for production, which involves estimating product quality in future lots, the process capability analysis is an analytic study.  Because we cannot predict what a process will produce with confidence if it is not stable, demonstration of process stability is required before estimating process capability. 

If the process is stable, there is no difference between the within-subgroup variation (used for Cpk) and the overall variation (used for Ppk) except estimation error, so Cpk and Ppk are equivalent.

If the process is not stable, the overall standard deviation is greater than the within-subgroup variation, and Ppk is less than Cpk, as expected.  However, Ppk is not a reliable measure of future performance because an unstable process is unpredictable.  If (a big IF) the subgroups are properly designed, the within-subgroup variation is stable, and Cpk can be interpreted as the potential process capability if all special causes are eliminated.  In practice, subgroups are often not designed or specified thoughtfully, making Cpk difficult to interpret.
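A quick simulation illustrates both cases (simulated data, hypothetical limits): when the subgroup means are stable, Cpk and Ppk come out nearly equal; when the mean shifts between subgroups, the overall standard deviation grows and Ppk falls below Cpk.

```python
# Sketch: Cpk vs. Ppk for a stable process and for one whose mean shifts
# between subgroups (a special cause). Simulated data, hypothetical limits.
import numpy as np

rng = np.random.default_rng(1)
LSL, USL, d2 = 94.0, 106.0, 2.326    # d2 for subgroups of size 5

def cpk_ppk(subgroups):
    mean = subgroups.mean()
    sd_within = np.ptp(subgroups, axis=1).mean() / d2
    sd_overall = subgroups.std(ddof=1)
    index = lambda sd: min((USL - mean) / (3 * sd), (mean - LSL) / (3 * sd))
    return round(index(sd_within), 2), round(index(sd_overall), 2)

stable   = rng.normal(100, 1, size=(25, 5))                                      # constant mean
unstable = rng.normal(100, 1, size=(25, 5)) + rng.normal(0, 1.5, size=(25, 1))   # shifting mean
print("stable   process: Cpk, Ppk =", cpk_ppk(stable))      # roughly equal
print("unstable process: Cpk, Ppk =", cpk_ppk(unstable))    # Ppk < Cpk
```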

In summary, process capability analysis requires a good understanding of statistical concepts and clearly defined goals.  Interested practitioners can consult the many books and articles on this topic.  I hope the brief discussion here helps clarify some of the concepts. 

1. Deming included a chapter “Distinction between Enumerative and Analytic Studies” in his book Some Theory of Sampling (1950).

What Does the Data Tell Us?

It's March 31, 2020.  In the past three months, the novel coronavirus (COVID-19) has changed the world we live in.  As the virus spreads around the globe, everyone is anxiously watching the latest statistics on confirmed cases and deaths attributed to the disease in various regions.  With the latest technology, timely data is more accessible to the public than ever before.  

With the availability of data comes the challenge of proper comprehension and communication of it.  I am not talking about advanced data analytics or visualization but communication and interpretation of simple numbers, counts, ratios, and percentages.    

The COVID-19 pandemic has provided us ample examples of such data.   If not careful, even simple data can be misinterpreted and lead to incorrect conclusions or actions. 

Cumulative counts (or totals) never go down; they are monotonically increasing.  The total of confirmed cases always increases over time, even when daily new cases are dropping.  A total is not the most effective way to communicate a trend unless we compare it with an established model.  The change in daily cases gives better insight into the progress of the epidemic.

Even the daily change should be interpreted with caution.  A jump or drop in new cases on any single day may not mean much because of chance variation inherent in data collection.  It is more reliable to fit the data to a model over a number of days to understand the trend.

The range of a dataset gets bigger as more data is collected.  Even extreme values that occur infrequently will show up if the sample size is large.  Younger people are less likely to have severe symptoms if infected by the virus.  The initial data on hospitalization or mortality show predominantly older patients, the most vulnerable population.  As more cases are collected, the patient age range will naturally expand to include very young patients who need hospitalization or even die.  But this increase in the number of younger patients does not necessarily mean that the virus has become deadlier for the younger population.

The percentage of hospitalized patients who are under 65 years of age is not, by itself, the right measure of the disease's risk to the younger population.  There are significantly more people younger than 65 than older in the general population, so each person's risk should be adjusted for the size of their age group.  In addition, the severity of each hospitalized patient's illness differs, and pre-existing health conditions also play a critical role in recovery or survival.

Mortality is the ratio of the number of deaths to the number of confirmed cases.  The numerator is likely more accurate than the denominator: most patients who died of COVID-19-related complications are counted, whereas the confirmed cases represent mainly infected people with severe symptoms, who are known to be the minority.  Therefore, the calculated mortality is likely an overestimate at the initial stage of the pandemic, when the prevalence of the disease is uncertain.
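A small worked example with hypothetical numbers shows how large that overestimate can be when only a fraction of infections are ever confirmed:

```python
# Hypothetical illustration: if only a quarter of infections are confirmed
# (mostly the severe cases), deaths divided by confirmed cases overstates
# the fatality rate by a factor of four.
deaths = 500
confirmed_cases = 10_000
detection_fraction = 0.25        # assumed share of infections that get confirmed

naive_mortality = deaths / confirmed_cases
estimated_infections = confirmed_cases / detection_fraction
adjusted_mortality = deaths / estimated_infections

print(f"mortality based on confirmed cases: {naive_mortality:.2%}")     # 5.00%
print(f"mortality based on all infections : {adjusted_mortality:.2%}")  # 1.25%
```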

In the above examples, it only takes some awareness to avoid data misinterpretation.  For critical decisions, we must understand the context of the data, e.g. where the data came from, what data is collected, how it is collected, what data is missing, etc.

We should never forget that the data we see is usually collected from a sample of the population we are trying to understand.  Any statistic (or calculation) from the sample data, such as a count or average, is not of most interest in itself; what we truly want to know is some attribute of the population, estimated from the sample data.  We cannot measure the entire population, e.g., test everyone to see who is infected, and have to rely on the sample data available to us.  Different samples can give drastically different data.  We must understand what the sample is and how it was selected in order to draw inferences from the data.

For example, the sample may not be representative of the population.  The people who have been tested for the new coronavirus represent a sample.  But if only seriously ill people are tested, they do not represent the general population if we want to understand how deadly the virus is.

Equally important is the method of measurement.  All tests have errors.  An infected person could give a negative test result (i.e. a false negative), and an uninfected person could give a positive result (i.e. false positive).  The probabilities of such errors depend on the test.  Different tests on the same people can give different results. 
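A short Bayes'-theorem sketch with hypothetical error rates shows why this matters, especially when the condition being tested for is rare:

```python
# Hypothetical sensitivity, specificity, and prevalence. Even a fairly
# accurate test produces many false positives when prevalence is low.
sensitivity = 0.95    # P(test positive | infected)        -- assumed
specificity = 0.98    # P(test negative | not infected)    -- assumed
prevalence  = 0.01    # P(infected) among those tested     -- assumed

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_positive    # P(infected | test positive)
print(f"P(infected | positive test) = {ppv:.1%}")   # roughly 32% with these numbers
```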

To analyze data properly, trained professionals depend on probability theory and sophisticated methods.  For most people, though, it helps to know that what’s not in the data could be more important than the data.

Setting SMART Goals

Recently I had conversations with several people, on different occasions, about effective goal setting.  It is common practice to use Specific, Measurable, Achievable, Relevant, and Time-bound (SMART) criteria to create goals.   However, using SMART goals for effective management or decision making is not as simple as it appears.

For example, “improve product ABC yield to 96% or more by September 30” can be a SMART goal.  In a non-manufacturing environment, a similar goal can be “reduce invoices with errors to 4% or less by September 30.”

Suppose now is September 30, and we have only 4% of the products or invoices classified as bad.  Did we achieve our goal?

Most people would say “Of course, we did.”  But the real answer is “We don’t know without additional information or assumptions.” 

Why?

The reason is that the 4% is calculated from a sample – limited observations from the system or process we are evaluating.  The true rate of bad items from the process may be higher or lower than 4%.   

We can use a statistical approach to illustrate the phenomenon.  Since the outcome of each item is binary (good/bad or with/without errors), we can model the process as a binomial distribution.   Figure 1 shows the probability of observing 0 to 15 bad items if we examine a sample of 100 items, assuming that any item from the process has a 4% probability of being bad.

Figure 1: Binomial Distribution (n=100, p=0.04)

When the true probability is 4%, we expect to see 4 bad items per 100, on average.  However, each sample of 100 items is different due to randomness, and we can get any number of bad items, 0, 1, 2, etc.  If we add the probability values of the five leftmost bars (corresponding to 0, 1, 2, 3, and 4 bad items), the sum is close to 0.63.  This means that there is only a 63% chance of seeing 4 or fewer bad items in a sample of 100, when we know the process should produce only 4% bad items.  

More than 37% of the time, we will see 5 or more bad items in a sample of 100.  In fact, there is a greater than 10% chance of seeing seven or more bad items, and close to a 5% chance of seeing eight or more – twice as many as expected!

In contrast, a worse-performing process with a true probability of 5% (Figure 2) has a 44% chance of producing 4 or fewer bad items.  This means that we will see it achieving the goal almost half the time.  

Figure 2: Binomial Distribution (n=100, p=0.05)
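The probabilities quoted above can be verified directly from the binomial distribution, for example with the following short sketch:

```python
# Checking the binomial probabilities discussed above (samples of n = 100 items).
from scipy.stats import binom

n = 100
print(binom.cdf(4, n, 0.04))   # ~0.63: P(4 or fewer bad items) when the true rate is 4%
print(binom.sf(6, n, 0.04))    # ~0.11: P(7 or more bad items)  when the true rate is 4%
print(binom.sf(7, n, 0.04))    # ~0.05: P(8 or more bad items)  when the true rate is 4%
print(binom.cdf(4, n, 0.05))   # ~0.44: P(4 or fewer bad items) when the true rate is 5%
```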

Suppose the first process represents your capability and the second that of a colleague.  How would you feel about using the SMART goal above as one criterion for raises or promotions?

The point I am making is not to abandon the SMART goals but to use them judiciously.  In many cases, it calls for statistical thinking – understanding variation in data.  Just because we can measure or quantify something doesn’t mean we are interpreting the data properly to make the right decision.  

It takes “some rudimentary knowledge of science”1 to be smart.


1. Deming, W. Edwards. Out of the Crisis : Quality, Productivity, and Competitive Position. Cambridge, Mass.: Massachusetts Institute of Technology, Center for Advanced Engineering Study, 1986.
