Deliberate Mistakes: How Science is Invading Business

In 2006, HBR Ideacast (in its fourth podcast) interviewed HBR Senior Editor Gardiner Morse on an article he’d been working on – concerning so-called “deliberate mistakes.”

I found the podcast very interesting – primarily because the idea it explored was something I’d covered extensively in my major.

Let me back up: Mr. Morse explains how over the past couple of decades businesses have embraced “experiments” to test whether things will work. However, simply running experiments isn’t enough – a business also has to make “deliberate mistakes,” which boils down to running experiments you believe will fail. The problem with only running experiments that confirm your assumptions is that you become trapped by those assumptions – and you may end up “assuming” things that cost your company millions of dollars.


The example that the article’s authors, Paul Schoemaker and Robert Gunther, open with is AT&T:

Before the breakup of AT&T’s Bell System, U.S. telephone companies were required to offer service to every household in their regions, no matter how creditworthy. Throughout the United States, there were about 12 million new subscribers each year, with bad debts exceeding $450 million annually. To protect themselves against this credit risk and against equipment theft and abuse by customers, the companies were permitted by law to demand a security deposit from a small percentage of subscribers. Each Bell operating company developed its own complex statistical model for figuring out which customers posed the greatest risk and should therefore be charged a deposit. But the companies never really knew whether the models were right. They decided that the way to test them was to make a deliberate, multimillion-dollar mistake.

For almost a year, the companies asked for no deposit from nearly 100,000 new customers who were randomly selected from among those considered high risks. […] To the companies’ surprise, many of the presumed bad customers paid their bills fully and on time and did not steal or damage the phones. Armed with these new insights, Bell Labs helped the operating companies recalibrate their credit scoring models and institute a much smarter screening strategy, which added, on average, $137 million to the Bell System’s bottom line every year for the next decade. (emphasis added)


Now, before you ask for specifics – why couldn’t AT&T retroactively identify these people; why wasn’t this data used in the creation of the formula (as opposed to credit score, income, neighborhood, etc., which Mr. Morse implied they used); why did they have to exempt people from the deposit to learn this? – the podcast didn’t go into that. Being the generous guy I am, I’m going to assume a “displacement of responsibility” effect: that is, charging the deposit eliminated the social obligation not to damage the equipment, so charging it actually increased equipment damage. It’s quite plausible, and fits the narrative better.
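The logic of the AT&T experiment can be sketched as a toy simulation. This is only an illustration under made-up assumptions: the function name, the 15% default rate and the 10% waiver fraction are hypothetical and do not come from the article. The point it shows is that if every flagged customer is charged a deposit, the assumption behind the model never gets tested; only by waiving the deposit for a random sample do you get data that could contradict it.

```python
import random

def estimate_default_rate(n_flagged, waive_fraction, true_default_rate, seed=0):
    """Estimate how often 'high-risk' customers default when no deposit is charged.

    Returns None when no deposits are waived: without a holdout group there is
    nothing to observe, so the model's assumption can never be checked.
    """
    rng = random.Random(seed)
    waived = 0
    defaults = 0
    for _ in range(n_flagged):
        if rng.random() < waive_fraction:        # the "deliberate mistake"
            waived += 1
            if rng.random() < true_default_rate:  # outcome we finally get to see
                defaults += 1
    if waived == 0:
        return None
    return defaults / waived

# Hypothetical numbers: the model treats every flagged customer as a bad risk,
# but in this simulated world only 15% of them actually default.
no_experiment = estimate_default_rate(100_000, 0.0, 0.15)
with_experiment = estimate_default_rate(100_000, 0.10, 0.15)
print(no_experiment, with_experiment)
```

The random selection matters: if the waived customers were hand-picked, the estimate would reflect the picker’s assumptions rather than the customers’ behaviour.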

Enter Science

The state of experimentation in business is baldly stated:

Many managers recognize the value of experimentation, but they usually design experiments to confirm their initial assumptions.

This is where science was in the 1920s.

In the 1920s, the Vienna Circle was in full swing – refining and popularizing the philosophical doctrine of Logical Positivism, which quickly permeated science. The fundamental tenet of logical positivism is that everything can be derived from empirical data (e.g. experiments) and logical inference. It’s a rejection of both theology and metaphysics – “postulating” (or assuming) that reality works in a certain fashion without any direct evidence.

In the 1930s, Karl Popper popularized falsification. Falsification is a logical correction – it points out that the “problem of induction” popularized by David Hume in 1748 means that it’s impossible to arrive at “true” knowledge by induction (going from evidence to theory). This is the “all swans are white” fallacy that Nassim Nicholas Taleb used to great effect in his book The Black Swan: no matter how many white swans you see, it doesn’t mean a black swan doesn’t exist. The logically valid way to proceed is to create a theory, then derive a hypothesis, then test that hypothesis against the evidence. You only need one disconfirming example (e.g. a single black swan) to disprove the hypothesis; so by falsifying hypotheses you can run a “process of elimination” on hypotheses and theories.

Now, it turns out that (philosophically) there are a few problems with that way of progressing, one of which is known as the Duhem-Quine thesis (Quine has a nice, if overstated, explanation in Two Dogmas of Empiricism; he later partially retracted his conclusion). In short, it states that hypotheses cannot be isolated and tested individually – rather, you are testing a bundle of interconnected theses, which makes it very difficult to falsify any single hypothesis. Another problem is underdetermination, which claims that there are n possible (contradicting) theories for any finite amount of evidence, where n > 1 (and usually very large). Thus, (practically speaking) a company can have any amount of evidence and still have multiple contradicting theories available – making it very difficult to choose which action to take where the theories contradict.

(On the bright side, the theories are going to overlap a lot, too – they need to agree on the empirical evidence, after all – so the more evidence you have, the better off you’ll be. It’s just not certain).
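Underdetermination is easy to show with a toy example. The sketch below is purely illustrative (the data points and both “theories” are invented): two rules agree on every observation collected so far, yet make very different predictions about the next case, and a single extra observation is enough to eliminate one of them.

```python
# Four observations that both theories must explain.
observations = [(0, 0), (1, 1), (2, 2), (3, 3)]

def theory_linear(x):
    return x

def theory_wiggle(x):
    # The product term is zero at x = 0, 1, 2, 3, so this theory agrees
    # with theory_linear on every observation above and nowhere else.
    return x + x * (x - 1) * (x - 2) * (x - 3)

def fits(theory, data):
    """True if the theory reproduces every (x, y) pair in the data."""
    return all(theory(x) == y for x, y in data)

both_fit = fits(theory_linear, observations) and fits(theory_wiggle, observations)

# The theories disagree about a case nobody has observed yet.
prediction_gap = theory_wiggle(4) - theory_linear(4)

# One more data point falsifies one of the two theories.
more_evidence = observations + [(4, 4)]
wiggle_survives = fits(theory_wiggle, more_evidence)

print(both_fit, prediction_gap, wiggle_survives)
```

Of course, a new rival that also passes through the fifth point can always be constructed, which is why more evidence narrows the field without ever making the survivor certain.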

Practical Reasons

However, these problems are of little practical significance in business, where the goal is not to be “right” but to be “more correct than the next guy” – your competitors. Business is a relative thing, so relative increases in truth have real business value.

In fact, there is a more compelling rationale to use falsification in business than there is in science.

“Savvy executives” – to borrow HBR’s favorite phrase – are considerably more vulnerable than scientists to cognitive “traps” that psychologists have identified. Why? Because whereas pressure is on scientists to produce truth, executives are under pressure to make it work. And boy oh boy, is the list of cognitive biases a mile long. The most significant cognitive errors for executives are:

  1. Confirmation Bias: When someone considers a hypothesis or opinion, they reach back in their memory for instances that confirm the hypothesis – and suppress recollection of instances which disprove the hypothesis.
  2. Regression to the Mean: People tend to ignore probabilities, and focus on the most recent event. In probability, an exceptional event – e.g. very good (or very bad) returns – is likely to be followed by a more ordinary event. The classic example is of a flight instructor who swore that negative feedback works (it doesn’t) – his explanation was that every time a pilot did exceptionally badly and he yelled at them, they did better the next time. This was true – but after an exceptionally bad landing, a pilot is likely to regress to the mean (of his capabilities) and therefore do better anyway. This also applies to, e.g., stock trading and revenue/sales performance (this is part of the reason why it’s very bad to base firing/compensation on just the last year of results).
  3. Hindsight Bias: The tendency to look back at a previous event and believe that you saw it coming when you didn’t, which gives you unwarranted confidence in predicting future events.
  4. Overconfidence effect: For many questions, answers that people rate as being “99% certain” are wrong 40% of the time.
  5. Halo Effect: Judgment about one attribute spills over to other attributes… e.g. Google is doing really well, therefore everything they do is really good as well; or, people who are more attractive are better at their work.
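The flight-instructor example in item 2 can be checked with a small simulation. This is a sketch under invented assumptions: every pilot has the same underlying skill, each landing score is that skill plus random noise, and the numbers (skill 70, noise 10, “bad” below 55) are hypothetical. No feedback of any kind is modelled, yet the landing after a bad one still looks much better on average.

```python
import random

def landings_after_bad(n_pilots=10_000, skill=70.0, noise=10.0,
                       bad_cutoff=55.0, seed=1):
    """Return (average bad first landing, average landing that followed it)."""
    rng = random.Random(seed)
    bad_first = []
    following = []
    for _ in range(n_pilots):
        first = rng.gauss(skill, noise)
        second = rng.gauss(skill, noise)  # independent of the first landing
        if first < bad_cutoff:            # the instructor would yell here
            bad_first.append(first)
            following.append(second)
    return sum(bad_first) / len(bad_first), sum(following) / len(following)

avg_bad, avg_next = landings_after_bad()
print(round(avg_bad, 1), round(avg_next, 1))
```

The “improvement” after a bad landing appears even though nothing in the code lets yelling (or anything else) affect the second score; it comes entirely from selecting on an extreme first result.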

There are – quite literally – hundreds more; but let me explain what these combine to in terms of falsification.

Obviously, confirmation bias, hindsight bias, and the overconfidence effect combine to make people very bad at predicting what’s going to happen next. Together, they mean that people (i) believe they’re good at understanding or forecasting the situation, and (ii) only remember information which conforms to their opinions.

The halo effect means that people will routinely focus on the wrong things – that is, they will take an indication of one thing as an indication of another, unrelated, thing. AT&T is an example – they assumed that credit risk indicated bad debt and equipment theft/damage – when it did not.

And “regression to the mean” means that people are likely to base their predictions on exceptions as opposed to the underlying reality – essentially, to overestimate the amount of control they can exert on the situation. (Incidentally, the “fundamental attribution error” – the tendency to ascribe people’s performance to their nature as opposed to their situation – is not unrelated).

The combination of (i) focusing on the wrong thing, (ii) being overconfident of their understanding, (iii) not questioning their opinions, and (iv) overestimating the amount of control they can exert is not a good combination.

The practice of falsifying hypotheses is good in two ways: first, it’s humbling to realize when you’re wrong. Second, it provides basic ammunition to hobble people’s initial false conclusions.

The twin benefit of providing both discipline and improved performance is, well, quite compelling.

Inheriting Methodology

The interesting fact is that it seems that business is, slowly, picking up the methodology of science.

In some ways, business has had scientific ideas introduced backwards. Consider the idea of a “paradigm shift,” which entered the business lexicon in the 1990s (and was promptly over-used); Kuhn introduced paradigm shifts in science in 1962, nearly thirty years after Karl Popper spoke of falsification. Experimentation has gained considerable traction in the past decade, closely followed by “deliberate mistakes,” much as Popperian falsification followed the Vienna Circle.

The advance of scientific methodology is not limited to philosophical ideas – in 2009, for example, IBM acquired SPSS (Statistical Package for the Social Sciences), and is now selling it as a Business Intelligence tool. Wall St is famous for hiring statistics PhDs to mine stock data; Google has become well-known for hiring newly-minted PhDs and giving them room to find the best solution to a problem (as opposed to a “sufficient” solution – or the first one that works).

As business faces increasing competition, making sure you’re doing the right thing matters more and more – room for mistakes, or acting sub-optimally, is quickly disappearing in the modern, distributed, global marketplace. It will be interesting to see how business continues to adopt scientific methodologies to try and reduce the possibility of error (particularly recurrent, expensive error) in the future.


  • loot

    This was an entertaining post. I think you will find that research programs at major universities are operated more like a business than the ideal model that you describe above. The ideal model exists in some places, your alma mater has people interested in truth, but in other places, it’s all about the publication lists.
    The group in charge of tenure promotion will see someone pumping out publications and won’t care that they suck. They just have to be better than the suck the other guy is pumping out. If someone stumbles upon truth, that is a bonus byproduct of the system. Serendipity can get you Nobel prizes, and that is what keeps people working in a degenerating research program.

    Here is an example of how I am helping a client now. I have to keep this generic. I represent a company that produces a part that is in every automobile made. This part varies by vehicle, so we have 1500 or so part numbers for this device to fit those applications. Some part numbers sell in greater quantities than others depending on region and time of year. I have a client that sells my product in 200 stores with each having room to stock 250 per store. Shipping is from a central warehouse once a week. The central warehouse can hold 10,000 total units from us. Which parts do they stock? How do I maximize profit for them and the company I represent? I give them stocking suggestions based on previous sales and national sales from my manufacturer. Most of the buyers I have met with have never seen a regression explained, or standard deviation. Some understand a moving average. I have been very effective using the tools of statistical inference to convince buyers to stock parts or change marketing strategies.

    I look at store sales after a push and have the ability to argue whether it was effective or not based on data. I can adopt critical values of statistics on the fly based on how important the decision is. It is clear that I am helping my clients understand how useful their data can be in guiding their decisions. There is so much information available in the data most businesses can collect now, but there are so few people who seem to understand how to analyze it.

    Being a scientist at a business meeting feels like being a wolf in sheep’s clothing.