The Wisdom of Crowds:
A better measure of performance
by Steve Montague
The viewpoints and brainpower of the many are almost always superior to the thinking and decision making of the few.
James Surowiecki, in The Wisdom of Crowds, cites the example of about 800 people estimating the weight of an ox and coming, on average, within one pound of the actual weight. He extols the ability of various “markets” or crowds to predict sporting events, set fair prices to assure product “clearance” and even assign appropriate blame for the space shuttle Challenger disaster – with only very partial information. This thesis flies in the face of conventional wisdom, which suggests that only a few key people should make judgements – especially when matters are technical and complex.
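The arithmetic behind the ox example can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true weight, crowd size and spread of guesses below are invented, and the guesses are assumed to be independent and unbiased, which is the condition under which averaging works well.

```python
import random
import statistics

# All values below are hypothetical and chosen only for illustration.
random.seed(42)              # fixed seed so the run is repeatable
TRUE_WEIGHT = 1200.0         # assumed actual weight of the ox, in pounds
CROWD_SIZE = 500             # assumed number of people guessing
GUESS_SPREAD = 75.0          # assumed standard deviation of one guess

# Each guess is the true weight plus independent, unbiased noise.
guesses = [random.gauss(TRUE_WEIGHT, GUESS_SPREAD) for _ in range(CROWD_SIZE)]

# The crowd's estimate is simply the average of all guesses.
crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)

# For comparison: how far off a typical (median) individual guess is.
individual_errors = [abs(g - TRUE_WEIGHT) for g in guesses]
typical_individual_error = statistics.median(individual_errors)

print(f"Crowd estimate: {crowd_estimate:.1f} lb (off by {crowd_error:.1f} lb)")
print(f"Typical individual is off by {typical_individual_error:.1f} lb")
```

In a simulation like this the averaged estimate lands much closer to the true weight than the typical individual guess does. If the guesses shared a common bias, averaging would not remove it, which is one reason diversity of viewpoints matters.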
To some extent we have a parallel situation in public performance planning and reporting where the tendency is to focus on a few key ideas and measures – without recognizing the need to have other indicators or measures to complement, confirm and contrast the perspective. Examples of overly selective measurement abound.
At the direct delivery level we have seen call centres and information services focus on speed – often to the detriment of service outcomes. In one case, managers initially believed increased usage of an information booth in a large national park was due to its popularity; unfortunately, most visitors to the booth were looking for directions. They were getting speedy but insufficient directions, going out, getting lost and then coming back to rejoin the line for more detailed information.
An early case of over-reliance on one performance measure was the use of benefit-cost analysis by the United States Tennessee Valley Authority. Ambitious engineers with an interest in building dams were able to manipulate the ratios so that they would almost always meet the threshold required to build. Naturally, several of the harder-to-cost environmental concerns got short shrift in this approach, and the region is still paying the environmental price.
At the government-wide level we have seen singular indicators used as a kind of rallying cry for new initiatives. Two recent Canadian federal examples come to mind – the notion that we should double the number of exporters, and the idea that Canadians should become the most connected people in the world.
These single-minded slogan-measures shared a similar conceptual problem and each had specific technical problems. The conceptual problem can be summarized by asking, “Why?” Why do we want to double the number of Canadian exporters or be the most “connected” people in the world? What distinct need or gap was there in our situation that required these singular goals?
In the case of the exporters, it wasn’t at all clear that the raw number of exporters was a problem. In fact, given that first-time exporters often lost money in the pursuit of their new export sales, it might be counterproductive to encourage a large increase in a short period of time. Technically, there was also a big problem: the Canadian government didn’t have a good handle on the exact number of exporters that existed when the commitment was made, so it had a very difficult time figuring out when the number doubled.
As for the goal of “being the most connected people in the world,” what constitutes “connected”? Early efforts showed that, if you counted cable TV hook-ups, Canada was close to number one in connectedness. But that wasn’t quite what most people had in mind. Expensive efforts to define a “connectedness index” were undertaken. The lesson in these recent cases is that we need to think carefully before picking a single measure to represent the success of new initiatives.
Now we face the threat of further oversimplification with regard to our health care system and other public initiatives. We have heard much discussion of the need to reduce waiting lists for health care procedures. While this goal is laudable, it would be dangerous to set it in isolation. Just as with the liabilities noted in the case of a call centre focusing on speed over quality, a health care system which measures and, therefore, emphasizes speed over quality might not just be ineffective – it could be dangerous.
So what is the answer? Surely we aren’t advocating dozens of measures for every goal? The greatest concern expressed by our clients is the fear that we will build a measurement scheme that requires a burdensome bureaucracy of its own to administer. Well, we’re not advocating dozens of measures – but we are suggesting that a good set of measures forms a small crowd or cluster that achieves the following:
Complement: It is useful for measures or indicators to complement other measures. Complementary measures provide greater insight into a key concept. As an example, many groups will use a quantitative measure for items such as reach, take-up, usage or client satisfaction. A complementary measure to the quantitative total is often a breakdown of, for example, usage by key target groups. The “spread” or mix of users can tell you as much or more about the appeal or value of your service as the total number of users.
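A short sketch of how a breakdown complements a total is given here. The group names and visit counts are invented purely for illustration. Two reporting periods have the same total usage but a very different mix of users, which the total alone would hide.

```python
# Hypothetical visit counts per target group for two reporting periods.
# Group names and numbers are invented for illustration only.
usage_by_group = {
    "period_1": {"students": 400, "small_business": 350, "seniors": 250},
    "period_2": {"students": 850, "small_business": 100, "seniors": 50},
}


def total_usage(counts):
    """Headline quantitative measure: total number of users."""
    return sum(counts.values())


def usage_shares(counts):
    """Complementary measure: each group's share of total usage."""
    total = total_usage(counts)
    return {group: n / total for group, n in counts.items()}


for period, counts in usage_by_group.items():
    shares = usage_shares(counts)
    mix = ", ".join(f"{group}: {share:.0%}" for group, share in shares.items())
    print(f"{period}: total={total_usage(counts)} | {mix}")
```

Both periods report 1,000 users in this made-up data, yet in the second period one group accounts for 85 per cent of usage. Reading the total alongside the mix gives a fuller picture than either number alone.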
Confirm: A good measurement system allows for the confirmation that an indicator truly measures what you think it measures. An example of this occurred when we gathered complementary qualitative information on satisfaction. We asked the question “Why?” after getting people’s satisfaction rating on service. It turned out that people rating themselves as “somewhat satisfied” were often really not satisfied when one analyzed their qualitative comments. This allowed us to adjust our interpretation of satisfaction scores. Complementary measures might have helped pollsters to more accurately predict the outcome of the 2004 Canadian federal election. Rather than tallying the numbers from the simple response to, “If the election was held today who would you vote for?” analysts with the benefit of 20/20 hindsight identified a number of measures which didn’t confirm the straight polling data.
Contrast: The contrasting measure is the balancing measure. The most well known example of this phenomenon is the emergence of the Balanced Scorecard over the last decade. The Balanced Scorecard evolved as a response to narrowness in measurement perceived by Harvard professor Robert Kaplan and his colleague David Norton. In their early Harvard Business Review articles and their first Balanced Scorecard book, they suggested that companies in the early 1990s focused too strongly on financial indicators as performance measures. This was seen to be akin to “driving by looking out the rear view mirror”.
A good Balanced Scorecard has an appropriate mix of outcome measures (“lagging indicators”) and measures of the drivers of future performance (“leading indicators”). Kaplan and Norton’s argument for balance goes like this: Lagging indicators without leading indicators do not communicate how the outcomes are to be achieved, nor do they provide an early indication of whether an organization’s strategy is being implemented successfully. Conversely, leading indicators without lagging indicators may point to the achievement of short-term operational improvements but will fail to reveal whether these operational improvements have been translated into meaningful medium-term outcomes and, eventually, into desired final outcomes.
In essence, a balanced set of measures can help organizations to optimize rather than maximize key aspects of performance. In this way the Balanced Scorecard, like the “results logic” approach familiar to many public sector analysts, is not merely a collection of leading and lagging indicators. It is the translation of the organization’s strategy into a linked set of measures that define both the long-term strategic objectives, as well as the mechanisms for achieving those objectives.
In summary, a good measurement system will recognize that single indicators representing the ideas and concepts of performance are not just annoying – they can be downright misleading and dangerous. One way to combat this is to recognize the wisdom of crowds or clusters. This applies to clusters of indicators which complement, confirm and contrast, clusters of tests that independently replicate findings, and crowds of different stakeholders.
Indicators should be drawn from the behaviours of different groups – different segments of users (including nonusers), different geographic regions, different levels, different cultures.
The real conclusion – consistent with Surowiecki’s message – is that the best measurement “system”, like the best decision-making system, is one that involves diverse, freely exercised feedback. Our public measurement frameworks need to preserve this value.
Steve Montague is a founding partner of Performance Management Network Inc. and president of the Performance and Planning Exchange (PPX) – a not-for-profit organization with members from the public sector, private sector and academia, dedicated to improving results and performance management through the exchange of information and ideas (see www.ppx.ca). Steve provides consulting and education services to clients in the federal, provincial, regional and international government communities (steve.montague@pmn.net).