The running down process
The running down process refers to the way the comparability and variability of performance measures "erode over time", prompting the perpetual need for new performance measures in the same setting.[1]: 324 Meyer and Gupta connect five factors to the running down process: positive learning, perverse learning, selection, suppression, and external conditions.
The phenomenon of positive learning accounts for the fact that over time, the existence of specific performance measures can contribute to the improved performance of individuals, leading to a general decrease in the variability of results and thus less effective performance measures.[1]: 331 For example, in baseball, the diminished variability in batting averages in the 20th century is attributed to the improvement of players over time, but it has had the effect of devaluing batting averages as an effective performance measure in the industry.[1]: 338
Conversely, perverse learning results in stagnating performance levels within an organization because it leads individuals to focus on improving their measured results rather than their actual performance.[1]: 339 For example, teachers may devote their efforts to raising their students' test scores rather than to improving their teaching.[1]: 339 Like positive learning, perverse learning decreases the variability of measured performance levels, but the apparent improvement is artificial.[1]: 339
Selection accounts for the decrease in variability that occurs when those doing the choosing within an organization learn to select better performers to be evaluated.[1]: 340 For example, as the major league farm system has developed, teams have learned to select better batters and pitchers, contributing to the decrease in the variability of batting averages.[1]: 340
Suppression is explained by the fact that "organizations sometimes suppress persistent differences in performance".[1]: 341 For example, within the New York City school district, standardized testing scores vary greatly, and some administrators have advocated for a different reporting system that would make it much more difficult to differentiate performance levels between schools.[1]: 341
Finally, external conditions can push performance measures in the opposite direction, counteracting the running down process. For example, the turbulence of the commercial banking system in recent decades has served to disrupt the running down process of existing performance measures because the unpredictability of the industry makes it difficult for individuals to "learn" or "select" based on past factors and experiences.[1]: 343
Given that performance measures tend to erode over time, Meyer and Gupta call for new performance measures that evaluate the same properties but are not yet impacted by the running down process.[1]: 311 Ultimately, Meyer and Gupta state: "The running down of existing measures and the appearance of new measures nearly orthogonal to existing ones yields a paradox of performance."[1]: 311
Orthogonal measures
When performance appraisal measures are run down, they typically need to be replaced by new measures. In the sciences, overlapping data are useful because they can be used to confirm or disprove a given hypothesis. In management, however, overlapping measurements are considered redundant, rather than a useful indication of reliability.[1]: 346 By the same token, new measures that lie in direct opposition to existing measures of performance are not helpful. For instance, if a retail company uses units of shoes sold in a month as a metric, adding units of shoes remaining unsold after a month as a new metric is not helpful. Since the company can derive the same information and draw the same conclusions from both metrics, it is more efficient to use only one of the two measures. In the interest of generating useful data, new performance measures should be orthogonal to existing metrics.
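The redundancy in the shoe example can be illustrated with a minimal sketch. The figures below are hypothetical, and the sketch assumes a fixed monthly stock, which the example above does not specify. Under that assumption, units unsold is fully determined by units sold, so the two metrics are perfectly negatively correlated and the second adds no information.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six months of sales for one store,
# assuming the same stock of 500 pairs is available every month.
STOCK_PER_MONTH = 500
units_sold = [320, 410, 275, 450, 390, 305]

# The "new" metric is derived entirely from the existing one.
units_unsold = [STOCK_PER_MONTH - sold for sold in units_sold]

r = pearson(units_sold, units_unsold)
print(f"correlation between sold and unsold: {r:.6f}")
```

With a fixed stock the coefficient comes out at, or within floating-point error of, -1. If stock varied from month to month the relationship would no longer be exact, but the two counts would still largely restate each other.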
Orthogonality, or non-redundancy, does not necessarily indicate null correlation. Consider a secretary's performance, which might be measured by number of breaks per hour and the time required to complete reports. The two measures are orthogonal because they do not overlap. However, it is possible that repeated evaluations could show a reliable association between a higher number of breaks per hour and less time required to complete reports.
The history of General Electric provides a clear example of developing orthogonal performance measures. When GE dismantled its conglomerate in the 1950s, its existing performance measures, which relied on centralized budgetary targets, needed to evolve to suit the newly decentralized company. The 1951 GE Measurement Project provided a template for the new performance measures, which were orthogonal to the old performance measures, as well as to each other. The new measures were "profitability, market position, productivity, product leadership, personnel development, employee attitudes, public responsibility", and balance between short-term and long-term goals.[1]: 348 Thirty years later, when the company was in dire straits, the performance measurements were functionally consolidated into ranked profitability and growth. With this strategy, GE annually swept away the bottom 10% of its performers in profitability and growth. Once GE regained financial and market stability, the performance evaluation metrics changed in response, allegedly expanding and taking on more humanistic values. The GE case illustrates two important points about changing performance metrics. First, new performance measures are most useful when they are unrelated to each other and to existing measures. Second, performance measures tend towards elaboration during times of security and profitability, and towards consolidation during times of urgency and strain.[1]: 348–50
Orthogonal measures have appeared in the history of many industries, often reflecting changing expectations. American hospitals used to measure success by patient outcome. In the early 1900s, however, a study showed such dismal results with patient outcomes that the study and its results were burned, and hospitals instead evaluated performance by the keeping of records and adherence to procedures. With time, societal expectations of low patient mortality have led hospitals to reinstate patient outcomes as a measure of success.[1]: 345
Evolving technology has also forced the development of orthogonal measures. As early as 855, the success of texts was measured by print runs.[4] In 1942, The New York Times began publishing a list of best-selling books, which has been shown to influence the purchases of the majority of American book buyers.[5] The NYT Bestseller list is divided into sections, including fiction, non-fiction, and children's literature. With the advent of e-book technology, the NYT added an orthogonal e-book section to the list.