
Mar 22, 2007

Gaming the stats

Don't know if anyone else took my recent advice about The Wire, but I did.

Watching the early seasons again has reminded me that "gaming the stats" has been an important part of the show all along. From the series' very first episode, we see homicide detectives dancing to avoid responsibility for difficult cases ("stone-cold whodunits") as their various bosses battle to ensure that any open cases are someone else's problem. Throughout the show, there's far more talk of "clearance rates" than there ever is of justice or public safety or even just catching the bad guys.

The police on The Wire are driven by their clearance rates, roughly meaning the percentage of homicide cases they solve. Any police department in any city ought to be concerned about such a percentage. It's a reasonable and potentially useful measure for the effectiveness of the department. But in the Baltimore of The Wire, this statistic no longer serves as a mere measurement -- it has become the engine that drives every aspect of the detectives' work. It no longer measures what they do, it is what they do.

Thus the various department heads, lieutenants, sergeants and individual officers have all figured out dozens of ways to massage the statistics -- all of which involve a great deal of time and energy spent on things other than actually trying to solve cases.

I don't know enough about big-city police departments or about the city of Baltimore to know whether or not this portrayal is accurate, although the series' creator, David Simon, has a reputation for doing his homework. But whether or not this is really how the Baltimore police operate, it rings true to me because I've seen the same dynamic at work in many other institutions -- in tiny nonprofits, national denominations, and Fortune 500 companies.

It happens everywhere. A perfectly useful measurement gradually becomes more important than it has any right to be, and soon everyone's life is shaped by the slightest variations in that measurement. People quickly figure out how to improve their "score" in dozens of ways that do not improve the performance or outcome that score was originally designed to measure, and the institution eventually takes on the character of those whose power and influence rise because they are particularly skilled at gaming the stats.

So I'm wondering if you've seen the same thing. In comments below, tell us about the runaway metrics in your particular profession or institution and the pressures and techniques for gaming the stats in your line of work.

Comments

Ah, what the tech industry refers to as "metrics". What began as a useful supplement to be able to figure out what your sysadmins have been doing turns into a monster.

Case in point: Nameless Silicon Valley company gets bought by east coast management. They want to know what their IT department is doing, so they implement a trouble ticket tracking system. Soon people can call up, register trouble tickets, and get them dealt with. That's great, right?

Within several months, your job performance is registered _entirely_ on the number of trouble tickets you've resolved. This might sound like a good idea until you realize that IT people do a lot more than just fix things when they're broken. They have long-term projects that will make things better and prevent many tickets from being opened down the line - but there's no way to track the number of potential tickets that have been closed. Eventually some disgruntled tech says out loud what others have figured out:

"Why should I put time and effort into fixing this now, when it's not a problem the users see? I already know what the problem is, and as long as I don't fix it now, when it DOES become a problem, I can close a bunch of tickets by making this fix."

Knowing nods abound, and everyone goes back to doing the same thing - cherrypicking the easiest tickets out of a growing queue as the infrastructure that ran so smoothly before the new management slowly crumbles away.

And when things do inevitably break on a large scale, the same management that caused the problem will require the techs to cancel vacations, work 14 hour days, skip the holiday season, until it's fixed. The good techs leave for other jobs, other careers, while the desperate, the not-quite-competent, and the politically well-connected remain behind to train the next generation. And the first thing they're taught is that you don't touch it unless it's got a ticket number. And the infrastructure continues to crumble.

The most obvious large-scale case of this is standardized testing of kids in schools. What was once a measurement to help teachers see where students were in the learning process has become the one measurement to which everything else must submit. Lots of dollars ride on the results, so of course schools focus more and more on getting their kids to take these tests well. There's a whole literature on this.

The college admissions tests (SAT, PSAT, GRE, LSAT, etc.) have suffered a similar fate. There are entire industries built around gaming these tests. Fortunately, at the college and grad school level, there seems to be some re-thinking of the weight put on these numbers.

In comments below, tell us about the runaway metrics in your particular profession or institution and the pressures and techniques for gaming the stats in your line of work.

Not unique to my profession or institution, but probably the most important fungeable metric out there:

Stock price.

People get laid off by the hundreds to game stock prices, and, conversely, a stock price that comes in a penny below projections can seriously harm a company. Companies aren't about producing good products for their customers; they're about producing a good return for their shareholders -- and if that means, say, covering up reports that their product is dangerous rather than admitting the problem, taking the stock price hit, and fixing things, then many companies go with the stock-price-safe cover-up.

I've never had a proper job, but I've been a student for a very long time. I had school teachers from elementary on up (teachers who were generally good) stop the regular curriculum for days to teach us how to take a standardized test. This would go on every year, and consist almost entirely of skills unique to this sort of test (in the early years, correct bubble-filling and making sure you have the right kind of pencil; in later years, optimal strategic random guessing). There's pumping up your GPA by taking easy courses, of course. The disturbing thing for me is not that students do this, but that so many guidance counselors and faculty advisors actively encourage students to do this at least some of the time. If you look to schools to provide education, the difference between a 3.5 and a 3.7 isn't worth a semester studying something that doesn't challenge you. If you look to schools to provide career prospect sorting, and skills in managing bureaucracy, then it makes perfect sense.

I do have to say, in the Peace Corps I saw a different metrics problem; being out on your own, attempting various things, then once every three months having to make it look like your projects fit the designated goals. My reports always looked quite bad, because we were supposed to match our work with our objectives, and they didn't have one for "finds pre-existing community services that can solve a problem and connect them with the institution you're working for" and that was about half of what I did. I also did hardly any tutoring (on account of finding the Non-Formal Education professionals, and retired teachers, and various neighbors eager to help who did it better than I did), and that meant I had hardly anything towards several of the goals.

Fortunately, it turned out I was only doing everything wrong on paper, and they rewrote the objectives to include all the stuff that I and some of my fellow volunteers did that was useful but not included in the original goals and objectives. So it's a minor problem if you're working for someone as mellow as the Peace Corps. But it has the potential to be big.

Ever notice how the serving size of consumable consumer goods has been going down steadily, despite no decrease in price? I.e. a bar of hotel soap is now an almost translucent sliver, a juice box that you might put in a kids' lunch is barely 3 swallows large whereas a few years ago it was an honest-to-goodness pint (now it's "Kid Size" despite costing the exact same amount), etc.

This is not accidental. This is the product of a semi-intentional conspiracy. It was started by Walmart but has infected most consumer goods manufacturing. It goes like this . . .

1.) Walmart (and thus, every other retailer), demands a 5% reduction in costs by all suppliers, each year, every year (I'm not sure it's 5% exactly but it's something like that) . . .

2.) ...But cutting that much, each year, every year, is difficult. Labor is far and away the most expensive input into any manufactured good, but most consumer goods were long ago outsourced to wherever they're cheapest, laborwise, to produce. You COULD cut the wages of managers and CEOs, but how likely is that to happen, really now?

3.) So instead you foist the responsibility for cutting costs onto the shoulders of the juniorest of junior executives. Someone really freshly minted off the MBA mill. Someone with a lot to lose. Give them all the responsibility and almost none of the authority needed to do the job. Their assignment is to cut costs 5% for this year. Their reward is a promotion, meaning they won't have to figure out how to do the same thing next year; THAT job will be foisted onto the NEXT recent grad to join the firm. (Thus forming a hot-potato chain of annually increasing desperation).

4.) The result? Steadily diminishing quality and quantity of consumer goods even as prices remain constant or increase. Why? Quite simple: Because it IS NOT POSSIBLE to make the same widget each year, every year, for less and less cost. The only way to enforce that, even temporarily, is to corner the market and pass the buck; Walmart makes it smaller companies' problem, and within those companies, higher-ups make it their subordinates' problem.

We, the consumers, get tinier and tinier bottles of hotel shampoo, steadily coarser and smaller hand towels, and T-shirts with one fewer thread per square inch each year.

As a professor---student evaluation numbers. I'm not exactly sure how to "game" them, but they certainly are something folks are concerned about, and people change their teaching strategies (not necessarily for the better!) because of concern about these numbers. This is not to say that they can't be a useful thing to consider, but I would prefer my teaching to be evaluated in multiple ways, not just by noting all the students gave me 2s instead of 1s (where 1 is good), or whatever.

The software industry is rotten with useless metrics. See Joel Spolsky explain why, hilariously.

Lab work. We got a new manager who looked at our metrics, and she was surprised and disappointed to note our overtime percentages had gone down from January to the middle of the year. What we never could explain to her satisfaction was that most of the early overtime was due to one of the analysts' being in the middle of house-shopping at the time and wanting more money in the bank. The analyst managed to stretch out four hours of actual work into a twelve hour day. When our immediate supervisor finally caught on to what was happening, she started measuring the actual number of tests performed (knowing how long each one took from personal experience). The analyst in question skipped off to a new job as fast as she could. Our overtime dropped but our performance per analyst went up, and still the new manager was upset that we weren't doing as much overtime, because *her* boss insisted on higher overtime numbers.

The corporate world is dumb.

khadjair has my industry nailed. I'm lucky enough to work for one of the local support departments whose manager has us measure our progress in terms of how often customers call our director to complain about us (the smaller the number, the better), but other, less savvy departments will e-mail a tentative fix, close the problem ticket, and reopen the ticket when it turns out that the "solution" was copied-and-pasted from the troubleshooting page for an entirely different problem. Rinse and repeat. That's bad for the customers and (in terms of the total amount of work done) bad for the troubleshooters as well.

I am absolutely sure I've seen a Dilbert wherein the Evil Pointy-Haired Boss says "from now on you're all going to be measured by the number of bugs you fix." The underlings all smirk at each other behind his back, because they have instantly figured out that job security lies in deliberately inserting easy-to-fix bugs into the code.

Needless to say, I've also seen instances of this in real life. (Scott Adams resisted retiring from the software rat race for a long time after Dilbert had made him rich enough to do so because he feared running short of material.)

I've noticed during Purgegate that one side says the fired USAs weren't concentrating on stated objectives (e.g. not going after enough immigration cases) while the other says in their support that they all have high numbers of cases cleared. This statistic can be misleading in the same way as the IT-tickets example. I'm not supporting the Bushies on this one, merely pointing out that a high clearance rate or a high conviction rate by itself doesn't mean a prosecutor is doing a good job (and better evidence abounds that at least most of the firings were politically motivated).

@J,

shrinking food portions is arguably a major GOOD in America... part of the reason for the obesity epidemic here is the enormous portions (and part of it is that the cheapest foods are the worst for you, and part is the car-dependency that removes exercise, and part is the lousy health care, etc... but it does matter). o'course, as you say, the whole thing is long term not sustainable.

@leahwrenn

The way you game stats with students is by grading easier (the research says this works especially if you're a woman - they really get punished for giving low marks), and by not challenging them with unpopular stances... All things that often make for BETTER profs.


My own example is a weird one in that it involves mission drift, but isn't directly tied to measurement. We have Institutional Review Boards that are supposed to supervise the ethics of experiments... if you want to do research on humans, you have to clear it past one first. It's a good idea in principle, but the almost universal phenomenon with them is that they become more and more intrusive with time, to the point that zero-risk research (having people fill out a survey about how much they like to drink milk or whatever) becomes bogged down in red tape, endless corrections to the consent and debriefing process, etc. It can literally take months to get permission to run a simple study like that...

Seriously, you start watching shows like Survivor with envy, because if we tried to do a tenth of that stuff, we'd get slammed forever (and forget offering anyone a million dollars even if we could afford it. We would be told that so much money could effectively coerce people to participate, which is unethical).

Part of the motivation for their mission creep seems to be CYA anxiety from universities that don't want to be sued if anything goes wrong, but part of it just seems to be people with a job description who feel that they have to do SOMETHING.

To talk about a less corporate context, in working for the church you find out the importance of ATTENDANCE NUMBERS - that is, how many people show up at the church service each week. This can be a reliable guide to the health of your church, i.e. if the church is doing things well, people will show up. However, attendance soon becomes the focus of all hopes and fears for church leaders, and ministers often turn to desperate and inappropriate means to get bums in those pews - hence the crazy consumer world of modern evangelicalism.

I no longer actively teach, but my buddies who do are just going through hell thanks to the University's new policy of calculating teacher performance and pay. I won't even pretend I fully understand it. All I know is that the final figure - they call it "študentohodina" (studenthour) - is calculated based on the number of hours taught per week and the number of students attending. According to this system, teaching an introductory class in, say, linguistics is worth much more than supervising a PhD thesis. Not to mention that while an epigraphy class takes an enormous amount of time to prepare and to hold (ever tried to read a medieval Latin text with 20 people?), you get less money than someone whose idea of holding a lecture is reading selected chapters from his book.
And then there are points for publications: X for articles in domestic journals, X+20% for articles in international journals, Y for monographs, Y+20% for monographs published abroad and, of course, Z for citations in domestic publications and Z+30% for citations in international publications. Who cares that one professor of mathematics got over 100 citations just because her work contained so many flaws and outright BS that everyone cites her as an example of what one should avoid at all cost. She's got the points and hence the money.

I used to work in debt management, and just before I left the supervisors started going numbers-mad. Every team had to push through so many cases minimum per week to the next stage: collecting information from client, collecting information from creditors, sending out offers and preparing the case for transition to the long-term department. Our team leader decided this meant spending all our working time hyper-focused on pushing cases through, sometimes going as far as letting all incoming calls (mostly from clients in crisis) go through to voicemail instead of dealing with them.

And then there were those clients who sat on our books for a whole fortnight and failed to advance. It was decreed that if they missed three calls from us, they were to be cut adrift, because the longer they stayed on our books without advancing, the worse they looked. So if we called when they were out and then didn't answer the phone when they called back, we cut them loose.

I was relieved in some ways to be laid off.

@ J -- i don't know that this is always true.

first of all, one thing i've noticed is that food portions in cheap and crappy restaurants always seem to go up. you get a HUGE amount of food, but the quality goes down every year. look especially at places like Olive Garden which love to have specials where you can have all the free bread, salad, soup, drink refills, bottomless coffee, etc. you want. these are the cheapest items that Olive Garden provides. and if you fill up on that because it's "free", you'll be less likely to notice how the quality/quantity/substance of the entrees goes down, or the prices go up.

second of all, i feel that for more "durable" goods these days, especially housewares, bigger is automatically better. i don't know whether it's to make fatter people feel better (i've especially noticed this with food-related housewares), to get people consuming more, or a justification for price increases, or what, but as a small person with a small apartment and not much storage space, i've noticed a huge jump in the size of glasses, cutlery, plates, and even dining furniture over the past few years. the soup spoons don't fit in my mouth anymore. the wineglasses make me feel like an alcoholic dwarf. my mother bought new side chairs for her dining room, and i swear i could comfortably share one of them with a significantly larger person.

what i do agree with you about is quantity and packaging size. also quality. a shrink wrapped package of t-shirts used to cost $3.99 for a 3-pack, now it's $2.99 for just one. and that one falls apart, like, immediately. i've also seen companies significantly update and swankify their packaging in order to make you feel like you're getting a great deal on much fancier product. i noticed this today shopping for orange juice at my corner store. you can get a regular quart (32 oz) in a cardboard carton for $3, or a really fancy 30-oz clear plastic bottle for $4.

Former special ed teacher here. Do I really need to say more?

Unless the law has changed, there may actually be slightly more flexibility in special ed than regular, because no-one expects our kids to ace the tests. But there is a distinct feeling that you're being evaluated on the strength of your paperwork and not your teaching. There's also pressure to teach the basics, and only the basics, which makes a certain amount of sense; if you can't get Jeremy to read by the time he graduates, his options are badly limited. Problem is, special ed students get bored too, and sounding out three letter words every day is boring. (In hypothetical Jeremy's case, sounding out three letter words may be boring and *difficult,* which is even worse.)

Students who are bored generally look around for a more interesting activity. "Seeing if you can drive your teacher to drug use," is always a favorite.

Me, I think that the current curriculum needs a lot of enrichment, at all achievement levels. Too often, enrichment is only available for the gifted kids, and then only if the department hasn't figured out a way to divert that money to something "more urgent."

Izunya (who isn't sure if that rantlet was helpful or not)

Not my profession, as will become obvious, but a very poignant example, if it's a true story.

I had read somewhere (although I can't remember the source, so this is anecdotal at best) that during the Soviet era, factories were reimbursed only in proportion to the product they produced (very reasonable). A shoe factory, for example, was paid based on the total number of shoes it produced. However, because of the way the system was set up, the factory was rationed a very specific amount of leather and other raw materials per month. And so, displaying the capitalism that exists in most every person, the foreman ordered that every pair of shoes made be size 1. Most shoes produced per materials allowed. Sure, nobody in the region had shoes that fit, but the foreman and the employees were paid extremely well.

oh, i should also come back to the topic at hand. one "hack" that i know of is the relation of film grosses and profit margins to the price of a movie ticket. every time the numbers fall or even just don't look impressive anymore, they bump up the price of a ticket by $1 or so, thus engineering the record-breaking opening weekends of relatively mediocre films. this pretty much explains why a film like Spiderman 2 can be in the Top 10 box office grosses of all time, even though far fewer people went to see that in a theater than even the most forgettable and tepidly received film of the 30's, a time when EVERYONE was going to the movies all the time.

i've also seen producers of low budget films "game the stats" a bit with their budgets, the unions, etc. by hiring "Production Assistants", or even "Interns" who can be paid next to nothing and don't need to be unionized. except actually their job description is far more advanced than what PAs and Interns usually do, generally pretty much matching union job titles like "Assistant Camera Operator" or "Set Decorator". but as long as you don't shove that fact in the union's face, and don't tell the greenhorns they deserve to be paid much more (and as long as the greenhorn doesn't totally fuck something up due to lack of experience), it's a win-win situation.

Peter, i've heard that during the 1930's, there was practically no overall growth to the number of available shoes produced. shoe production stagnated, and shoes wore out much faster than they could reasonably be replaced. thus, pretty much NOBODY in all of the USSR (except of course for party bigwigs) had a newly produced pair of shoes, for an entire decade. so your anecdote doesn't surprise me, and in fact kind of works -- with intensive hand-me-down and improvisation schemes, the only people who would absolutely require new shoes would be the youngest children, with the smallest feet.

i got the above factoid, by the way, from Everyday Stalinism by Sheila Fitzpatrick, whose work i highly recommend, if you want an incredibly unbiased (and in fact sometimes slightly left-slanted) look at the Soviet era.

Rehab clinics often press physical/occupational/speech therapists to either hold on to patients longer or discharge patients earlier, in order to "make our numbers". There's also a problem with therapists being pressured to take patients who aren't suitable candidates for rehab (this includes patients who don't actually need rehab, and those on the other end of the scale who aren't in good enough shape to benefit from 3 hours a day of hard work)--part of this is also numbers-driven, and part is "Dr. X wants us to admit this patient." Needless to say, none of this is in the patients' best interest, and it's also in direct conflict with the therapists' ethics code.

The children's home I used to volunteer at required regular staff reports, describing what progress the child had made. While it didn't require that the social workers go through a checklist (that was a separate form for the houseparents), there were certain signs that generally went over well on the report. These included things like enrollment in school, acceptable-to-good grades, participation in the "livelihood skills program" (a totally ineffective program popular with social service on the theory that all you need for a career is a suitable handcraft), ability to communicate problems, and weight gain (most of the kids were malnourished and underweight). Nearly all of these criteria were positive signs (even though making a crochet purse wasn't exactly great livelihood training, it did make a nice hobby). Some of these were also easy stock phrases to spit out.

I caught on while helping one of the social workers learn formatting features on the computer. She was doing a report, in which she'd just put "She has gained weight during her stay." The particular child she reported on had (unusually) actually arrived overweight, and was being encouraged by the staff to eat healthier and exercise more. I pointed this out, and the social worker looked at me rather surprised. Then she asked me, "Should I put that she lost weight, then?"

I managed to persuade her to ask the nutritionist/cook (and extra houseparent/supply manager), who actually knew how much the children weighed and in what direction they were trending. After that, though, I took certain phrases in the children's reports with a grain of salt.

In higher ed, the numbers we're forced to game for are US News rankings and similar BS rankings put out by other organizations.

School: A school's reputation around here tends to depend on how many kids they fail, which is usually a very easy system to game. I could go into detail, but it's both obvious and depressing. Not 100% safe, though: When I was in technical school, the school board had a bunch of kittens for fear that the school's reputation would suffer, because every student could answer every question in a test that was clearly harder than the ones in previous years. The teachers got into all kinds of hot water over it.

Another measurement to game, though on an individual basis: Psychometrics. With even a vague grounding in pop psychology and some knowledge of your audience, it's easy to assemble the correct persona on the fly. Even if whatever the test truly measures is hard to fake [e.g. artistic ability or good taste], that's not necessarily what the test is believed to be measuring, so it can still be gamed.

I work in Vendor Management, overseeing contract performance by on-site vendors. With one vendor, the particular work it does for each different internal customer is so varied that there's no uniform performance factor you can measure other than general customer satisfaction. So every month we run a survey of the in-house customers, asking how well the vendor is performing its assigned tasks. Our own goal is to make sure that the vendor is making the customers happy.

So what slowly happens over the months? The definition of poor performance gets narrower and narrower, so our group can show how well we're enforcing the contract and how happy the customers are. But more and more, poor performance problems are being pushed beyond the scope of our survey, they don't get reported, customer sat in reality goes way, way down and the vendor is allowed to skate because OUR OWN METRICS say it's doing a great job.

In essence, our metrics are designed to encourage us (the vendor managers) to do a bad job.

This sort of thing is a major problem here in the UK, where one of the big complaints against the government is that they overuse targets. When they came to power the public services were a mess - so they started putting a lot more money in, provided certain targets were met. Managers caught on and started to game the stats - only because these were major public services, the media caught on. So the government changed the targets, making them more complicated - harder to game, but it needed more paperwork as well. And it just took off - to the stage at which no one can actually meet the targets honestly because there are just too many of them. Nobody has any idea what to do - the Opposition talks a lot, but they've got no alternative.

Just echoing "bulbul" on academic metrics, with slight variations. None of these is significant enough to warp the whole job, but each distorts some aspects of it:

1) "Research Assessment Exercises" according to some fixed formula - arising from Thatcherite Britain, IIRC - in which every individual, then every department, is given a number from 1 to 5, and resources are allocated accordingly. The original formula we were given - apparently drawn up by scientists - counted a couple of articles (which they [scientists] churn out by the dozens) as worth an entire book (which they don't bother with, except for textbooks, which are considered useless as research). It is not long before everyone learns to "thin-slice" their research, turning what would have been an important monograph into multiple articles, and what might have been one fat, complex, interesting article into three or four discrete, narrow, boring ones. (In the UK, this also led to universities [or "cost centres" within Uni's] trying to strip-mine other institutions, hiring away their few top-rated researchers or [worse, IMHO] taking over whole schools and firing everyone except those couple of stars who might make their overall averages rise.)

2) Emphasis on retention/graduation rates, at the expense of quality control. I was involved with one Master's program that attempted to keep standards high, even if that meant failing those who didn't measure up. It was infuriating, when it came to "resource allocation," to have our r/g rates compared unfavorably to those of another program that, so far as we could tell, simply shoved through everyone who got into it.

The previous comment was by me - don't know why it came through anonymously.

I forgot to endorse, in the context of RAE's (above), what an earlier commentator said about the disutility of "citation indexes," in which the value of one's publications is measured in part by how often it is "cited" in a select set of journals searched. One major problem simply overlooked by those who proposed this metric: there are MANY MANY MANY more journals that refer to American and European history than other fields, such as African or Southeast Asian. Anyone in those fields, no matter how good, is simply buggered as far as such indexes are concerned.

I once knew a programmer who responded to a tool for measuring lines-of-code-changed by writing a script that would insert meaningless changes (spaces at the end of comments) so that he modified exactly 1337 lines of code per day.

It was a very effective demonstration of the uselessness of the lines-of-code metric.

tell us about the runaway metrics in your particular profession or institution and the pressures and techniques for gaming the stats in your line of work.

billable hours: work longer not smarter.

look especially at places like Olive Garden which love to have specials where you can have all the free bread, salad, soup, drink refills, bottomless coffee, etc. you want. these are the cheapest items that Olive Garden provides. and if you fill up on that because it's "free", you'll be less likely to notice how the quality/quantity/substance of the entrees goes down, or the prices go up.

In a previous life, I actually worked at an Olive Garden for a minute. As ridiculous as those giganto portions are, the numbers are being worked on the other side, too. Servers are required to measure the amount of salad dressing, amount of salad, and number of breadsticks per person. Extra salad dressing was a serious offense. The managers are expected to account for every crumb of crummy food served.

* Similar to khadjair's example upthread, call center employees are expected to complete as many calls in a given period as possible. Calls lasting longer than a certain duration are looked upon with suspicion (gods forbid someone need a little more explanation as to how finance charges work.)

* CA wine producers reportedly game their wines toward Robert Parker's personal taste, in hopes of receiving one of his valued high scores.

* A slightly less direct example, but one close to my heart, is marketing/sales and fundraising departments being permitted to monopolize the short-term goals of an organization to the extent that the very goals of that organization are smooshed out of the way. The primary purpose and value of a museum is not to be a cool venue for weddings/receptions/company parties. The most important thing about a membership organization is not the ass-kissing of the funders, it's the members and the programs that serve those members' needs.

Megapixels in digital cameras, and Gigahertz in CPU frequencies. Both measures are marketed as the metric of quality for the device (picture quality for the camera, speed for the CPU). Thus, manufacturers try to cram as many pixels and as many clock cycles into their devices as they can, to the detriment of everything else.

In reality, a 10Mp camera actually performs much worse than a 5Mp camera with the same sized sensor, due to the excessive noise. Additionally, there's not really much of a difference between 8Mp and 10Mp, because the pixel count increases as the square of the length/width of the sensor (seeing as sensors are rectangular). Thus, if you increase your pixel count from 10Mp to 11Mp, the actual difference in picture quality would be barely noticeable, but you'll get to put a nice bullet point on your marketing brochure.

Similarly, there are many factors that affect CPU speed, and the GHz count is just one of them. Intel jumped on the bandwagon by releasing their Celeron processors -- which had an extremely high clock speed, but no cache, which made them slower than molasses in January -- but also a lot cheaper. They sold like hotcakes, because all those GHz must mean they're faster, right?
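A quick back-of-the-envelope check of the megapixel point above. For a fixed aspect ratio, the number of pixels along each side of the image grows only as the square root of the total pixel count, so a bigger megapixel number buys less detail than it sounds like. This is only a sketch of that one relationship: the function name linear_gain is made up for illustration, and it says nothing about noise, sensor size or lens quality.

```python
import math

def linear_gain(mp_old, mp_new):
    """Percent increase in pixels along each side of the image when the
    total pixel count goes from mp_old to mp_new (same aspect ratio)."""
    return (math.sqrt(mp_new / mp_old) - 1.0) * 100.0

# Going from 10Mp to 11Mp is a 10% bump in the headline number...
print(round(linear_gain(10, 11), 1))  # 4.9  (percent more pixels per side)

# ...and 8Mp to 10Mp is a 25% bump in the headline number.
print(round(linear_gain(8, 10), 1))   # 11.8 (percent more pixels per side)
```

So a 10% jump in the number on the box works out to under 5% more pixels along each edge of the picture, before noise is even considered.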

* Similar to khadjair's example upthread, call center employees are expected to complete as many calls in a given period as possible. Calls lasting longer than a certain duration are looked upon with suspicion (gods forbid someone need a little more explanation as to how finance charges work.)

Well, that would certainly explain my recent experiences with CapitalOne's call center...

The question I have is, are there any counter-examples? Any examples of ungameable stats or useful metrics in tracking *human performance*, specifically.

Susan mentions Parker points/reviews in wine...

As with any consumer-information tool, they are bound to be abused - my favorite story:

Customer is a big fan of a particular moderately-priced wine which consistently receives ratings in the low-to-mid-90's on Robert Parker's 100-point scale. He buys the stuff by the case, and we, in the retail shop, are happy to help him. This goes on for several years, across several vintages. Comes one new vintage, customer is given a taste, he loves wine, he takes five cases on the spot... a couple of weeks later, Parker's review comes out .. 89 points. Customer returns the five cases, minus the half-dozen or so bottles he'd already enjoyed, explaining, "I can't serve this to my friends! It only got an 89!" "But you loved the wine, right?" "Yeah, but it only got an 89!"

What can you do?

The calls per time might explain my experience with the USPS call center. A woman there actually hung up on me, as far as I could tell because I was asking questions she didn't know the answer to.

Dr. Science, Any examples of ungameable stats or useful metrics?

What makes a nongamable stat or useful metric?

I would say, first of all, that it measures what it is intended to measure. Too often you want to know A, but measure B, because A is hard to measure and you know (hope, guess, assume) that there's a correlation between A and B. Possible sources of error: 1. Wrong assumption about the correlation, 2. The act of measuring changes the correlation, 3. You have not defined A.

Take lines of code written. Counting them tells you a lot about lines of code written. It won't tell you when your software is ready to ship, because you do not know how many lines of code need to be written before that (no one knows). It won't tell you which of your programmers do good work, because you have no idea how to define "good work". You just feel it might correlate to lines of code written, or number of bugs fixed.

One measure that I feel is hard to game: Polling people's satisfaction with (whatever). You can bribe, trick or coerce them to lie, but those manipulations are outside the system - not gaming it.

Another: Getting people to speed less. You can measure how fast they drive, so you know how much they speed. If you want to measure safe driving, you do not measure speed, but frequency and severity of accidents.

You can stray a little bit from the "Measure A to know A" and actually measure performance if you have a clearly defined and measurable goal. Say you want to enforce driving safety by putting more highway patrol on the road. Then you count accidents (your measure of safe driving) "before" and "after" to measure the performance of the highway patrol. What you do not do is count tickets issued.

inge:
2. The act of measuring changes the correlation

Even more, such a system also inevitably & logically rewards people who can break the correlation, because they can target their efforts more accurately to B (the metric) without being distracted by A (the ostensible goal).

I'm a British public-sector statistician, meaning that I see the results of gaming the system all the time and can normally easily track things back from there to what is actually going on. But the main problem with targets and gaming is that the targets were normally chosen not by statisticians or even economists but by some twonk in policy who makes the performance target something about the worst 10% of cases and improving them relative to some arbitrary baseline (we calculated that the simplest way for the Department of Health to meet one of its targets was to bomb Westminster and Kensington and Chelsea (and how those two ended up in the Neighbourhood Renewal Fund is beyond me)).

But my most recent example of target setting comes from my most recent business meeting. Each group in the company has its profit and loss figures recorded and part of our bonuses (if any) are based on that profit. In the company, there are five divisions - of which four do work for clients (and hence make money) and the fifth is Corporate - which is pure expenses against the rest of the company (IT, accountancy, etc.). Of the five divisions, only two are currently in profit - one marginally, and Corporate, which makes more than twice the total profit of the company combined and does it by billing the divisions that do work for actual clients. Guess which division sets the targets and bonuses...

The one motto that needs framing and putting over the desk of everyone responsible for setting targets is:

Measure what you value or you end up valuing what you measure

Hooray, another Tech Supporter in the thread. My experience matches with Lucia's and khadjair's.

Like khadjair, we have a trouble ticket system here. It's the only thing the boss really looks at. Fortunately(?), I'm the "Technical Support Manager", by virtue of being the only person in the whole company doing tech support. Because of those two facts, I get to game the system my own way.

All the biggest support cases come to my inbox either from our partners or get passed to me via the sales reps who get complaints from their customers, and not by the ticket system at all. However, the boss hears about the good work I'm doing directly from the sales reps and our partner manager. Sometimes, he'll request a status update on one of these big cases, and I tell him what's up. He doesn't actually care about any of the cases in the ticket system. He just wants to see the "open cases" number go down. They don't, because I literally spend all my time on the other cases. But, despite that, I'm clearly doing something, and if the boss asks around he gets a lot of stories about how I saved some big sale by working a guy through an installation problem or fixed some issue they were having.

Granted, that's not a great work ethic. Actually, it makes me ill leaving all those poor suckers in the ticket system out to hang. But because my boss doesn't want to hire a second person to cover the cases the right way, and because I have limited time and also bear the responsibilities of a Product Manager on top of doing first- and second-line tech support, this is the only way to get anything done at all.

For the record, I kind of hate my job.

As for Lucia's story, a similar thing happened here. Our software, without being too specific since I've just blasted the shit out of my company and its CEO, works across the internet and uses a proprietary port for said communication. This means that often the customer has to open this port on their firewall to get the software working. You will not find the port number documented anywhere. Not in the manuals, not on the website, nowhere. Not in the knowledge base, because we don't have one (not my fault; the CEO said, "Prepare some documents for the knowledge base, and send them to me so I can upload them to the website," so I did, and the knowledge base was never spoken of again). Nowhere.

Turns out, our software used to be owned by a different company. The support team there lived under a stricter ticket system than this one. They intentionally left that port number out of the manuals just so they had a large number of really easy cases that they could solve in less than 30 seconds, to keep their number of closed tickets up and their time per ticket down.

I have a doozy of stat gaming. I taught for several years in urban public education. Among the stat gaming:
Teachers were expected to pass most of their students. It was a mark of your effectiveness as a teacher. Since teachers are the main evaluators of student performance, the potential for how to game the system was clear, if you were prepared to give up standards.

The most convoluted method though involved enrollment and standardized testing. Schools in my state receive funds based on their enrollment, not their attendance. So if students are continuously skipping school or have effectively dropped out, administrators keep them on the rolls any way they can to keep funding high. Occasionally they would delay putting through an official transfer of a student to a new school to help keep the numbers up. However, school certification was linked to the percentage of students scoring at or above state norms. So when standardized testing rolled around, dozens of students with sporadic attendance to begin with were swept from the rolls to reduce the school's official population for the purpose of the test and so raise the percentage of students passing. After the test was over, many of the students would be reinstated. If by sheer coincidence one of the truant-dropped students showed up during the test week, their reinstatement would face more than the usual number of bureaucratic hurdles (mostly kids would sit in the attendance office until testing was done). Also troublesome students would be expelled during test week (and reinstated afterward). I heard, but never saw or experienced, that some schools would even expel low-performing students for test week for no other reason. My schools never did that, but it's a clear progression from what I did see.
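To see why sweeping the rolls moves the certification number, here is a rough sketch of the arithmetic. Every figure in it is invented purely for illustration and is not taken from any real school; it also assumes that none of the students dropped for test week would have scored at or above state norms, and the function name pass_rate is hypothetical.

```python
def pass_rate(passing, enrolled):
    """Percentage of officially enrolled students scoring at or above norms."""
    return 100.0 * passing / enrolled

# Hypothetical figures, for illustration only.
enrolled = 500   # students on the rolls (this is what funding is based on)
passing = 300    # students expected to score at or above state norms
swept = 50       # chronic truants dropped from the rolls for test week

before = pass_rate(passing, enrolled)
after = pass_rate(passing, enrolled - swept)

print(round(before, 1))  # 60.0
print(round(after, 1))   # 66.7
```

Nobody learned anything extra between the two lines of output; the only thing that changed was who counted as enrolled on the day the percentage was taken.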

One of the few standardized tests that I've heard does work relatively well is the LSAT. Not because of any special virtue, but because of what it's meant to measure. It is specifically designed and intended to measure aptitude for first-year law school. Nothing bigger, or broader than that. I think that one of the problems with most standardized tests, and possibly most metrics, is how they try to measure broad questions like "Are they working?" "Are they learning?" "Are they smart?" by testing narrow criteria.

I'm lucky. Corporate wants to measure stupid stuff like billable hours and sales to the client (which is hard on a long-term contract with little high-level client "facing" (blech!)). But our local managers are all about the work we get done. Are the issues reported by our clients (and verified before they're counted) going down? Does the over-night Batch process work without errors? Are programs promoted from Development to Test without errors?

The things we're measured on actually improve the project. Yay!

Anyone hear about how Chinese authorities, looking for a new college entrance exam, scouted around and eventually settled on the American GRE? Yes, you read that right: Chinese high school students will be taking a version of the Graduate Record Exam in order to get into an undergraduate college.

Yes, you read that right: Chinese high school students will be taking a version of the Graduate Record Exam in order to get into an undergraduate college.

That makes no sense to American readers as college studies in the US are general studies. If Chinese universities go the other way - you study only your major subject - then they expect that a student's general education is finished before they're admitted. British universities work that way, for one.

I didn't think the GRE was any harder than the SAT. Taking it seemed to me to be an utterly useless waste of my time. The friend whose home I am currently visiting says she thought the GRE was easier than the SAT.

You have to remember that that imp got a full-ride scholarship taking one of those tests...

No, she got a full-ride scholarship by taking a test that was about 25 times more difficult than the SAT and GRE were.

IQ tests (GRE, SAT, LSAT, etc all being examples thereof) are actually great predictors... of how well people do at school, and not much else. That's what they were designed to do, and that is about all they do. You can game them somewhat (Kaplan's entire existence is predicated on this), but it's hard and expensive and only makes so much of a difference.

Most of the performance metrics discussed here make sense in and of themselves, the problem comes when you TELL people what the metric is, AND you reward them based on it. That's where you get the problem of gaming.

Another education example: When I was working as a Teaching Assistant in grad school we mostly had to grade. We would also get evaluated by the students, and if those evaluations were good we could use them for solid resume fodder. I used to point out egregious spelling and grammar and logic mistakes when I graded, even if we weren't taking points off for them, just because I figured it was something students should be exposed to and learn. This annoyed a bunch of them enough that I started to hear the grumbling through back channels. Eventually my advisor told me that it wasn't worth my doing this, because I needed good ratings, and that this form of education was a struggle for only the tenured profs.

Most nutty performance metrics are intended to fix or try to fix a system that's not working in the first place. Systems that work don't tend to need them. But in my (limited) experience there are two reasons why a system isn't working: one, what it's organized to do doesn't make sense (even if it did in the past); or two, and much more often, the morale and quality of personnel involved sucks.

You can't fix either of those things by measuring more carefully, and in fact you can only make the more common case worse that way. But most people don't really know how to make people happier & more productive, or rather, even if they do know, they don't know how to implement those things (and I include myself in this category too). Things like paying a decent amount, making the office nicer, including everyone in decision-making, letting people figure out how to improve their workflow & working environment. Too often efforts to that end wind up being perceived as PR exercises or just more corporate bullshit (e.g. mission statements).

Metrics can be useful when people choose & set them for themselves, and when they don't feel pressured to manipulate them, because they can help people understand how to improve their own performance, but as part of a command-and-control structure they seem very often worse than useless. Another way they can be useful is - with a motivated workforce - to help stop things from slipping through the cracks. In a typical productive workspace without a lot of controls, you need something to make sure you don't just forget about some things altogether.

On the other hand a lot of modern corporations seem to be set up specifically to maximise output while treating their employees badly - they take it as a matter of fact that this will be the best way to larger profits. And it's really hard or perhaps impossible to turn that around. Compare, for example, McDonalds and Starbucks (Starbucks-haters pipe down for a minute!) - as I understand it, the corporate culture at Starbucks encourages local problem-solving, pays their employees pretty well, and tries quite hard not to be a soulless pit of corporate hell. McDonalds, on the other hand, seems well-adjusted to simply being a "turnover employer", you might say; a place where people need no skills whatsoever to work there and are expected to leave as soon as they possibly can. Now, I don't know about you but I prefer the customer service experience at a typical Starbucks to that at a typical McDonalds, and I'm quite certain their profit-margins are enormously higher too. But McDonalds is stuck at the bottom of a steep spiral of cost-cutting and there really isn't a good way to climb out of it for them - nobody's going to start paying more for crappy burgers served by surly staff in dirty restaurants, and if they're not, why bother trying to fix any one of those problems?

The comments to this entry are closed.
