By Michael Karesh on June 27, 2006

For decades, Consumer Reports has been the American automobile buyer’s primary source for vehicle reliability information. Tens of millions of highly-educated, independent-minded people have made their car purchase based on a brace of red dots. While I don’t care for the dots (they’re a blunt instrument that can hide as much information as they convey), I’ve always assumed that Consumer Reports’ (CR) underlying data was solid. And then I took their survey…

Of the survey’s 19 questions, only one collects the data that's ultimately responsible for Consumer Reports' final, all-important reliability dots: question number 13. 

“If you had any problems with your car in the last year (April 1, 2005 through March 31, 2006) that you considered SERIOUS because of cost, failure, safety or downtime, click the appropriate box(es) for each car.  INCLUDE problems covered by warranty.  DO NOT INCLUDE 1) problems resulting from accident damage; or 2) replacement of normal maintenance items (brake pads, batteries, mufflers) unless they were replaced much sooner or more often than expected.”

CR’s form then lists the car’s major systems, with a simple checkbox next to each. That means that multiple problems with a single system, such as ongoing hassles with a car’s electrics, count once. Equally troubling, respondents are supposed to remember a car problem that may have occurred over a year ago. They also need to remember whether incidents near the cutoff happened in March or April. Respondents who err on the safe side and report problems that might have happened within the timeframe, and do this year after year, are likely to report some problems twice.
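
To illustrate the checkbox effect described above, here is a minimal sketch in Python. The repair log and system names are hypothetical, not CR's data or CR's actual tabulation method. It only shows how one checkbox per system collapses repeat repairs into a single report.

```python
from collections import Counter

# Hypothetical repair log for one car over the survey year (illustrative only).
incidents = ["electrical", "electrical", "electrical", "transmission"]


def checkbox_report(incidents):
    """What a one-checkbox-per-system form can capture: which systems had any problem."""
    return sorted(set(incidents))


def incident_counts(incidents):
    """What a form that asked 'how many times?' per system would capture."""
    return dict(Counter(incidents))


print(checkbox_report(incidents))  # ['electrical', 'transmission']
print(incident_counts(incidents))  # {'electrical': 3, 'transmission': 1}
```

On the checkbox form, a car with three electrical repairs looks identical to a car with one.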

There's an even more profound methodological iceberg dead ahead. CR’s dots signal “SERIOUS” problems [note the caps], yet CR never defines the term. I’ve always wondered how CR staffers decided whether a problem is “serious” enough to include in their survey. They don’t. CR’s question 13 requires that individual respondents make the call, based on “cost, failure, safety or downtime,” or other entirely subjective criteria.


This is a buck that should not be passed. Anyone with a significant other knows that two people hardly ever agree on what constitutes a “serious” problem. As CR does not provide clear guidelines as to which problems qualify as SERIOUS and which do not, the resulting data is not reliable. Would it be so hard for CR to provide a definition that includes a dollar amount or the number of days out of service? Apparently so.

Without unambiguous guidelines, extraneous influences intrude. First, there’s the respondent’s general opinion of the car. Things gone right can ameliorate things gone wrong. Why else would some people keep buying those pricey “black dot” jobs? Second, the reliability of cars past shapes consumers’ expectations. If the participant’s previous car lost a transmission, then a bad alternator may not seem so SERIOUS. Unless the current car is the same brand, and the participant is starting to feel twice fooled. Then a burned-out turn signal may seem SERIOUS. And third, if the dealer was smart enough to play nice, maybe kicking in a free loaner, then a SERIOUS problem will seem less severe.

Finally, we come to the part of the question which cautions that replacement of “normal maintenance items” shouldn’t be reported “unless they were replaced much sooner or more often than expected.” This instruction lumps maintenance and repair items together, with no way for CR’s analysts to separate the data later (should they be so ambitious). And, once again, the respondent must define terms, deciding what items count as “normal” and assessing the gap between their expectations and reality (usually called irony).

If CR is going to include wear items, it should specify how long they should last.  But how long should brake pads last?  Expectations are going to vary.  A lot.   Brake pad life is heavily affected by driving style, driving conditions, a tire shop’s financial goals and other factors that have nothing to do with reliability.  And batteries?  How many times were the lights left on?  How much crud has been allowed to build up around the terminals?  Asking average car owners to gauge their vehicle’s parts wear against an entirely subjective ideal does not a scientific study make.  If they really want to know about brake pads and batteries, they should at least ask about them separately, to keep the nasty things from contaminating the entire data set. And provide some guidelines.

I’m no triskaidekaphobic.  But Consumer Reports’ question 13 does nothing to instill confidence in their reliability ratings, and much to cast doubt on their value. Respondents and readers need a more scientific and, ultimately, more useful guide to automotive reliability.  Until CR’s survey undergoes a major overhaul, readers will be misled and manufacturers won’t have the valuable feedback they need to make genuine improvements.

[Michael Karesh operates www.truedelta.com, a vehicle reliability and price comparison site.]


32 Comments on “Our Reporter Reports on Consumer Reports...”


  • avatar
    GS650G

    I have made better decisions by asking current owners, even total strangers, about cars than by following CR. A Korean car I own was “estimated” to cost 1200 dollars in upkeep for the first 3 years of ownership by them, despite a 5 year bumper to bumper 60K warranty. Honda came in at a measly few dollars over a 5 year period despite a short warranty.

    I don’t see how CR can honestly provide complete and accurate assessments on so many car models and sum it all up in a simple dot.

    And I don’t think the quality-reliability gap between Japan and the US or even some Korean cars is as wide as they make it seem.

  • avatar

    I’ve had tremendous luck over the years buying cars off CR’s “Used Cars To Avoid” list. I thank Consumers Union for artificially deflating the resale prices of several cars that didn’t prove to be that bad.

    It wasn’t until a few years ago that we knew what those dots even meant. Their worst possible rating indicates a problem which might occur on as few as about 15% of all cars, or as many as, well, who knows?

    In fact, who knows how big their sample sizes actually are?

  • avatar

    Actually, with the latest system for calculating the system-level dots they can represent a problem rate as low as 3 percent. I’ve written about this issue here:

    http://www.truedelta.com/pieces/newdots.php

    But before taking the survey I had assumed that the data behind the dots was still fairly solid. They’ve been conducting their surveys forever, they ought to know how to do it very well by now.

  • avatar
    1981.911.SC

    Interesting that somebody who wants to make money by selling car information is bashing CR…..Hmmmm??!! Might CR do a better job? Maybe. Are they better than most of the inbred and financially interconnected car sites? Definitely!!!!

  • avatar
    geozinger

    The owner of the website http://www.allpar.com (who is also a behavioral psychologist), had the balls to tell CR the emperor had no clothes a while ago. His site no longer has the page where he explained his criticism of CR’s methodology, but it wasn’t all that far off from what Mr. Karesh is saying here.

    I wish he hadn’t removed the info, I found it incredibly enlightening. In later iterations of the site, he noted he had been approached by CR concerning his public criticisms. (This is my conjecture, I wonder if they managed to convince him to remove the page.)

    While one could argue that CR bashing by other auto publications has its roots in greed (possibly), for a publication to advertise itself as THE AUTHORITY, its methodology should be beyond reproach, no?

  • avatar
    automaton

    In my experience, Consumer Reports has been fairly accurate. More so, at least, than asking the average “non-enthusiast”. Before my own experiences, my father has had great experience buying from their recommended lists.

    Neither of us have ever put much stock in the cost estimates, though. We tend to do most repairs ourselves, so they are obviously not accurate in the least from our perspective.

    Perhaps you’ve had a bad experience?

  • avatar

    I only started my own research because it seemed that if I wanted good reliability info, I had better collect it myself.

    But, go ahead, prove I’m biased. What’s wrong with the reasoning in this editorial? The question’s right there. Do you think it’s a good one?

  • avatar

    Geozinger,

    The owner of allpar and I exchange emails from time to time. He’s one of the good ones.

    The page you talk about is still there, though CR did force him to modify it.

    http://www.allpar.com/cr.html

    It might have been down at one point, but it’s up now.

  • avatar
    kitzler

    to M. Karesh, I would like to add, that comparing new cars, according to J.D.Power, another CR-type that dispenses advice, the most reliable cars had around 90 defects whereas the least reliable, the Range Rover had 204 defects. Mathematically speaking if a car has 100 defects, that means that some cars in that category could have 0 defects and another similar car could foreseeably have 300 defects, all within a three sigma bell curve distribution.
    Bottom line: (1) When you buy a particular model, you have absolutely no idea whether its defect rate will be on the low end, average or in the high end, and (2) with only a 2.25 ratio between the different models as stated in the J.D.Power survey, this survey really does not matter at all.

    As for CR, it is an emotional response from users, who when averaged out over thousands of voices, can I feel give a pretty good insight on which cars people (at a cocktail party or picnic) will brag about and which cars they might complain about… as far as technical value of the CR report, I have to conclude it is zilch…..

  • avatar
    kerstensutton

    One must realize that subjective people are making the assessments on their cars. I would be a lot less concerned about noise, paint, or the radio in a $13,000 Kia, than I would be about a $70,000 car. Maybe this is one reason Kia is inching its way up the charts, and Range Rover has 204 so called “defects.”

  • avatar

    That’s 204 per 100 cars, or 2 per car. And those include “design defects.”

    I’ve written about IQS elsewhere on TTAC.

  • avatar
    kitzler

    I stand corrected, M. Karesh, it is per 100 cars, so the difference between the best car and the worst car amounts to one defect, on the average. In this context, I cannot emphasize the word average enough…..

    I would like though, to make my point crystal clear, don’t buy a car just because it has a better defect rating (JDPower or CR), buy it if you like the way it handles and performs and also if you are tantalized by its styling.

    The story on CR is, however, more appropriate if you plan to drive the car for more than four years. The public response to the CR questionnaire, while emotional for the most part and indefinite as to what constitutes a problem, is still a gut feel for whether folks are still tantalized by their vehicle as it ages.

  • avatar
    cheezeweggie

    I personally don’t totally agree with CU’s rating system. With that said, I know as many happy Asian car owners as unhappy domestic car owners. I guess that more or less reflects CU’s findings.

  • avatar
    DaveClark

    I have more than a “gut” feel on reliability based on warranty repairs performed by franchise with my access to this information. CU may not pass muster with the principles of the scientific method, but in general, I can’t argue with the results published by the magazine. Keep in mind too that luxury buyers tend to be more discriminating and therefore probably more inclined to take their car in for service.

    The domestic brands are generally below average. Special recognition should be given to Land Rover: I personally wouldn’t be “roving” outside a 25 mile radius with this ride. The German brands have taken their lumps with electrical related gremlins, so they’re –at best– average for reliability. The Asian brands, with few exceptions are very good to excellent. And the difference between the domestic brands and Asian brands is more than one defect!

    Taking issue with CU may be grist for the mill based on statistical grounds, but remember (if memory serves) too what Disraeli said: There are lies and there are damned lies (i.e. statistics).

  • avatar

    The numbers will be whatever the numbers will be. I just want to see some actual numbers based on a sound method.

    Because only one electrical defect can be reported per year, and some German cars seem to be plagued with them, it is possible that CR actually makes European cars seem better than they actually are.

  • avatar
    geozinger

    Michael Karesh,

    Thanks for the heads up on the allpar site. I guess I should have looked before I posted yesterday.

  • avatar
    stanshih

    Mr. Karesh,
    Thank you. I think I was about 25 years old when it dawned on me that Consumer Reports was subjectivity masquerading as objectivity. Although they are impartial in the sense that they aren’t being bribed, it doesn’t mean that they are unbiased.

    I could go on about the general beef(s) they have against GM vehicles, but I’ll just leave you all with one blatant example of their subjectivity:
    The VW Passat ca. 2002: CR’s “recommended” family sedan. Absolutely atrocious reliability. Would have been right at home in GM’s 1980s fleet if it weren’t for its gentrified interior. Consumer Reports apparently looked past its flaws and recommended that people buy this money drain just for its looks.

  • avatar
    DaveClark

    I don’t think CU has a single aesthetic strand in its DNA, so I disagree that CU is picking style over substance. And I think CU is more honorable than any J.D. Power report. Remember, it was the latter that hailed the Pontiac Aztek with praise. CU ain’t perfect, but at worst it’s objectively wrong on occasion.

  • avatar
    chaz_233

    Congrats Dr. Karesh on contributing to TTAC! Can’t wait to see TrueDelta’s reliability data.

  • avatar
    stanshih

    Mr. Clark,
    I’ll have to politely disagree and argue that CR does in fact have an aesthetic strand of DNA (or a few base pairs anyway).
    CR always ALWAYS mentions exterior and interior “fit and finish” in its reviews despite the fact that it has no direct correlation with the functionality or reliability of a vehicle. The “fit and finish” critique is almost purely an aesthetic, visceral response.
    One could argue that the “fit and finish” may be indicative of overall quality in the same way that a guy walking down the street with mismatched socks may not be “with it”. This is a tenuous relationship, however, as the man with mismatched socks may in fact be a genius. Similarly, a car with relatively wide body gaps and shoddy interior panels could have lots of power, handle really well and be quite durable.

  • avatar

    I never have argued that CR is biased. In the article I focus on the fact that, the way their survey is worded, respondents lack clear guidelines, so any biases these respondents might possess can strongly influence their responses. CR could take steps to limit the impact of respondent biases, but for whatever reason does not.

    The extent to which CR’s tastes in cars generate biases in its respondents is open to debate. The nature of the survey, however, is not.

  • avatar
    210delray

    Ah, the CR survey, everyone’s favorite whipping boy! Well for one, that allpar.com dismissal of the survey reminds me of the “discussion” (make that speculation) a bunch of college guys might have around 1 am after several rounds of drinks at their local watering hole!

    CR distributes a survey, which includes question #13 that basically asks about every category that you see in its reliability tables. They count the number of checks in the boxes (respondents check the box if they have a “serious problem,” [more on this later] excluding routine maintenance and problems caused by crashes), and provide a circle corresponding to percentages of those surveyed who claimed a problem.

    So the circles are based upon frequency of incidents, not the owner’s opinion of reliability. If very few people experience a specific problem (say Toyota engine sludge), it will not translate into a black circle. The problem may be drastic and severe for those who experience it, but if it impacts a small percentage of owners, then the reliability ranking can still be very high.

    Whether you subscribe to CR or Guns & Ammo (or both), and regardless of your age, race, religion, politics, or income level, it doesn’t impact whether the water pump continues to operate or if the power door locks stop working.

    The subscriber base would matter if the survey were focused on the owners’ opinions. But the survey doesn’t consider owner satisfaction in compiling the reliability data. The questions are not open-ended, but are based on specific parts and features on the car.

    One excellent thing about CR is the overall number of responses — it now gets about 1 million per year — which should reduce the margin of error inherent in any sample, because it’s an unusually high number of data points. (Nobody else in the business can claim numbers close to that.) This means that varying degrees of interpreting what “serious” means, or whether brake pads should last “x” number of miles gets averaged out. Same with whether problems occurring near the cutoff date of April 1 are reported or not.

    In any case, the issue is the relative frequency of problems, comparing one car to another. Given the sheer number of respondents, do you expect Chevy owners as a whole to behave differently than, say, Toyota owners? (I guess a former boss of mine might answer affirmatively to this question, when he said Chevy owners are “low lifes.”)

    The other thing that I like is that the survey is mercifully short, which makes it more likely that people will be more inclined to answer it with some thought applied, rather than just filling things in to get it over with. If you try to define “serious,” provide a minimum number of days the car has to be out of service, or give a minimum cost of repair, this will only make the survey more complicated. Fewer people would be likely to respond, and worse yet, people aren’t necessarily going to remember the exact cost or number of days the car was out of service. (I may keep detailed records of my maintenance and repair costs, but how many others do this?)

    As for JD Power, its customers are the automakers. It sends surveys to registered owners from lists compiled from RL Polk, which gets it from state DMVs. For those states where this information is not available, JDP gets it from the automakers themselves. I don’t see a problem here, either. JDP’s main problem is that the survey is extremely long, and I have to wonder whether people go to the trouble of answering accurately as they get to Question 200…

  • avatar

    You assume that despite the wording of this question and the nature of people’s memories that those boxes get checked when they should get checked, whenever that is. I don’t think this is a safe assumption to make.

    Relying on respondents to determine what counts as “serious” or “more often” lets satisfaction in through the back door. People satisfied with the product as a whole are less likely to see a problem with it as serious, especially with no guidelines as to what should count as serious. You’re no longer simply measuring reliability.

    It’s an unnecessarily sloppy questionnaire throughout. Perhaps in the name of simplicity, but sloppy nonetheless. For example, combining repair and maintenance items in many questions is virtually guaranteed to yield bad data. One near the end asks for the total amount spent on maintenance and repairs. What’s the point of asking that? You end up without a precise number for either.

  • avatar
    DaveClark

    Mr. Stanshih:
    People CARE about fit and finish on a car, so it’s completely appropriate they comment on it. Whereas one can measure panel gaps and paint millage with objective standards, CU has no tool for measuring style and stays away from it.

    If one was interested in finding, or evaluating, a genius, why would anyone care what he was wearing? I might even be LOOKING for mismatched socks. Is there a connection between sharp dressed executives at GM and intelligence? Hmm, that might be worth a closer look…

  • avatar

    Background: Auto journalist and CR believer

    Experience: CR recommendations are a good general predictor of car quality

    Example: Out of three brand new Range Rovers driven this year, two had “Serious” problems–one failed to start, one just up and died.

    Note: A coworker went to the CR press day last month: They have 22 full-time engineers in the automotive team, and conduct 50-odd rigorous, double blind type tests on each car. Do not confuse reader ratings with their test data.

  • avatar

    This editorial isn’t about their test data, it’s about their reader ratings.

    An N of 3 isn’t a sufficient sample, and this example is thoroughly useless without details. What was the actual problem in each case? Leave the lights on, and any car will fail to start. Were these press vehicles? We all know how those are treated. I’m sorry, but “died” just isn’t specific enough for me. Not a very technical term.

    The way the human mind works, it tries to make sense of everything as simply as possible. So, if someone reads in CR that one car is reliable, and that another is not, it is likely to interpret the same fairly minor problem with both differently. This problem with a “reliable” car will tend to be seen as a fluke, probably from something the owner did, while the same problem with the “unreliable” car is just more confirmation of just how unreliable those cars are.

    For example, with the Land Rovers you just mentioned, I would only say a failure to start is serious if some manufacturing defect were to blame. My car failed to start a couple of times this past winter. But I never fixed anything, and have had no trouble since. So I’m guessing that my kids left a light on inside the car. Do this even once, and a battery generally is never quite the same again. Until I replace it, I’m probably at risk of this problem happening again. Is this Mazda’s fault?

    I might suffer from the same biases. The time or two my wife’s Chrysler has failed to start, I was sure that it was a harbinger of some big looming problem. I was MUCH more worried than I was when my Mazda failed to start. But I haven’t had to fix anything with that car, either. Except the wheels are corroded enough to leak air…

    This said, I know they have a fairly large, experienced staff. That’s why I expected a much better questionnaire. A reliability survey should not be based on opinions.

  • avatar

    I should add that the above problem is avoidable, that’s why it’s somewhat tragic. If you tell people “report any problem that meets xyz criteria,” their biases have much less leeway within which to operate. They could still lie or rationalize their way to bending the criteria this way or that, but this would be less likely than with the questionnaire as it is written.

  • avatar
    bunny

    Don’t jump to conclusions, especially when that is not your speciality.

    Sure that the husband and the wife have totally different views about the term “serious.” However, over a large spectrum, say the customer bases of Camry and Sonata, the distribution of all the different husbands and wives are statistically static. The true pattern shows itself when thousands or more samples come in.

    CR is not perfect, but it’s still second to none as of now.

  • avatar

    Once you divide that 1,000,000 sample by model and model year, and assume that a large proportion own a Camry, Accord, Sienna, or Odyssey, you end up with a minimum in the range of 200 per car. Which gets further divided by powertrain.

    It’s good to have a large sample. But introducing a huge amount of unnecessary extraneous variation, as well as some definite biases, squanders much of the size of this sample. You end up needing a positively huge sample to compensate. And that they don’t have for many models.

    The results are probably sound for the handful of models they have a huge sample for. They might have 50,000 Camrys. But the rest?

  • avatar
    cheezeweggie

    How dare CU bash the domestic car industry. The (GM deathwatch) comments made on this website are more than adequate.

  • avatar
    EricGo

    M Karesh:

    I think your criticisms of the methods are valid if the goal is to predict repair frequencies by system for any one model, but I at least do not use them in that fashion. I think their value is in comparing across models.

    It is analogous to the famous EPA fuel economy debate. While most people will not obtain the epa large print value, it is nonetheless true almost across the board that the relative ratios hold true. By that I mean that if my current car in my hands gets 80% of averaged city/highway EPA, I am very likely to get 80% of averaged city/highway EPA of my next car.

    Detroit spends a lot of money and effort in trying to discredit CR, arguing that import owners have a memory or value bias not present in domestic car owners. It is a logical question to ask, but I have *never* seen any data to back up the conjecture.

    The onus of proof is on you.

  • avatar

    My original criticism of CR was that you really cannot compare cars using its ratings. You’ll find it here:

    http://www.truedelta.com/pieces/shortcomings.php

    True, a “better than average” car is likely to be more reliable than an “average” car, but how much better?

    How and where does Detroit spend “a lot of money”? Can you provide a specific example? If they’ve been trying to discredit CR, they’ve done a poor, virtually invisible job of it.
