Building a Better Game Review

Few things in the gaming world are as controversial as game reviews themselves.  Fanboys and fangirls wait with bated breath to peek at the scores for their most anticipated games.  If these scores aren't as high as their expectations, some of them are apt to explode, whether at writers, publishers, comment boxes, or at the developers themselves.  Unfortunately, what would seem to be a simple subjective scoring has now turned into an important industry, as the livelihoods of developers and publishers can be impacted by such things.  Thus we have our current situation, where games are often reviewed as exactingly and objectively as possible.  But does this even remotely mirror the experience of the end user, and does it help the customer decide how to spend their money?  I argue that this approach is fundamentally flawed and that we can find a better way, both for the developers and the consumers.

The first and easiest target, of course, is rampant score inflation.  The average game seems to score somewhat over a 7 at most modern, popular review sources.  To start, we can ask, why would they do this?  It happens for a couple of faulty reasons: 1. to appease developers and publishers, and 2. objective comparison of current games with older ones.  Developers like inflated scores because they equate higher scores with increased sales.  Thus if everything gets pushed near the top, they believe this benefits them, as customers would naturally relate high with good, perhaps not realizing that scores are inflated.  Many review sources willingly follow along with this, as publishers and developers are their livelihood.  They need them happy for ad revenue, sneak peeks, open accessibility and early/free games.  Many feel this pressure even without direct hints from the publisher and willingly bow to it.  Then we also have the other problem, objective comparisons to previous games.  Say a big developer created a game and it ended up a hit.  A reviewer first gave it an 8.5.  Now two years later, the developers release a sequel which looks better and fixes some of the issues in the first.  What do you give it?  It's very possible that the reviewer will give a score higher than the original 8.5 because it's a better game than the original.  Of course, this isn't necessarily the proper way of doing things, as you're mentally going back in time and comparing the game against a previous era instead of staying rooted in the present.  The concept is a bit hard to confront and counter, but it needs to be addressed for proper judgement.

But what's so bad about score inflation (and closed inflation in general) anyway?  Essentially, its entire purpose is to mislead and manipulate, generally to make a specific group appear better than it is at the expense of the entire system's legitimacy.  Unfortunately, similar kinds of closed inflation exist everywhere in our modern world (closed inflation operates on a bounded range, versus open-ended inflation, which doesn't carry the same issues).  Take the scale 0-10, with a defined low, a defined high, and a defined mid-point (five).  If five is a true average for this scale and you throw 100 random dots (or reviews) on the line, you generally end up with a nice bell curve density to show for it.  This allows you to easily determine, both mentally and objectively, which scores are horrible, bad, average, good, and great.  In the end, isn't that all people want to know from a review score?  With a skewed curve, the results become misleading, if not downright manipulative.  Suddenly it becomes much more difficult to tell the difference between horrible, average and great, as their separation is significantly reduced.  The problem is complicated further because inflation isn't even consistent amongst inflating sources, which then have to coexist with non-inflated scores.
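The compression argument above can be sketched numerically.  The snippet below simulates 100 review scores on a healthy 0-10 scale centered on 5, and the same count under inflation squeezed up against the ceiling; the means and spreads (5/1.5 and 7.5/0.8) are illustrative guesses, not measurements of any real outlet.

```python
import random

random.seed(42)

def clamp(x, lo=0.0, hi=10.0):
    """Keep a score on the closed 0-10 scale."""
    return max(lo, min(hi, x))

# 100 "reviews" on a healthy scale: centered on 5, full range in play
healthy = [clamp(random.gauss(5.0, 1.5)) for _ in range(100)]

# The same 100 reviews under typical inflation: centered above 7,
# squeezed against the ceiling (both parameters are made-up examples)
inflated = [clamp(random.gauss(7.5, 0.8)) for _ in range(100)]

def spread(scores):
    """Gap between best and worst score -- the room left to
    separate 'horrible' from 'great'."""
    return max(scores) - min(scores)

print(f"healthy scale spread:  {spread(healthy):.1f}")
print(f"inflated scale spread: {spread(inflated):.1f}")
```

The inflated distribution leaves far less room between its extremes, which is exactly why one reviewer's 8.0 and another's 8.5 can mean wildly different things.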

Then how exactly do we counter score inflation?  Well, the answer lies with the public, the review sources and, importantly, with aggregate review sites such as GameRankings and Metacritic.  The public can always make a difference if they're willing to take a stand and care about an issue.  The readers are the ones ultimately in control of letting these sources sink or swim, so their influence is even greater than inflation pressures if they can collectively make their voices and desires heard.  The review sources themselves are also more than capable of making a point to use a realistic scale instead of an inflated one.  Pushing an overall average from, say, 7.5 to 6.5 is a huge step on its own and shouldn't be overly difficult to work into their systems.  Lastly, we can intelligently use aggregate sites to lessen the effect of inflation in a couple of ways.  The first method is to combine scores from as many sites as possible and then use the results to display relative rankings of the games, which would then be emphasized over each individual score.  The second method is to use average review data from each source and then re-calculate a 'deinflated' average curve, with the middle around 5 (or 5.5 on a 1-10 scale).  By reweighting each site and then adding them to the mix, the ratings are immediately drastically improved and more meaningful, although this method still isn't as optimal as actual deinflation.
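The second method can be sketched as a simple shift-and-clamp: move each site's scores so that site's own average lands on the target midpoint.  The site names and per-site averages below are made-up examples for illustration, not real outlets' numbers.

```python
# Each site's running average score (hypothetical values)
site_averages = {"SiteA": 7.5, "SiteB": 8.0, "SiteC": 6.5}

TARGET_MID = 5.0  # desired midpoint of the 0-10 scale

def deinflate(score, site_avg, target=TARGET_MID):
    """Shift a site's score so that site's average maps to the target
    midpoint, then clamp the result back onto the 0-10 range."""
    adjusted = score - site_avg + target
    return max(0.0, min(10.0, adjusted))

# Three reviews of the same game, one from each site
reviews = {"SiteA": 8.5, "SiteB": 9.0, "SiteC": 7.0}

deinflated = [deinflate(s, site_averages[site]) for site, s in reviews.items()]

print(f"raw average:        {sum(reviews.values()) / len(reviews):.2f}")
print(f"deinflated average: {sum(deinflated) / len(deinflated):.2f}")
```

A raw average over 8 shrinks to just under 6 once each site's inflation is subtracted out, which is far closer to what "somewhat above average" should look like on a 0-10 scale.  A fuller treatment would also rescale each site's spread, not just its center.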

A second giant red target in the review world is the undue focus and importance placed on review scores versus the actual text.  Review scores mean something, but in the end each score is either a rigidly objective number (which doesn't necessarily help the end reader) or an entirely subjective number (which also doesn't necessarily help the end reader).  Reading a full, well-written review certainly helps the reader decide if a product matches their tastes more than numerical methods ever could.  However, I'm not going to pretend for a moment that the importance of scores is an easy thing to change, as the modern consumerist society by and large gravitates towards wanting information in small, short, concise and easy-to-understand packages.  Thus in the end, it all comes down to presentation and how the publishers decide to display scores in contrast to the review text itself.  Do they make a big deal of the score and paste it around several places; perhaps at the bottom of a page, at the top, and then again at the front or the back?  This can certainly be toned down.  I personally favor an approach like Edge-Online, which displays only a simple text number at the end of the page.  This re-emphasizes the text and encourages readers to take note of the entire review rather than just skimming to the shiny score wrap-up boxes at the bottom.

I'd also like to see an overhaul of the subcategories so commonly found at the bottom of each review near the score.  You know the ones: graphics, sound, fun factor, gameplay, etc.  Again, I'm not saying that summaries are inherently bad things, but I feel that we could think of better common categories that relate more to what really impacts people's opinions of a game.  Thus I will propose four summary categories: Style/Atmosphere, Technical Aspects, Gameplay and Longevity.  Style/atmosphere is a natural combination of graphics, sound, force feedback and more.  It goes far beyond raw polygon counts.  Style and atmosphere are the sum of the immersive effects that make or break a player's final impression of the world the developers created for us.  Pumping more polygons is relatively easy.  Creating a good style and immersive atmosphere is not.  Technical is just that, a summary of the technical aspects of the game.  Is anything blatantly broken/glitchy?  Is the game created in a manner that allows the style, atmosphere and gameplay to shine through?  Modern games are very complex, and while gamebreaking glitches are relatively rare, any amount of small glitches or technical issues may still frustrate gamers into putting the game away, possibly forever.  Technical aspects can also include audio/video features and pure prowess for those who look for such things (eg: native 1080p rendering with full 5.1 sound in-game and in cutscenes).  Gameplay is simply how entertaining the game is while interacting with the player.  For example, say you have an FPS with an item system.  How fun are the shooting moments throughout the game?  How fun are the menu moments?  What about the exploration time?  Do they interact together well?  Or perhaps fall apart or fail to mesh at times?  Last, we have longevity, the measure of how long a game will be able to suitably entertain.  A well-balanced game with a strong leaderboard system and easy-to-use party/room systems has the capability to entertain for quite some time.  If it is a score attack type of game, then how balanced is the score system?  Games that are easy to learn and hard to master are optimal, as they're fun to pick up and play yet can be deep enough to allow players to continually learn and improve for a long time, up to their limits.  These four categories seem to nail what gamers look for far more accurately than most summary categories, which are often downright vague or even bizarre.

Overall, I suppose my main hope and desire is to take the rigid objectivity out of game reviewing and replace it with things more useful to the customer to aid in their decision making.  The way I see it, subjectivity (even complete subjectivity) is quite fine with the proper use of aggregate review sites.  I'm not saying our current aggregate sites are anywhere close to perfect, but they're a nice start.  The sum of many subjective opinions becomes a concrete, objective result, and one that's far more in line with the public.

Comments

The First Hour's scoring system

So lots of great topics here both in the article and comments, just thought I would defend my use of the 1-10 range again. I did write this up a few months ago and it is still valid in my mind: http://firsthour.net/scores

Basically, I'm very numerically oriented, I like ranking things, ordering them, etc. and scores of 1-10 make that very easy. The scores aren't so unbelievably precise like IGN or Gamespot with 7.9's or whatever, but a reasonable abstraction of my total thoughts. My main problem is that I don't play enough bad games to average out the scale. I have fully intended from the start to use all 10 numbers of the scale, but since I'm not paid to do this, part of me still demands I play games that I know I would enjoy in the first place.

Anyways, I'm fine with reading any scoring system as long as it is reasonable, be it letter grades, stars, EPIC WIN, whatever.

Also, for the record, I've given out scores of 2, 3, and 4 :p

Oh, and don't complain that my scale doesn't have a real middle, the middle is 5, I just haven't found a game that deserves a zero yet!

As far as scales of scores, I

As far as scales of scores, I feel 1up really has the right idea, using letter grades instead of numbers. I don't necessarily know how well this works for international readers, but in the US it's as near unskewed as I've seen.

I don't like the idea of lumping graphics and sound together as style. There are a number of games with a large gap between the two, having technically proficient but uninspired graphics but a fantastic soundtrack, such as Halo. When one is stylistically void but the other is a masterpiece, how can you rate them with a single score?

I wouldn't summarize those

I wouldn't summarize those categories with a score; they're just more to talk about, since each can legitimately make or break a game for a buyer, based on their personal preferences.

Such a dichotomy could be a problem for someone focused on style, and thus would be stated as such.

I also forgot to include a

I also forgot to include a note on longevity. Longevity is not just say, the length of a single-player game or the time it takes to get to an end or finish all the optional dungeons. That means something, but longevity is more the time that the player can stay interested through continual self-improvement or interesting gameplay/story additions (or whatever else the devs can use to keep things fresh). I don't consider a game with a 100 hour 1000 floor near-identical dungeon grind to have very much in terms of longevity.

Great discussion

I agree with most of what you say and I'd be happy with all of the changes you propose, but I've got my own take on improvements and ideals...

Score Inflation: I think that's a tougher nut to crack than you seem to think. I agree that having a 7 average is essentially making 6 and below the "no buy zone" for 99.9% of readers, hence making the majority of the scale irrelevant. But I don't think simply moving the average to 5 is the right solution. Even without tenths being thrown into the mix, that's still ten different ratings that can be assigned to a game...and it's hard to come up with even TEN scaled adjectives that you could use to describe the quality of an experience. I like the five-star system that the user-contributed site Backloggery uses (check out my list of games at http://www.backloggery.com/games.php?user=victorvonplugman).
1 = Bad, 2 = Decent, 3 = Good, 4 = Great, 5 = Outstanding
A five point system is easy to understand, offers an obvious average point, and even seems less damning of lower scores: a 2/5 just seems like less of a slap in the face than a 4/10 does to me, especially when the 2/5 means "decent." Because really, how many "Bad" scores do we need? Certainly not six of them like in the current 10 point system.

Text vs. Score: If I had my way, scores would be gone altogether. I try to give an easily-noticed and concrete text verdict at the end of my reviews (i.e. "if you're looking for _________, this is at least worth a rental") and, though I've only written a few full reviews here, I never mention any scores in the text and have only a Verdict at the end of it, where people naturally scroll down to when looking for a score. I still include the score in the info box, though, because I feel like people would ask questions if I didn't.

Categories: I agree completely. We've come to the point where "graphics," a term that tended to have a technical tone in years past, is essentially irrelevant as almost all games at least look very good. I tend to go overboard in my reviews with Video, Audio, Story, Gameplay, Challenge, Uniqueness, Pacing, Longevity, Value, Fun Factor, Boxart, Instruction Manual Grammar, Disc Shininess, and a final Verdict, but I'd rather include a sentence for each very specific category than simplify things into four or five vague scores. That said, if I had to limit myself to four, I'd go with the ones you proposed.

Aggregate sites: What I'd like to see is a single review site where four or five reviewers of different gaming tastes each give their take on every game. Maybe A loves narrative/style, B loves mainstream action/racing, C is a competitive fighter/shooter, D plays casually, E likes cerebral puzzle/strategy, and each would obviously have their own perspective on the game. Obviously this would require five people to play each game (not necessarily the WHOLE game), but it would also let readers find which reviewer(s) they identify with and weight each opinion accordingly. And if all five reviewers gave the game a thumbs up, then it would obviously be a crowd-pleaser instead of a niche hit.

But yeah, great article and hopefully we get a good discussion out of this.

Yeah the actual score range

Yeah the actual score range is difficult and I haven't really thought about that too much. Hard to say exactly, since the current inflated system almost requires those decimals near the top to have any separation at all... Right now my current thought is that 0-10 with decimals, 0-10 integer and 1-5 would all be fine. But yes, a 5-point system would be easier to understand, but it would also reduce the precision of aggregate and comparative scoring. I think Greg's scoring system makes decent sense as far as adjectives go: http://firsthour.net/scores (although he doesn't have a proper integer midpoint :P)

Lots of good food for thought

My thoughts on the subject are kind of a mishmash.

1. I don't think we need scores at all.
2. If we Must have them, I'm all about stars. A 5 star system seems fine to me. If you only have a 4 or 5 star system, aggregate sites become less interesting, which is fine since I think they are a net negative for the industry.
3. The current scoring system REALLY only has 6 possible scores if you think about it.
1-5 is all the same to me. Then we have 6, 7, 8, 9 and 10. I generally completely ignore decimals when I read a review. So someone gives a game a 7.8? Really? It's absurd.

Also, keep in mind, the 100 point scale is more or less based on the letter grade system. So averages SHOULD be in the 70s, since that would be a C letter grade. 90s being an A and anything below 60 is simply a (F)ail.

1-5 feels the same to you

1-5 feels the same to you because the current scoring is broken where all low scores are pretty much equally bad, so you'd still have that idea in your mind. If 5 is average, it becomes much easier to distinguish between below average, bad, and just terrible.

And I'm not sure, but it isn't that hard for me to think of decimal ratings. At least single decimals, not sure if I could go to hundredths or thousandths lol. It certainly would make it more difficult to convert the score into words if you're thinking that way; but if you're just thinking relative numbers it can make sense.

Also it could be argued that the American school letter grade system is generally fairly broken and has many of its own problems (including inflation), although I didn't feel like getting into that (despite how it's a good example of the concept).

I'm not saying 5-star systems are bad, they work fine too and offer immediate concrete and easy-to-understand results. A little more accuracy never hurt things though.

I agree

"I'm not saying 5-star systems are bad, they work fine too and offer immediate concrete and easy-to-understand results. A little more accuracy never hurt things though."

I totally see what you're saying, but for me personally, I would look for the additional accuracy in the text of the review. Accuracy without context and comparison isn't of much use.

But yeah, again, great writeup and ensuing discussion. :)
