How do you measure comment quality?


Jason Alcorn has written a piece for Mediashift about measuring the impact of articles, in which he includes:

Number and Quality of On-Site Comments

Which led me to wonder, how do you measure quality? I asked Jason on Twitter for some ideas:

What do you think?


The surest way to attribute accurate quality scores to comments is to have multiple editors review and rate them across standard categories – e.g. Insightful, Informative, Interesting – so you can make apples-to-apples comparisons.

NLP, term densities, and sentiment analysis can help guide editorial review; they can't replace it.

In addition to people reading and rating (hopefully in addition, not instead), some quantitative metrics can be applied to help determine quality:

  • The known history of who made the comment
  • Views, upvotes, replies, shares on the comment
  • The known history of those who’ve viewed, upvoted, replied, shared that comment

The known history of the commenter, and of those who have engaged with the comment, can include what they've done on the current site (like Kinja), the reputation they've built on related sites or on sites using common tech (like Disqus), on Twitter, and elsewhere.
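The quantitative signals listed above could be folded into a single rough score. Here's a minimal sketch in Python – the weights, the field names, and the idea of a normalized `author_reputation` are all illustrative assumptions, not anyone's actual scoring method:

```python
from dataclasses import dataclass

@dataclass
class Comment:
    views: int
    upvotes: int
    replies: int
    shares: int
    author_reputation: float  # hypothetical 0-1 score from the commenter's known history

def engagement_score(c: Comment) -> float:
    """Toy weighted combination of the quantitative signals above.

    Weights are illustrative, not calibrated; replies are weighted
    highest on the guess that they indicate discussion-starting quality.
    """
    raw = (0.1 * c.views
           + 1.0 * c.upvotes
           + 2.0 * c.replies
           + 1.5 * c.shares)
    # Scale by reputation so an unknown commenter still gets half credit.
    return raw * (0.5 + 0.5 * c.author_reputation)

c = Comment(views=200, upvotes=12, replies=3, shares=2, author_reputation=0.8)
score = engagement_score(c)
```

As the posts above stress, a score like this measures engagement, and at best guides the human review – it doesn't replace it.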

This discussion about recognizing top commenters is relevant.

Comments here about moderation and meta-moderation are directly tied to arriving at comment quality.


We did an experiment looking at how the quality of a comment thread changed with our involvement, and it was illuminating but super time-consuming – as @smcnally noted, we had to go through manually, comment by comment, and rate each on a scale of 1-5. We then compared comments on a story where we got involved to an older story in the same section, on the same topic, and (if possible) by the same author, where we didn't. We did it for about 20 pairs, if I remember correctly…it wasn't perfect, but the results were quite interesting.

Our rating scale was something like the one below. It all depends on how you define a quality comment internally.
1 (argumentative, detracts from discussion)
2 (vague, snarky, not thoughtful but not actively offensive)
3 (not bad, not good, doesn’t add to discussion, could add some humor or positive feedback)
4 (added to conversation, relevant, thoughtful, intelligent)
5 (exemplary, specific, brings in other readers, asks questions or shares personal insight/experience)
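The pairwise comparison described above could be tallied like this. A sketch with made-up ratings, assuming the 1-5 rubric just listed; the numbers are not the actual experiment's data:

```python
from statistics import mean

# Hypothetical per-comment ratings (1-5, per the rubric above)
# for one story pair: one story with our involvement, one control.
with_involvement = [4, 5, 3, 4, 4, 2, 5]
control = [2, 3, 1, 3, 2, 4, 2]

def avg_quality(ratings: list[int]) -> float:
    """Average manual rating for one story's comments, rounded to 2 places."""
    return round(mean(ratings), 2)

# Positive delta would suggest the involved story drew higher-rated comments.
delta = avg_quality(with_involvement) - avg_quality(control)
```

Repeating this over the ~20 pairs and looking at the distribution of deltas would be the natural next step.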

Steve - it’s probably much more accurate to do 1-5 on a variety of standard categories vs just one score per comment - that’s a good idea.

Our quantitative metrics measured engagement more than quality – like diversity (number of unique commenters divided by number of comments), discussion (the percentage of comments that were part of threads), etc.

The other metric we also look at – much more replicable but less illuminating – is civility, using the percentage of comments deleted by moderators. I noticed the Guardian's big analysis was based mostly on that metric.
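The diversity, discussion, and civility metrics above are straightforward to compute from a comment export. A sketch where the field names (`author`, `parent_id`, `deleted`) and the sample data are assumptions for illustration:

```python
# Hypothetical comment export: parent_id marks a reply in a thread,
# deleted marks moderator removal.
comments = [
    {"author": "a", "parent_id": None, "deleted": False},
    {"author": "b", "parent_id": 1,    "deleted": False},
    {"author": "a", "parent_id": 1,    "deleted": True},
    {"author": "c", "parent_id": None, "deleted": False},
]

def diversity(comments: list[dict]) -> float:
    """Unique commenters divided by total comments."""
    return len({c["author"] for c in comments}) / len(comments)

def discussion(comments: list[dict]) -> float:
    """Share of comments that are replies, i.e. part of a thread."""
    return sum(c["parent_id"] is not None for c in comments) / len(comments)

def civility(comments: list[dict]) -> float:
    """Share of comments NOT deleted by moderators."""
    return 1 - sum(c["deleted"] for c in comments) / len(comments)
```

All three are ratios in [0, 1], which makes them easy to compare across stories of very different sizes.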