Pros and cons of different moderation tools


Let’s talk about what’s good and bad about different ways to moderate.


Shadow bans. These are used by Salon, YouTube (and who else?)

They’re a way of tricking people into thinking their comments are being posted, when in fact only the poster (when logged in) can see them.

To me, that seems like a level of deception that isn’t ideal for a journalistic outlet that is asking people to trust it, and that is also meant to uphold free speech.


Reddit used shadowbans for years, which I often saw cited by Reddit’s users as an underlying cause of the atmosphere of distrust that made Ellen Pao’s tenure as CEO so difficult. When Steve Huffman came in, one of the first things he promised was a replacement for the shadowban, which was delivered a couple of months ago.


I think the idea behind shadow bans is to discourage commenters from writing bad comments. In the shadow they feel ignored and may adapt their behavior in the future. But I think you are right to point out that this method is not adequate for a journalistic medium.

A similar method that is less deceptive was presented at a conference in Germany. The method is called “Trolldrossel” (troll throttle), and it was played as a joke on the most horrible community. It goes like this:

For a comment to be published, the user has to solve a captcha. After you send your answer, an algorithm calculates the probability of your comment being a bad comment. That probability is then applied to your chance of solving the captcha. So to post a really bad comment, the commenter has to solve ten different captchas correctly :ghost:
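A rough sketch of how I read that mechanism, not the presenters' actual code: a correct captcha answer is still thrown away with probability `p_bad`, where `p_bad` is whatever a (hypothetical) classifier thinks the chance of a bad comment is. The function names here are made up for illustration.

```python
import random


def captcha_accepted(answer_correct: bool, p_bad: float, rng: random.Random) -> bool:
    """Decide whether a captcha attempt counts as solved.

    A wrong answer always fails. A correct answer is still rejected with
    probability p_bad (assumed to come from some comment classifier).
    """
    if not answer_correct:
        return False
    # rng.random() is in [0, 1), so p_bad = 0 always passes, p_bad = 1 never does.
    return rng.random() >= p_bad


def expected_attempts(p_bad: float) -> float:
    """Average number of correctly answered captchas needed to get a post through."""
    if not 0.0 <= p_bad < 1.0:
        raise ValueError("p_bad must be in [0, 1)")
    return 1.0 / (1.0 - p_bad)
```

Under this reading, a comment scored at 0.9 needs about ten correct captchas on average, which lines up with the "ten different captchas" figure, and a harmless comment passes on the first try.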

Video of the presentation (only in German):


Ha, I like that. Raising the cost of trolling (which reminds me of this tweet.) I’m sure Civil Comments could, if they don’t already, force trolls to do moderation duties before being allowed back in (with double-entry style confirmation that they are following the rules properly.)

How else could trolls be forced to do more work than it’s worth to post something unpleasant?


I always thought the shadowban was meant to be targeted at non persons / bots in order to avoid the whack a mole scenario.


I haven’t read anything really interesting about it yet (open to suggestions), but applying machine learning to comments and online communities is a form of moderation that’s probably coming soon. A good moderation framework takes into account thousands of signals and can creatively combine them to make new ones, something that so far we’re still dependent on humans for. I’m more in the augmenting vs. replacing camp on that though.


We’re definitely interested in the idea of applying machine learning to comments. What areas do you think would find it most useful? And is there a danger that, if it becomes too obvious, bad actors would see trying to evade it as a game, instead of an incentive to change their behavior?


I think it’s especially useful in the RIYL / personalization aspect. One of the frustrations with comment sections is that they often feel removed from the primary content or other related content. Mandy Brown just wrote something in her newsletter about that around hyperlinking. I suspect a better framework for links / discovery in comment sections would be a really interesting application of machine learning.

Re: bad actors, absolutely. I’m no dev expert on how to restructure comments, but just like algorithms that surface content in news feeds on social platforms, I think comment sections need to have more inputs and machine learning could help with that (e.g. signals like overall time on platform, # of comments, upvotes / downvotes, abusive language, redcards from other users, etc). Smyte is doing interesting things there -
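To make the "more inputs" idea concrete, here is a minimal sketch that folds the signals listed above into one number. The field names, weights and caps are all assumptions for illustration; a real system would learn them from moderation history rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class UserSignals:
    days_on_platform: int
    comment_count: int
    upvotes: int
    downvotes: int
    abusive_language_hits: int
    red_cards: int  # reports from other users


def trust_score(s: UserSignals) -> float:
    """Combine signals into one score; higher means more trusted.

    All weights below are made up for illustration.
    """
    votes = s.upvotes + s.downvotes
    approval = s.upvotes / votes if votes else 0.5  # neutral when nobody has voted
    tenure = min(s.days_on_platform / 365, 1.0)     # capped at one year
    activity = min(s.comment_count / 100, 1.0)      # capped at 100 comments
    penalty = 0.1 * s.abusive_language_hits + 0.2 * s.red_cards
    return 0.5 * approval + 0.3 * tenure + 0.2 * activity - penalty
```

A long-standing, well-received commenter ends up near the top of the range, while a new account with a few abuse hits and red cards goes negative, which could then feed into ranking or pre-moderation.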


I agree that machine learning will have a big role to play in online communities - I hope that we see it more in the augmenting role. It will be very hard to design a machine learning solution which works well across all communities - there are just too many idiosyncrasies, but having human oversight can help with that.

I really like the idea of “soft” barriers like that troll captcha method, especially when they are combined with automated methods. It’s more forgiving of possible error for the automated methods, and lets people make the decision whether or not to follow through with a post rather than having a bot make it for them.


We just discussed bots/spam a few moments ago actually. I’m not sure the shadowban is necessarily useful in real-world interactions, as eventually people will figure it out, whether by logging out and seeing that their comments are not there (yet everyone else’s are), or simply by making another account.


Ultimately, the effectiveness of any moderation tool comes down to effort:

  • The effort required to enforce community standards.
  • The effort required to disrupt the community.
  • The effort required to evade bans.

Keep the effort low for moderators and high for disruptors, and you will have the capacity for effective moderation. Fail to do so, and you are playing a losing game of whack-a-mole - it will be impossible to scale a site’s moderation enough to keep up with growth and persistent trolls.

Another problem of scale is that moderators become embittered by constant exposure to the worst forms of human communication imaginable. They change too, and their ability to be fair and impartial is compromised. Some of them become toxic, lashing out at users; some become desensitized, glossing over clear violations; and some start to play favorites, perhaps allowing bullying by an established commenter while showing no leniency whatsoever to newcomers.

Where this becomes challenging is that the last two points, the effort required to disrupt and the effort required to evade, are also at odds with producing a lively community. If you make it too hard to post, people will become frustrated and give up; if you make it too hard to register, people just won’t. Furthermore, the most toxic users always seem more determined to jump through whatever hoops you’ve set out for them than the audience you really want to attract.

Given these properties, many communities find it necessary to add a reputation system. A good reputation system recognizes investment in, and engagement with, the community.

Reputation cannot replace moderation, but it can supplement it in many ways, which allow moderation tools and staff to scale better.

  • Users with low, or even poor reputation can be moderated by the community instead of, or ahead of, official moderators.
  • Users with high reputation can be made resistant to the effects of community moderation backlash, but not to actual appointed moderation. They’ve proven themselves, so having a controversial opinion alone does not mean they are being disruptive.
  • Reputation can be hard to get even when the goal is to make registration as easy as possible, meaning that when a troll is banned, they need to invest time rebuilding it on a new account.
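The three rules above can be sketched roughly like this. The thresholds and names are hypothetical, picked only to show the ordering of the checks: appointed moderators always win, high reputation shields against community flags, and low reputation makes community flags bite sooner.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
LOW_REP = 10
HIGH_REP = 1000
FLAGS_TO_HIDE = 3


@dataclass
class Comment:
    author_reputation: int
    community_flags: int = 0
    moderator_removed: bool = False


def is_hidden(c: Comment) -> bool:
    """Apply appointed moderation first, then reputation-aware community moderation."""
    if c.moderator_removed:
        return True  # nobody is immune to an appointed moderator
    if c.author_reputation >= HIGH_REP:
        return False  # proven users resist community backlash
    if c.author_reputation < LOW_REP:
        return c.community_flags >= 1  # new or poor-reputation users: one flag hides
    return c.community_flags >= FLAGS_TO_HIDE
```

Because reputation starts at zero on a new account, a banned troll who re-registers lands back in the "one flag hides" tier until they have put the time in again.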

However, if you aren’t very careful, reputation can turn sites into an echo chamber. Reddit is a great example of this: commenters who go against the established mob mentality on a particular subreddit get vigorously down-voted until their comments are hidden. There needs to be room for well-reasoned dissent in many serious forums, particularly those that cover controversial subjects - if your reputation system heavily favors groupthink, this won’t happen.

Some sites go further in their reputation system to try to curb the negatives. Stack Exchange has a cost to downvote - you sacrifice some of your own reputation every time you click that down arrow. Discourse has a “Flag” button rather than a downvote button. Slashdot requires moderator actions to be accompanied by keywords like “Informative” or “Funny” or “Troll” or “Flamebait” indicating why a post was moderated the way it was; those actions are then anonymized and “metamoderated” to detect moderator biases and remove “bad” community moderators from the pool.
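The downvote-cost idea is simple enough to sketch. This is not Stack Exchange's actual implementation; the function name and the point values are assumptions chosen for illustration.

```python
# Assumed values, for illustration only.
DOWNVOTE_COST = 1      # reputation the voter gives up
DOWNVOTE_PENALTY = 2   # reputation the author loses


def downvote(reputation: dict, voter: str, author: str) -> bool:
    """Apply a downvote if the voter can afford it; return whether it was applied."""
    if voter == author:
        return False  # no downvoting yourself
    if reputation.get(voter, 0) < DOWNVOTE_COST:
        return False  # voters with no reputation cannot downvote
    reputation[voter] -= DOWNVOTE_COST
    reputation[author] = reputation.get(author, 0) - DOWNVOTE_PENALTY
    return True
```

The point of the design is that a pile-on costs every participant something, and throwaway accounts with no reputation cannot take part at all.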


That’s quite a large scale but you actually would get better results with more data. Soon enough you [will] have [a] neural net that additionally trolls alt-right sites. Man can it spot coded speech and dog whistling. :wink: All this interfaces with your moderation tools which are tweaked for your local idiosyncrasies.


I will field that one. OK, using the existing user interface: your comment is now highlighted in various spots using different colors. Hovering will get you a text explanation. Red for racist. Pink for going in the profane tab. You get it.

Now the user can self moderate before posting depending on their preferences. Perhaps they will have to await moderation or be bumped to bottom for example. Very engaging.

Now with bad actors, known coded speech isn’t the problem. But if I am trolling with my neo buddies on your site we can even pick random words like banana to call you a bad name. :wink: We buddies all know what I mean. If my buddy guesses wrong, it matters little. However even this starts getting tagged because it is an anomaly in the thread or on the site. If you beat our AI, here, have a cookie, thanks for the data, we both know who is smarter. Ultimately my bad “message” is so obfuscated it [is] harmless and will start showing up in the noise comment poster list.
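A toy version of the "anomaly in the thread" check, to show the idea rather than any real product: flag words that show up far more often in one thread than on the site as a whole. The function name, the thresholds and the add-one smoothing are my own assumptions, and a real system would need far more than word counts.

```python
import re
from collections import Counter


def unusual_words(thread_text, site_counts, site_total, min_count=3, ratio=5.0):
    """Return words whose rate in this thread is at least `ratio` times the site-wide rate."""
    words = re.findall(r"[a-z']+", thread_text.lower())
    counts = Counter(words)
    total = len(words)
    flagged = []
    for word, n in counts.items():
        if n < min_count:
            continue  # ignore words that barely appear in the thread
        thread_rate = n / total
        # Add-one smoothing so unseen words do not divide by zero.
        site_rate = (site_counts.get(word, 0) + 1) / (site_total + 1)
        if thread_rate / site_rate >= ratio:
            flagged.append(word)
    return sorted(flagged)
```

With a site-wide count where "banana" is rare, a thread that keeps repeating it gets the word flagged for a human to look at, even though the word itself is harmless.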


Thanks DHorse! Since I wrote that, we’ve applied machine learning technology in the following way:

I’d be wary of using an algorithm to declare that someone was being racist, but Google Jigsaw’s solution seems a lot smarter than that, and is improving all the time. They also offer newsrooms bespoke machine learning data sets based on their moderation history, which could make a huge difference for individual standards.


I agree. “Appears to be” might be the words to use. If you allow the system to be gamed, mods and the AI have a phrase-specific revision history, including moderator assessment. As a user, I could tag my bad phrase as satire.

Same effort, better results. It would require an additional API class and functionality on the AI side. You could even hire a few “thugs” to game the system if you are Google.

[edit: Do I respond here after reading the articles cited?]

[another edit: We hire hackers at security firms. Lol. Give a poor racist a job.]


Good name. Helpful write-up. Perspective API looks good. Nice work.