Fair moderation is hard, but fair, scalable moderation is harder
I'm Alice Hunsberger. Trust & Safety Insider is my weekly rundown on the topics, industry trends and workplace strategies that trust and safety professionals need to know about to do their job.
This week, I'm thinking about content moderation at scale, and what role frontline moderators have to play.
Thanks to everyone who reached out about last week's edition – it seems like it really resonated with many of you. Get in touch if you've got ideas about what I should write about next.
Here we go! — Alice
The recent takedown of Kidflix — one of the largest paedophile platforms in the world — revealed a chilling truth: child exploitation online is no outlier. It’s engineered into the infrastructure of the internet.
With CSAM spreading through advanced tech and everyday platforms, safety can’t be an afterthought. Resolver Trust & Safety works at the frontline, helping platforms design systems that detect and disrupt abuse before it scales.
Because the next Kidflix is already being built, and the time to act is now.
What it would take to prioritise quality in content moderation
We often talk about moderation as a numbers game: how fast can we respond, how much can we automate, how many decisions can we review per minute. But for those of us who’ve worked across the system, the real question is more fundamental: how do we know if we’re getting it right?
A recent research paper — The Role of Expertise in Effectively Moderating Harmful Social Media Content — looks at this question in the context of real-world harm. The authors investigated social media moderation during times of conflict and genocide, focusing on posts targeting Tigrayans during the 2020-2022 Tigray war. They found:
“open discussions and deliberation meetings where moderators can exchange contextual information and resolve disagreements to be more effective ways of ensuring that harmful content is appropriately flagged.”
They did this by giving moderators space to think more deeply, take context into account, exercise their judgment, and then spend as much time as they needed to make the decisions themselves:
"We decided on 55 posts per week to restrict the participants’ exposure to harmful content to a maximum of one hour per day, based on feedback from participants and the psychological impacts this content can cause. After each round of annotation, our expert annotators spent a minimum of 60 minutes discussing their disagreements and arrived at a final agreed-upon label for each post after deliberation."
What was notable to me was that, even after this lengthy deliberation step, there was still significant disagreement: as high as 71% at first and, after several rounds, a still very high 33%. It shows just how difficult these moderation decisions are to make.
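For the curious, here is a minimal sketch of how a headline disagreement rate like this can be computed, in Python and with made-up labels; the paper's own agreement metrics and methodology may well differ.

```python
from itertools import combinations

def disagreement_rate(labels_per_post):
    """Share of posts where at least two annotators chose different labels."""
    disagreed = sum(1 for labels in labels_per_post if len(set(labels)) > 1)
    return disagreed / len(labels_per_post)

def observed_pairwise_agreement(labels_per_post):
    """Average share of annotator pairs that agree on each post (simple observed agreement)."""
    per_post = []
    for labels in labels_per_post:
        pairs = list(combinations(labels, 2))
        per_post.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_post) / len(per_post)

# Made-up labels: three moderators labelling three posts
round_one = [
    ["harmful", "harmful", "not_harmful"],
    ["harmful", "harmful", "harmful"],
    ["not_harmful", "harmful", "not_harmful"],
]

print(f"disagreement rate: {disagreement_rate(round_one):.0%}")         # 67%
print(f"pairwise agreement: {observed_pairwise_agreement(round_one):.0%}")  # 56%
```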
The paper also notes the sources of these disagreements, including knowledge of local languages, cultural context, differing political stances, the ambiguity of social and political terms, posts missing context, and out-of-date policies. It's a reminder that when disagreement among moderators is treated as something to be eliminated rather than understood, we end up optimising for uniformity instead of accuracy.
Can quality scale?
Platforms often rely on BPOs for geographic and linguistic expertise, but the frontline workers doing that moderation rarely have the authority to influence policy.
It's hard to avoid: hiring local experts at the scale and geographic spread required to moderate a global user base is incredibly difficult. It requires logistical and operational infrastructure that individual platforms can't easily replicate, so it makes sense to partner with an organisation (usually a BPO) that specialises in that. The result is a strong hierarchy around policy development that can obscure how rules land in the real world and lose the insights of the workers on the frontlines.
The authors of the research argue that giving moderators more voice in policy design and encouraging open deliberation could lead to more culturally accurate enforcement:
"Social media platforms should establish processes that enable content moderators to co-design and periodically update moderation policies and improve the adaptation of these policies to the target languages and cultures. This includes establishing a process that involves not only market specialists in the co-designing and updating of policies but also moderators who do the day-to-day moderation."
More accurate policies often mean more complexity, making consistent enforcement at scale difficult. The model in the paper — multiple moderators reviewing, deliberating, and aligning — doesn't match real-world moderation, where decisions are often made individually and under time pressure. Yes, thoughtful judgment matters, but systems must also be efficient and scalable — something the paper glosses over.
AI could help by creating feedback loops between moderators, policy teams, and automation (disclosure: I work at an AI moderation company). But this will require rethinking operations and what we’re willing to invest in.
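As a rough illustration of what I mean by a feedback loop, here is a minimal sketch in Python. Everything in it (the Case and PolicyReviewQueue names, the routing rules) is hypothetical rather than a description of any real product; the idea is simply that disagreement and contextual notes from frontline moderators get queued for the people who own the policy, instead of disappearing into an accuracy score.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    case_id: str
    moderator_labels: list                                 # labels from individual frontline moderators
    moderator_notes: list = field(default_factory=list)    # contextual flags, e.g. ambiguous local terms

@dataclass
class PolicyReviewQueue:
    """Collects cases where frontline signal suggests the policy itself needs attention."""
    items: list = field(default_factory=list)

    def add(self, case, reason):
        self.items.append((case.case_id, reason))

def route(case, queue):
    """Auto-confirm a decision only when moderators agree; otherwise escalate for deliberation."""
    labels = set(case.moderator_labels)
    if case.moderator_notes:
        # Contextual notes (missing context, ambiguous terms, stale policy) go to policy owners
        queue.add(case, "context_flag")
    if len(labels) > 1:
        # Disagreement is treated as a signal to understand, not noise to average away
        queue.add(case, "moderator_disagreement")
        return None
    return labels.pop()

queue = PolicyReviewQueue()
decision = route(Case("c-123", ["harmful", "not_harmful"], ["term is ambiguous local slang"]), queue)
print(decision, queue.items)
# None [('c-123', 'context_flag'), ('c-123', 'moderator_disagreement')]
```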
Taking a step back
To avoid missing the forest for the trees, I often run what I call an ‘outcome audit’. I sample moderation cases — often appeals or difficult edge cases — and evaluate the outcomes for both users and the platform, rather than whether the policy was followed to the letter.
For example, it could be that a policy was enforced correctly, but that there should have been room for more leniency or understanding for the user, given specific power dynamics at play. Or it could be that a policy was enforced incorrectly, but that there really was no clear right answer.
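My audits are spreadsheet-and-judgement work rather than code, but as an illustration, here is a minimal sketch (in Python, with hypothetical field names) of tracking where the letter and the spirit of a decision diverge across a sample:

```python
import random

# Hypothetical audit records: each sampled case gets two independent judgements:
#   policy_followed - was the letter of the policy applied correctly?
#   good_outcome    - was the result right for the user and the platform?
audit_log = [
    {"case_id": "a1", "policy_followed": True,  "good_outcome": False},  # enforced correctly, but too harsh
    {"case_id": "a2", "policy_followed": False, "good_outcome": True},   # enforced incorrectly, no clear right answer
    {"case_id": "a3", "policy_followed": True,  "good_outcome": True},
]

def outcome_audit(cases, sample_size=50, seed=0):
    """Sample cases and surface the ones where 'letter' and 'spirit' diverge."""
    random.seed(seed)
    sample = random.sample(cases, min(sample_size, len(cases)))
    divergent = [c for c in sample if c["policy_followed"] != c["good_outcome"]]
    return {
        "sampled": len(sample),
        "divergence_rate": len(divergent) / len(sample),
        "divergent_cases": sorted(c["case_id"] for c in divergent),
    }

print(outcome_audit(audit_log))
# e.g. {'sampled': 3, 'divergence_rate': 0.666..., 'divergent_cases': ['a1', 'a2']}
```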
This matters even more when calibrating across vendors, whether technology providers or human moderation teams at BPOs. Quality assurance is only as strong as the metrics and reviewers behind it. If moderators are rated solely on protocol adherence, we risk building a system that is "highly accurate" but fundamentally misses the point.
Looking at both the ‘spirit’ and the ‘letter’ of the law helps surface these issues. But adding nuance comes at a cost: more time, more room for error. That’s why I’m cautious of academic studies based on small moderation samples — even when I agree with their call for more careful, context-aware systems.
You ask, I answer
Send me your questions — or things you need help to think through — and I'll answer them in an upcoming edition of T&S Insider, only with Everything in Moderation*
Get in touch

Also worth reading
Digital Parenting: Balancing Screens & Tech for Kids’ Digital Wellness (Nurture)
Why? A really balanced, actionable look at what modern digital parenting might look like, and how to think about screen time for kids.
The “De” In “Decentralization” Stands For “Democracy” (Techdirt)
Why? I've really appreciated TechDirt's democracy coverage, and I love how Mike articulates why democracy and technology journalism should be seen as one and the same in this era.
This ‘College Protester’ Isn’t Real. It’s an AI-Powered Undercover Bot for Cops (404 Media)
Why? Missing from this article: what platforms think about these AI bots and whether they're violating community guidelines, or whether law enforcement is running these bots in collaboration with platforms to entrap their users.