
A list of AI moderation ideas, ranked by me

Meta's Oversight Board's new whitepaper contains a host of best practices related to automated content moderation. They're great recommendations but will be significant lifts for many companies. Here's my take on their feasibility and impact.

I'm Alice Hunsberger. Trust & Safety Insider is my weekly rundown on the topics, industry trends and workplace strategies that trust and safety professionals need to know about to do their job.

I’m writing this from the Philippines, where my mother spent her early childhood in Metro Manila. The cityscapes are so beautiful and the colours so vibrant (check the end of today's newsletter for proof). I’m excited to see places I’ve heard about from my family, and to meet many of my PartnerHero colleagues in person for the first time. If you’ve ever lived or visited here (or have advice on what to pack for long-haul flights), let me know!

This week, I decided to rank the recommendations in the Oversight Board's new report on balancing automated content moderation with human rights. You know, just for fun, as one does.

Get in touch if you'd like your questions answered or just want to share your feedback. Here we go!

— Alice

PS: T&S can be depressing, so here are two pieces of good news: AI chatbots are pretty great at convincing people that conspiracy theories aren't true, and a new global reporting platform was just created to address online violence.


Today's edition of T&S Insider is in partnership with Checkstep, the all-in-one Trust & Safety Platform

This week, many T&S leaders will attend the Global Dating Insights conference in London. If you’re looking for the next event to attend, we've compiled an easy-to-navigate visual agenda of all the key events happening this fall and winter.

Whether you're eager to explore the latest in AI moderation or simply want to connect with industry peers, this agenda will help you plan your next few months with ease.

If you're organising a Trust & Safety event and don't see it listed, drop me a line at lara@checkstep.com. I’d love to include it!


A big wishlist from the Oversight Board

Why this matters: Meta's Oversight Board's new whitepaper contains a host of best practices related to automated content moderation. They're great recommendations but will be significant lifts for many companies. Here's my take on their feasibility and impact.

Meta’s Oversight Board just released a new whitepaper, Content Moderation In A New Era For AI And Automation. The paper isn’t about AI governance and safety, but rather how to balance the use of AI and automation for content moderation decisions with human rights principles. Most platforms apply policies using some form of machine learning and, as the paper notes:

AI algorithms can reinforce existing societal biases or lean to one side of ideological divides. It is imperative for platforms to ensure that freedom of expression and human rights considerations are embedded in these tools early and by design, bearing in mind the immense institutional and technological challenges of overhauling systems already operating at a massive scale.

There are also some recommendations, which appear sensible at first glance but — like many suggestions of this nature — don't have clear prioritisation. So I thought it would be useful and fun to take a look at each one and rank them in order of what I think is the highest impact and most feasible for implementation.

Disclaimer: This is my personal perspective and amounts to an educated guess. Every platform is different and will have different challenges. Let me know if you’d rank things differently.

Consult human rights experts

AI-powered content moderation tools should be developed with input from global experts in human rights, freedom of expression and ethics. Their recommendations for safety should be built into the design.

“Listen to human rights experts,” say the human rights experts! In reality, most T&S teams are already thinking about human rights frameworks. At many platforms, it’s not a lack of expertise that’s the bottleneck; it’s company priorities. If committing to consulting human rights experts means that execs actually do what those experts say, then it’s worthwhile, but many recommendations will conflict with core metrics (like revenue and engagement).

I like that the Oversight Board specifically wrote “recommendations should be built into the design” to emphasise the fact that it’s not the consultation that matters but the outcome. It’s almost like they anticipate platforms consulting experts but not actually doing what they recommend.

My verdict: Low effort, potentially high impact, but very unlikely.

Tell users why moderation decisions were made, and how

Platforms should use automation to help users understand why their content was removed and provide clear notifications. Users deserve to know if a human or AI made the decision and should be able to appeal with added context.

I think this is reasonable, given that similar requirements are in the Digital Services Act and are therefore already in place for users in the EU (so the work doesn’t have to start from scratch). These notices should be available to all users, regardless of where in the world they live. Platforms can’t expect users to follow rules without transparency, and implementing this globally shouldn’t be too difficult for platforms that aren’t already doing it.

My verdict: High impact, low-ish effort.

GenAI models should be used worldwide

The benefits of new generative AI models should be shared fairly across all social media users worldwide.

Companies must not evaluate model performance based solely on the results of English-language benchmarks, or of aggregated tests in which English is disproportionately represented, but rather with the breadth of their global audiences in mind.

GenAI unlocks the possibility of scaling moderation models across the world, not just in English-speaking countries and the EU. Creating global parity for moderation models is easier than ever, but platforms would need to invest in a lot of global employees with local language fluency to monitor these models effectively (see below). Then again, creating more jobs in T&S sounds great!

My verdict: High impact, medium effort.
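
To make the benchmarking point concrete, here's a minimal sketch (my own illustration with made-up numbers, not anything from the report or any platform's actual tooling) of how an aggregate accuracy score can look healthy while a less-represented language is badly served:

# Illustrative only: shows how an aggregate benchmark score can hide poor
# performance in an under-represented language. All numbers are made up.
from collections import defaultdict

# (language, was_the_model_correct) pairs for a hypothetical eval set in
# which English dominates the sample -- exactly what the Board warns about.
results = ([("en", True)] * 880 + [("en", False)] * 20
           + [("tl", True)] * 55 + [("tl", False)] * 45)

print(f"Aggregate accuracy: {sum(ok for _, ok in results) / len(results):.1%}")

by_lang = defaultdict(list)
for lang, ok in results:
    by_lang[lang].append(ok)

for lang, oks in sorted(by_lang.items()):
    print(f"  {lang}: {sum(oks) / len(oks):.1%} (n={len(oks)})")
# Aggregate looks fine (~93.5%), but Tagalog accuracy is only 55%.

Reporting per-language slices alongside the headline number is the cheap part; actually closing the gaps it reveals is where the medium effort comes in.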

Audit automated systems

Automated moderation systems need ongoing, thorough evaluation to ensure they work effectively for the most vulnerable and at-risk users.

I encourage you to read the paper and look at the examples of over- and under-enforcement by automated systems. In a case about anti-trans hate speech, they write:

The fundamental issue in this case was not with Meta’s policies, but its enforcement. Automated systems that both enforce content policies and prioritize content for review need training to be able to recognize the kind of coded language and context-based images considered in this case. It is critically important that platforms audit the accuracy of these systems, particularly in regard to coded references.

Everyone should already be auditing automated decisions; the question is how thorough the Oversight Board hopes platforms will be. Their example in the whitepaper was a false-positive case with 215 individual appeals, which would be peanuts to Meta. If that's the standard then, as I noted above, it would create a lot of new jobs in T&S, but it wouldn't technically be that difficult.

My verdict: Medium impact, medium effort.
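
For what it's worth, the basic audit loop isn't technically hard, which is part of why I call it medium effort. Here's a rough sketch (entirely hypothetical, with simulated reviewer verdicts, not a description of Meta's process) of sampling automated removals for human re-review to estimate an overturn rate:

# Hypothetical audit sketch: sample automated removals, send them for human
# re-review, and estimate how often the automation got it wrong.
import random

random.seed(7)

# Stand-in for a day's automated removals, tagged by policy area.
automated_removals = [
    {"id": i, "policy": random.choice(["hate_speech", "nudity", "spam"])}
    for i in range(10_000)
]

sample = random.sample(automated_removals, k=200)  # human review queue

def human_review(decision):
    # Placeholder for a real reviewer verdict; here we simulate ~8% overturns.
    return random.random() > 0.08

overturned = [d for d in sample if not human_review(d)]
print(f"Estimated overturn rate: {len(overturned) / len(sample):.1%} "
      f"({len(overturned)} of {len(sample)} sampled removals)")

The hard part, as the Board's example suggests, is stratifying the sample so that coded language and other low-volume, high-harm mistakes actually surface rather than just the average case.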

Give researchers access to data

Transparency is paramount – third-party researchers worldwide should have access to data in order to assess these tools.

If there’s actually full transparency, this could be really high impact. This is why companies are rightfully nervous: it’s in their best interest to keep data under lock and key. For some companies, opening up data to outside researchers could be relatively easy; for others with convoluted data systems, it could be almost impossible. Based on casual chats with T&S folks, I believe convoluted data systems are fairly common.

I’ve written about this before, but one of the frustrating things about T&S research (from my side of the fence as a practitioner) is that it’s often very far removed from the reality on the ground. That isn’t necessarily researchers’ fault; they just aren’t seeing the data that we do. Public pressure and outrage are also powerful motivators for change. I’d love to see this happen, but it will have to come from regulation.

My verdict: Effort variable. Impact variable. Companies are scared by this and won’t do it unless they’re forced to.

Label altered content

Platforms should also label content that’s been significantly altered and could mislead users.

Platforms have to look at both whether media is manipulated and the potential harm that manipulated media may cause, and neither is easy. There’s no super-reliable tech for detecting “altered content”, and half-assed labelling could lull people into thinking that anything not labelled is safe. That said, there’s been a lot of thinking about labelling in the last couple of years; labels are less intrusive than removals, and community labelling can make this more feasible. Even so, it’s an extremely challenging task for all but the largest platforms.

My verdict: High effort, medium-high impact.

Revise NCII policy

Platforms should focus their policies on identifying lack of consent among those targeted by the proliferation of non-consensual deepfake intimate images.

Context indicating the nude or sexualized aspects of a post are AI-generated or otherwise manipulated should be considered as a signal of non-consent. Setting a standard that AI generation or manipulation of intimate images are inherently indicators of non-consent would be a major step forward given the rapid increase of deepfakes.

Lately, we’ve seen deepfake NCII (non-consensual intimate imagery) guidance from the White House and Ofcom, so I’m not surprised to see this addressed here. I worry about this policy change affecting self-expression, especially because nudity and sexual content are so difficult to moderate in the first place. For me, this comes down to how much importance is placed on the “signal of non-consent” in decision-making.

My verdict: I’m not sure. I need to think about it more.

Further listening: Friend of EiM Katie Harbath does a great job of unpacking the report with Oversight Board member Paolo Carozza in her most recent Impossible Tradeoffs podcast. Go and have a listen.

You ask, I answer

Send me your questions — or things you need help to think through — and I'll answer them in an upcoming edition of T&S Insider, only with Everything in Moderation*

Get in touch

Also worth reading

Twitter’s Pre-Musk Plans Mirrored Elon’s Vision—Until He Abandoned, Trashed Or Ignored Them (TechDirt)
Why? Elon being Elon, as told by two New York Times reporters.

A Look Behind the Screens: Examining the Data Practices of Social Media and Video Streaming Services (FTC)
Why? "The status quo is unacceptable" says the FTC. But it's not as simple as that. Keen to know what T&S Insider readers think of this one.

Governing AI for Humanity (The United Nations AI Advisory Body)
Why? A new bumper report from the UN's multi-stakeholder, high-level, multi-hyphenated Advisory Body on Artificial Intelligence.


PS... The Philippines is so pretty!

A photo of Manila in the Philippines from above
Look at that view!