9 min read

The ethical and practical flaws in Meta's policy overhaul

Yes, Meta's recent policy overhaul says a lot about the company's priorities, leadership and strategic direction. But it's also a badly written and confusing policy. And the problem with bad policies is that they are very hard to enforce correctly.

I'm Alice Hunsberger. Trust & Safety Insider is my weekly rundown on the topics, industry trends and workplace strategies that trust and safety professionals need to know about to do their job.

This week, I'm all fired up about Meta's changes to their hate speech policies and content moderation enforcement (if you've been following me on social media I'm sure you've noticed).

If you're based in London and haven't already signed up for the EiM meetup at the end of January, what are you waiting for?! I'm sad I won't be able to make this one, but perhaps next time.

Get in touch and let me know what you've been thinking about this week. Here we go! — Alice


Today's edition of T&S Insider is in partnership with Safer by Thorn, a purpose-built solution for the detection of online sexual harms against children.

Powered by trusted data and Thorn’s issue expertise, Safer helps Trust & Safety teams proactively detect CSAM and child sexual exploitation conversations.

Safeguard your platform and users with proprietary hashing and matching for verified CSAM and a classifier for finding possible novel image and video CSAM. Plus, Safer’s new text classifier provides much needed signals to help Trust & Safety teams find conversations that violate your child safety policies, such as those containing sextortion or requests for self-generated content from a minor.


A giant step backward in enforcement and user safety

Why this matters: Meta's recent policy overhaul says a lot about the company's priorities, leadership and strategic direction. But, more than that, the widened definition of 'hate speech' will be difficult to enforce consistently and risks harming marginalised communities even more than currently thought.

There’s a lot I could say about last week’s Meta announcement (EiM #276), but my mind immediately went to the two things I know the best: policy and operations. Despite thousands of articles dissecting the now infamous five-minute video, I don’t feel these two aspects were covered enough. 

There are three aspects in particular that I’m most concerned about:

  1. The policy changes are about creating exceptions for the specific types of hate speech that the President-elect and one wing of the Republican party care about, while still disallowing other, similar expressions. This tells us very clearly what Meta values (or rather doesn’t value).
  2. These changes are going to be almost impossible to enforce correctly, given the confusing and contradictory guidance. They feel hastily put together and incomplete.
  3. For many categories of policy violations, including hate speech, Meta is pivoting to a strategy that relies solely on user reports to send content to moderation review. I cannot say this strongly enough: this approach does not work.

I’ll go through each of these in more detail below. And I’d love to hear your thoughts — get in touch by hitting reply and I’ll look to round up EiM contributions next week.

The clearest example of how Trust & Safety puts company values into action

As I’ve said here before, Trust & Safety is how companies put their values into action. It is impossible for any platform to be truly “neutral” — the tensions between user privacy, safety, and expression (and, frankly, social discourse around how to define them) mean that we must make tradeoffs every day that show what a company cares about and what they want to see more of on their platform.

Meta is a private platform; they have every right to be explicit about their values. In making these changes, Zuckerberg is signalling that Meta is now aligned with MAGA Republicans and what they care about (though it’s hard to call hate a “value”).

However, I’m frustrated at the sleight-of-hand that Zuckerberg is trying to pull off: he used words like “free speech” and “simplifying” to make his moderation changes sound reasonable and positive (even neutral) when they’re anything but.

While there have clearly been issues with Meta’s moderation in the past (as is the case with all large platforms; moderation is hard), these changes are going to make Meta’s content moderation much less effective, consistent, fair, and accurate.

Lauren Wagner, who worked on Meta’s Integrity Product team, wrote about her work on fact-checking and integrity at Meta (bold is her emphasis):

Meta developed the rules its content was judged against. During my tenure, these content rules could lack a clear philosophical foundation. Should they be grounded in democratic values, company priorities, free expression…?
Fact-checking at scale isn’t just a tech challenge—it’s a values challenge.
Without a coherent ideological framework, policies become reactive instead of strategic, leaving users and fact-checkers navigating arbitrary rules. Decisions were left to mid-level employees to figure out, rather than laddering up to a grand plan.

That’s pretty damning about the absence of values within Meta in the past. While those values may be more explicit now, it still doesn’t seem as if the company has a coherent ideological framework. When the political tides change, will Meta’s policies change again?

During my career, I have only worked for companies that are (regardless of which political party is in power) explicitly progressive and supportive of marginalised communities, especially the LGBTQ+ community. Working in Trust & Safety for these companies has given me great satisfaction because I can enact those values by writing policies and designing enforcement mechanisms that protect marginalised communities and foster positive connections. But that is only possible when the values are aligned across the company, the employees, and the users.

I don’t agree morally with the changes that Meta is making, but that’s only part of the reason they are terrible: based on my experience, they lack this cohesion and will be incredibly challenging to enforce.

Why these policy changes are morally and practically unenforceable

I have never worked at a company with Meta’s scale and level of complexity. There are nuances here that I cannot understand, and I’m absolutely paying attention to what former and current Meta employees are saying about this, like Lauren and Katie Harbath. However, as someone who has written hate speech policies and been responsible for T&S operations for platforms with millions and millions of people, I can definitively say that these changes will cause chaos for Meta’s integrity teams and for their users.

Why will they cause chaos? Because these policies are both badly written and confusing. And the problem with bad policies is that they are very hard to enforce correctly. If a policy is never properly enforced, it may as well not exist. Frankly, if someone on my team drafted these policies and sent them to me for approval, I would’ve sent them straight back.

Let me give a concrete example of why they are bad. 

Meta did not write a clear policy that allows free speech for everyone about everything. It is not, as Zuckerberg claims, “simplifying” its rules and broadening its definition of what is acceptable. If that had been the goal, Meta would’ve removed its hate speech clause entirely.

Instead, the policy documentation now carves out odd exceptions while still keeping the broad guidance against hate speech that it had before. Here’s its broad hateful conduct policy, tellingly changed from “hate speech” (bold emphasis mine):

We define hateful conduct as direct attacks against people — rather than concepts or institutions — on the basis of what we call protected characteristics (PCs): race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, and serious disease.

And here’s a new carve-out exception for allowing exclusion and insulting language based on gender identity (again, bold is my emphasis):

People sometimes use sex- or gender-exclusive language when discussing access to spaces often limited by sex or gender, such as access to bathrooms, specific schools, specific military, law enforcement, or teaching roles, and health or support groups. Other times, they call for exclusion or use insulting language in the context of discussing political or religious topics, such as when discussing transgender rights, immigration, or homosexuality. Finally, sometimes people curse at a gender in the context of a romantic break-up. Our policies are designed to allow room for these types of speech.

How are moderators supposed to navigate rules which allow the outright attack and dehumanisation of queer and trans people, women, and immigrants, while still being expected to remove similar speech based on religion? As The Intercept noted:

…sections of the materials provide examples of forbidden “insults about sexual immorality,” such as “Jewish women are slutty.” But the document also provides ample examples of newly permissible insults aimed at specific gender identities or sexual orientations, including “Gay people are sinners” and “Trans people are immoral.” A post stating “Lesbians are so stupid” would remain prohibited as a “mental insult,” though “Trans people are mentally ill” is marked as allowed.

The policies, as currently written, are a minefield of contradictory and specific rules and exceptions. This is going to be incredibly difficult for Meta’s vast team of moderators to enforce accurately and consistently. There are no decision trees or training manuals which can make this make sense.

And, as if to make their morals crystal clear, the new policy tells us exactly who these guidelines were written for, if you know enough about LGBTQ “discourse”:

We do allow allegations of mental illness or abnormality when based on gender or sexual orientation, given political and religious discourse about transgenderism and homosexuality

These are very clear dog-whistles aimed at the far-right. According to GLAAD:

“transgenderism” [is] a right wing neologism intended to imply that being trans is an ideology, and “homosexuality” is an outdated and pathologizing way of referring to LGBTQ people. Meta has not previously used these terms in public policies or communications.

Confusingly, Meta says that they do remove slurs; however, they removed the line which says “We also prohibit the usage of slurs that are used to attack people on the basis of their protected characteristics.” The Intercept reports that new internal guidelines say “‘Tranny’ is no longer a designated slur and is now non-violating.” While Meta has not always removed content using this term, it was always clearly rules-violating under their broad hate speech policy. Now that policy has explicit exceptions. It’s clear that some slurs will still be removed, but not ones against the LGBTQ+ community.

Sidenote: It’s worth listening to my conversation last summer with Jenni Olson from GLAAD for more about Meta’s history with moderating this term.

Because it knows these changes are going to lead to real-life repercussions, Meta removed the line in their policy that says [hate speech] “creates an environment of intimidation and exclusion, and in some cases may promote offline violence.”

Just because the company has taken that line out of its policy documentation doesn’t make it any less true. These changes are going to promote offline violence, especially because of how they are making changes to their policy-violating detection mechanisms.

But wait, there’s more…

Why relying on user reports isn’t enough

One striking claim in Zuckerberg’s announcement was that automated systems have “resulted in too many mistakes and too much content being censored that shouldn’t have been”. (Never mind that “censorship” is something a government does, not a private company, and that the idea of “censorship” is much more complex than people think).

Now, for “less severe” violations (anything that isn’t terrorism, child sexual exploitation, drugs, or fraud/scams), Meta will rely on users reporting an issue before taking any action.
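To make concrete what that routing change implies, here is a minimal sketch of a report-gated review queue. It is entirely hypothetical (the category names and the should_enqueue_for_review function are my own illustration, not Meta's internal system), but it shows the core shift: proactive classifier detections only trigger review for the "severe" categories, and everything else waits for a user to complain.

```python
from dataclasses import dataclass

# Hypothetical categories and routing logic, for illustration only.
SEVERE = {"terrorism", "child_sexual_exploitation", "drugs", "fraud_scams"}

@dataclass
class Signal:
    post_id: str
    category: str   # e.g. "hate_speech", "terrorism"
    source: str     # "classifier" (proactive) or "user_report"

def should_enqueue_for_review(signal: Signal) -> bool:
    """Route a signal to human review under a report-gated policy."""
    if signal.source == "user_report":
        return True  # user reports always enter the review queue
    # Proactive classifier detections only count for "severe" categories;
    # everything else sits untouched until someone reports it.
    return signal.category in SEVERE

# A proactive hate speech detection is simply dropped under this scheme:
print(should_enqueue_for_review(Signal("p1", "hate_speech", "classifier")))   # False
print(should_enqueue_for_review(Signal("p2", "terrorism", "classifier")))     # True
print(should_enqueue_for_review(Signal("p3", "hate_speech", "user_report")))  # True
```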

As Dave Willner, who wrote some of Meta’s first policies, eloquently said, “user reports are crap, have always been crap, and will always be crap.”

This is immediately where my mind went too. I know this, because for years I had to manage content moderation operations for millions of users without any automated, proactive detection. I also spent years working with engineering and data science teams to put in automated detection measures to try to make things better. There are two big things I learnt:

  1. Users weaponise reporting when they’re emotional, angry or simply when they don’t like someone or something, even if there is no policy violation. Marginalised communities are especially targeted: on dating apps, for example, trans people are often reported as “fake” or “spam” just for being themselves. This causes a lot of noise/false positives that moderators have to wade through.
  2. Users under-report content that does violate policy if they like it or think no one cares. Of course users who like content won’t report it, but people who disagree with content are often disinclined to report. When the user norms of the space are accepting of certain kinds of speech, users may think a form of speech is allowed even if it’s not. When there’s no automated detection of racism, for example, and suddenly users see lots of racism on a platform, they’ll think that it’s there because it’s allowed and they won’t report it when they see it.

So relying on user reports is terrible for precision. Lots of content is reported that shouldn’t be. It’s also terrible for recall, meaning that lots of content isn’t reported that should be. I’ll quote Dave again because he just puts it so perfectly: “They are actively choosing to design the system to miss a lot of garbage.”
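To put numbers on that precision/recall framing, here is a tiny worked example with entirely made-up figures (they illustrate the dynamic, not Meta's actual rates): weaponised reporting drags precision down, and under-reporting drags recall down.

```python
# Hypothetical numbers, for illustration only -- not real platform data.
violating_posts = 10_000     # posts that actually break the hate speech policy
reported_violating = 1_500   # violating posts that users actually report (under-reporting)
reported_benign = 4_000      # non-violating posts reported anyway (weaponised reporting)

total_reports = reported_violating + reported_benign

# Precision: of everything reported, how much actually violates policy?
precision = reported_violating / total_reports   # ~0.27

# Recall: of everything that violates policy, how much ever reaches a moderator?
recall = reported_violating / violating_posts    # 0.15

print(f"Precision of the report queue: {precision:.0%}")  # moderators wade through noise
print(f"Recall of violating content:   {recall:.0%}")     # most violations are never seen
```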

In short, these changes to Trust & Safety policies and operations at Meta are going to have major repercussions. They normalise extreme views during a time when marginalised communities, especially trans and immigrant communities, are already being targeted and attacked. 

Unfortunately, the rollback of protections at both Meta and X/Twitter shows clearly how essential sensible Trust & Safety strategies are for anyone who values the safety and dignity of all. As someone with deep experience in this space, I can say with certainty that these policy shifts are a step backwards, not forwards, for online safety and healthy discourse. Meta has a responsibility to its users, and these changes abdicate that responsibility in the most troubling of ways. I only hope that other platforms hold onto their values and don’t follow suit.

You ask, I answer

Send me your questions — or things you need help to think through — and I'll answer them in an upcoming edition of T&S Insider, only with Everything in Moderation*

Get in touch

Also worth reading

Mark Zuckerberg lies about content moderation to Joe Rogan’s face (The Verge)
Why? I'm glad The Verge listened to this podcast and reported on it so I don't have to. Love the attention to detail and fact-checking about removing fact-checking.

2025: The year we decide the internet's future (Global Voices)
Why? A global look at what's coming in 2025, "At stake is the shift from a multi-stakeholder model — where governments, businesses, civil society, and technical communities share responsibility — to a government-dominated approach."

How the US TikTok Ban Would Actually Work (Wired)
Why? The fate of TikTok may be decided next week - here's how.