From social media cyberbullying to assault in the metaverse, the Internet can be a dangerous place. Online content moderation is one of the most important ways companies can make their platforms safer for users.
However, moderating content is no easy task. The volume of content online is staggering. Moderators must contend with everything from hate speech and terrorist propaganda to nudity and gore.
The digital world’s “data overload” is only compounded by the fact that much of the content is user-generated and can be difficult to identify and categorize.
AI to automatically detect hate speech
That’s where AI comes in. By using machine learning algorithms to classify content, companies can flag unsafe content as soon as it is posted, instead of waiting hours or days for human review, which reduces the number of people exposed to it.
For instance, Twitter uses AI to identify and remove terrorist propaganda from its platform. AI flags more than half of the tweets that violate its terms of service, and CEO Parag Agrawal has made using AI to identify hate speech and misinformation a priority. That said, more needs to be done, as toxicity still runs rampant on the platform.
Similarly, Facebook’s AI detects nearly 90% of the hate speech the platform removes, as well as nudity, violence, and other potentially offensive content. However, like Twitter, Facebook still has a long way to go.
Where AI goes wrong
Despite its promise, AI-based content moderation faces many challenges. One is that these systems often mistakenly flag safe content as unsafe, which can have serious consequences. For example, at the outset of the pandemic, Facebook marked legitimate news articles about the coronavirus as spam. It also mistakenly banned a Republican Party Facebook page for more than two months. And it flagged posts and comments about the Plymouth Hoe, a public landmark in England, as offensive.
The problem cuts both ways, however: failing to flag content can have even more dangerous effects. The shooters in the El Paso and Gilroy shootings published their violent intentions on 8chan and Instagram, respectively, before going on their rampages. Robert Bowers, the accused perpetrator of the massacre at a synagogue in Pittsburgh, was active on Gab, a Twitter-like site used by white supremacists. Misinformation about the war in Ukraine has received millions of views and likes across Facebook, Twitter, YouTube and TikTok.
Another issue is that many AI-based moderation systems exhibit racial biases, such as flagging posts written in African American English as offensive at higher rates. These biases need to be addressed in order to create a safe and usable environment for everyone.
Improving AI for moderation
To fix these issues, AI moderation systems need higher-quality training data. Today, many companies outsource the labeling of that training data to poorly trained, low-wage call centers in developing countries. These labelers often lack the language skills and cultural context to make accurate moderation decisions. For example, unless you’re familiar with U.S. politics, you likely won’t know what a message mentioning “Jan 6” or “Rudy and Hunter” refers to, despite their importance for content moderation. If you’re not a native English speaker, you’ll likely over-index on profane-looking terms, even when they’re harmless or used in a positive context, mistakenly flagging references to the Plymouth Hoe or “she’s such a bad bitch” as offensive.
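To see why surface-level matching goes wrong, consider a deliberately naive keyword filter. This is a hypothetical sketch for illustration only, not how Twitter, Facebook, or any other platform mentioned here actually moderates content; the `BLOCKLIST` set and the `naive_flag` function are invented names, and the blocklist simply reuses the two words from the examples above.

```python
import re

# Hypothetical blocklist for illustration only; not any platform's real list.
BLOCKLIST = {"hoe", "bitch"}


def naive_flag(post: str) -> bool:
    """Flag a post if any of its lowercase word tokens appears on the blocklist.

    The filter ignores context entirely, which is the failure mode described
    in the text: a landmark name or a compliment is treated like an insult.
    """
    tokens = re.findall(r"[a-z']+", post.lower())
    return any(token in BLOCKLIST for token in tokens)


posts = [
    "Sunset walk along Plymouth Hoe tonight",
    "she's such a bad bitch, congrats on the promotion!",
    "Lovely weather in Devon today",
]

for post in posts:
    print(naive_flag(post), post)
```

The first two posts are flagged even though neither is abusive, while the third passes. A model trained on labels from annotators who understand the context can learn to tell these cases apart; a word list, or a labeler who relies on one, cannot.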
One company tackling this challenge is Surge AI, a data labeling platform designed for training AI in the nuances of language. It was founded by a team of engineers and researchers who built the trust and safety platforms at Facebook, YouTube and Twitter.
Facebook, for instance, has struggled to gather high-quality data to train its moderation systems in important languages. Despite the size of the company and its scope as a worldwide communications platform, it barely had enough content to train and maintain a model for standard Arabic, much less dozens of dialects. Because the company lacked a comprehensive list of toxic slurs in the languages spoken in Afghanistan, it could have been missing many violating posts. It also lacked an Assamese hate speech model, even though employees had flagged hate speech as a major risk in Assam, given rising violence against ethnic groups there. These are the kinds of gaps Surge AI aims to fill with its focus on languages as well as toxicity and profanity datasets.
In short, with larger, higher-quality datasets, social media platforms can train more accurate content moderation algorithms to detect harmful content, which helps keep their platforms safe and free from abuse. Just as large datasets have fueled today’s state-of-the-art language generation models, like OpenAI’s GPT-3, they can also fuel better AI for moderation. With enough data, machine learning models can learn to detect toxicity with greater accuracy and with fewer of the biases found in lower-quality datasets.
AI-assisted content moderation isn’t a perfect solution, but it’s a valuable tool that can help companies keep their platforms safe and free from harm. With the increasing use of AI, we can hope for a future where the online world is a safer place for all.