Automated Content Moderation: A Primer
Abstract
Automated content moderation has become essential for large platforms to manage an
ever-growing volume of content. But the available tools are far from perfect. Their performance
depends heavily on a set of tradeoffs that platforms make at different stages of design and
deployment. Understanding these tradeoffs is necessary to inform and advance the policy
debates on ensuring online safety while preserving free expression and fairness in content
moderation.
This white paper aims to provide an accessible primer on the predictive models used for automated
content moderation, known as classifiers. It describes the lifecycle of a classifier as it plays
out within large, highly resourced platforms, and explains the tradeoffs made at each stage of
classifier development and deployment. It also provides an overview of current technologies for
classifying different content types and the problems inherent in deploying such systems in the real
world.