Amazon has reported finding hundreds of thousands of pieces of suspected child sexual abuse material in data collected for training its AI models last year, though the company has provided limited information about the material’s origins to child safety officials.
Bloomberg reports that Amazon detected the illicit material in its AI training data throughout 2025 and reported it to the National Center for Missing and Exploited Children (NCMEC). During 2025, NCMEC saw at least a fifteen-fold increase in AI-related reports, with the vast majority originating from Amazon.
The e-commerce and cloud computing giant removed the problematic content before using the data to train its AI models. However, child safety officials have expressed concern that Amazon has not provided sufficient information about where the material came from, which could hinder law enforcement efforts to locate perpetrators and protect victims.
An Amazon spokesperson stated that the training data came from external sources and that the company lacks detailed information about its origin. While other major technology companies have also scanned their training data and reported potentially exploitative material to NCMEC, the clearinghouse has identified significant differences. Other companies collectively submitted only a handful of reports and provided more comprehensive details about the material’s origin.
“We take a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known [child sexual abuse material] and protect our customers,” the spokesperson said.
The dramatic increase in Amazon’s reports comes during an intensifying AI competition that has left companies racing to acquire and process massive volumes of data to enhance their models. Amazon accounted for the majority of the more than one million AI-related reports of child sexual abuse material submitted to NCMEC in 2025. This represents a substantial jump from 67,000 AI-related reports the previous year, and just 4,700 in 2023.
The presence of illegal content in AI training data worries experts. They warn it could influence a model’s fundamental behaviors, potentially enhancing its ability to digitally manipulate and sexualize images of real children or to generate entirely new images of sexualized children.
The Amazon spokesperson clarified that as of January, the company is not aware of any instances where its models have generated child sexual abuse material. An automatic detection tool flagged the content by comparing it against a database of known child abuse material, a process known as hashing. Approximately 99.97 percent of the reports came from scanning non-proprietary training data.
Amazon believes it over-reported these cases to NCMEC to ensure nothing was overlooked. “We intentionally use an over-inclusive threshold for scanning, which yields a high percentage of false positives,” the spokesperson added.
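The article does not say which hashing scheme Amazon uses; industry tools such as PhotoDNA are proprietary. As a rough illustration of the general technique, the sketch below assumes 64-bit perceptual hashes stored as integers, where an "over-inclusive" threshold is modeled as a generous Hamming-distance cutoff that increases recall at the cost of false positives. All names and parameters here are hypothetical:

```python
# Minimal sketch of hash-based matching against a database of known material.
# Assumptions (not from the article): hashes are 64-bit perceptual hashes
# represented as ints, and an "over-inclusive threshold" is a generous
# Hamming-distance cutoff that trades false positives for higher recall.

from collections.abc import Iterable


def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")


def matches_known_hash(candidate: int,
                       known_hashes: set[int],
                       threshold: int = 10) -> bool:
    """Flag a candidate if it lies within `threshold` bits of any known hash.

    A larger threshold is "over-inclusive": more true matches survive
    re-encoding or cropping, but more benign images are flagged and must
    be cleared by human review.
    """
    return any(hamming_distance(candidate, known) <= threshold
               for known in known_hashes)


def scan(dataset_hashes: Iterable[int],
         known_hashes: set[int]) -> list[int]:
    """Return every dataset hash that should be flagged for review."""
    return [h for h in dataset_hashes if matches_known_hash(h, known_hashes)]
```

Raising `threshold` catches copies that have been slightly altered but produces more false positives, which is consistent with the trade-off Amazon’s spokesperson describes.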
The volume of suspected material Amazon detected surprised child safety experts. The hundreds of thousands of reports represented a dramatic increase for the company, which made a total of 64,195 reports in 2024.
“This is really an outlier,” said Fallon McNulty, the executive director of NCMEC’s CyberTipline. “Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place.”
McNulty explained that Amazon has provided little to no information in its reports about where the illicit material originally came from, who shared it, or whether it remains actively available on the internet. While Amazon is not legally required to share this level of detail, the absence of information makes it impossible for NCMEC to trace the material’s origin. “There’s nothing then that can be done with those reports,” she said. “Our team has been really clear with [Amazon] that those reports are inactionable.”
When asked why the company did not disclose information about the possible origin of the material, the Amazon spokesperson responded that because of how the data is sourced, the company does not have the data that comprises an actionable report. “While our proactive safeguards cannot provide the same detail in NCMEC reports as consumer-facing tools, we stand by our commitment to responsible AI and will continue our work to prevent CSAM,” the spokesperson said.
“There should be more transparency on how companies are gathering and analyzing the data to train their models — and how they’re training them,” said David Thiel, the former chief technologist at the Stanford Internet Observatory, who has researched the prevalence of child sexual abuse material in AI training data.
Amazon was not the only company to spot and report potential material from its AI workflows last year. Google and OpenAI told reporters that they scan AI training data for exploitative material. Meta and Anthropic also said they search training data for such material.
McNulty said that with the exception of Amazon, the AI-related reports received last year came in very small volumes and included key details that allowed the clearinghouse to pass actionable information to law enforcement. “Simply flagging that you came across something but not providing any type of actionable detail doesn’t help the larger child safety space,” McNulty said.
Read more at Bloomberg here.
Lucas Nolan is a reporter for Breitbart News covering issues of free speech and online censorship.