Cluster Analysis of AI Incident Journalism
This blog post was contributed by Sherry, Steven, and Shyam, students at The George Washington University School of Business who analyzed the AI Incident Database and its relationship with journalism as part of their capstone coursework.
The evolution of journalism in reporting AI incidents—events involving alleged harms or near harms caused by AI systems—has received little analytical attention. One of the primary ways the public learns about AI incidents is through news reports. Yet news outlets are not free from political bias, selective neutrality, potential conflicts of interest, and other pressures, such as judgments about what readers actually want to read, that can shape AI incident reporting. Given the centrality, and the imperfection, of news and journalism as methods of reporting AI incidents, it is important to consider how journalists and news organizations handle AI incident reporting, how media interest in AI incidents has developed over time, what trends have emerged in AI incident reporting, and how forces such as bias may have shaped coverage of AI incidents.
Our brief examination offers a preliminary exploration of these questions by analyzing data from the AI Incident Database (AIID). Using a methodology consisting of:
- BERT term embeddings to convert incident reports into numeric representations,
- latent Dirichlet allocation topic modeling to assign subjects to reports,
- density-based clustering to group reports, and
- UMAP dimension reduction for visualization,
we analyzed macro trends in AI journalism based on the available data gathered within the AIID.
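The four steps above can be sketched in code. This is a minimal illustration using scikit-learn on toy data, not the authors' actual pipeline (which is on GitHub): TF-IDF vectors stand in for BERT term embeddings and PCA stands in for UMAP, so the sketch runs without large model downloads. The sample reports are invented for demonstration.

```python
# Sketch of the analysis pipeline; TF-IDF and PCA are stand-ins for
# the BERT embeddings and UMAP reduction used in the actual study.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA
from sklearn.cluster import DBSCAN

# Toy incident reports (hypothetical examples, not AIID data).
reports = [
    "self-driving car crashes into barrier on highway",
    "autonomous vehicle fails to detect pedestrian at night",
    "chatbot generates biased responses about job applicants",
    "language model produces discriminatory hiring text",
    "facial recognition misidentifies suspect leading to arrest",
    "search algorithm surfaces racist image results",
]

# 1) Convert each report into a numeric representation
#    (BERT term embeddings in the study; TF-IDF here).
X = TfidfVectorizer().fit_transform(reports).toarray()

# 2) Latent Dirichlet allocation to assign a dominant topic per report.
counts = CountVectorizer().fit_transform(reports)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts).argmax(axis=1)

# 3) Density-based clustering (DBSCAN) to group similar reports;
#    a label of -1 marks noise points outside any dense cluster.
labels = DBSCAN(eps=1.2, min_samples=2).fit_predict(X)

# 4) Reduce to two dimensions for a scatter-plot visualization
#    (UMAP in the study; PCA here).
coords = PCA(n_components=2).fit_transform(X)

print(topics, labels, coords.shape)
```

Each point in the figures below corresponds to one row of `coords`, colored by its cluster label, which is how the cluster plots in this post can be read.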
Assumptions and limitations of our study follow those of the AIID itself, such as incomplete counts of AI incidents and a lack of cost and incident response information. We have also relied on standard open-source Python packages, which may contain imperfections unknown to us. Reported results have not been peer-reviewed or independently replicated. Code for our analysis is available on GitHub: https://github.com/AIID-Trend-Analysis-Project/AIID-Trend-Analysis/.
Clusters in AI Incident Reports
Let’s first take a look at some of the macro trends within AI incident reporting from 1996 to 2024.
Figure 1. Cluster visualization of all incident reports.
Figure 1 is a visualization of all incidents documented by the AIID since 1996, where each point in the plot represents an individual incident report. Based on Figure 1, we assigned the labels "Targeted Incidents" and "Broad Incidents" to the X axis, where incidents on the left appear more targeted in nature and those on the right appear broader. For the Y axis, we assigned the labels "Digital Incident" and "Physical Incident," where incidents toward the bottom of the plot tend to occur in the digital world and those closer to the top tend to impact the physical world.
Figure 2. Cluster visualization for reports in 2015 and 2016.
Figure 2 displays clustered incident reports from 2015 and 2016. Reporting of AI incidents in these earlier years of the dataset is rather sparse, with the main reporting topic being racism and sexism arising from popular search algorithms. Aside from algorithmic bias, we see an elevated level of reporting within the robotics and self-driving car space, forming some of the most prominent clusters during this period.
Figure 3. Cluster visualization for reports in 2015-2018.
During the 2015-2018 period, incidents involving robots and self-driving cars were among the most prominently reported cases, forming their own distinct clusters. As we’ll see below, reports on self-driving vehicles continued to grow rapidly after 2018, but coverage of robot-related incidents significantly declined, suggesting a shift in journalistic focus.
Figure 4. Cluster visualization for the 2017-2021 period.
How did journalistic attention shift throughout 2017-2021? Figure 4 suggests a trend toward reporting on broader social impact issues, likely due to the wider adoption of machine learning algorithms. As these technologies became more prevalent in companies and governments, their impact on daily life increased. By 2019, the focus had shifted away from robotic mishaps, which affected relatively few people, to AI incidents with broader societal implications, such as predictive policing algorithms.
Figure 5. Cluster visualization of all reports up to January 2022.
Reports up to January 2022 provide a snapshot of AI incident reporting prior to the seismic shift in the AI space caused by ChatGPT and other consumer generative AI tools. The data in Figure 5 indicate significantly more reports in the bottom-right quadrant than in the bottom-left, with many incidents reported on algorithmic bias in education, law, and healthcare. Self-driving cars remain the most prominent cluster, while incidents surrounding bots, deepfakes, and AI moderation in social media are beginning to separate from the central cluster.
Figure 6. Cluster visualization for reports in 2022 and 2023.
2022 saw a reduced intake of AI incident reports in the AIID, but this reflects a limitation in data collection rather than a true decline in incidents. The pattern shifted in 2023 with a large increase in reports related to ChatGPT, which was released in November 2022. This surge in attention extended beyond ChatGPT to other large language models (LLMs) and generative AI applications such as deepfakes and AI image generation, all of which indicate a broader trend toward generative AI incident reporting.
Conclusion
Media reporting on AI incidents appears to have begun in earnest around 2015, shifting from physical, narrowly targeted incidents to digital incidents with broader impacts over the following years. As more companies and organizations adopted AI and machine learning as a matter of routine, media attention on these incidents grew in volume. However, the total number of incidents cannot be extrapolated from the AIID; it provides a detailed snapshot, but not the full picture. We observed a strong trend toward reporting on generative AI incidents in recent years, and a clear cluster of self-driving car incidents throughout all analyzed years.