Researchers have found that while AI chatbots like ChatGPT can detect metaphors in political speech with moderate success, they often misinterpret figurative language in Donald Trump’s speeches and lack the nuanced understanding that humans possess.
PsyPost reports that a recent study published in Frontiers in Psychology has shed light on the capabilities and limitations of AI systems when it comes to understanding metaphors in political contexts. The researchers used President Donald Trump’s speeches as a testing ground for ChatGPT-4, a large language model (LLM) trained to understand and generate human language.
The study analyzed four of Trump’s speeches from mid-2024 to early 2025, totaling over 28,000 words. The speeches were chosen because they showcase Trump’s trademark use of metaphor to frame political issues in ways that resonate with listeners.
Using a method called critical metaphor analysis, the researchers prompted ChatGPT-4 to identify potential metaphors, categorize them by theme, and explain their likely emotional or ideological impact. The model was able to detect metaphors with an accuracy rate of around 86 percent, correctly identifying 119 out of 138 sampled sentences.
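The article does not reproduce the researchers’ actual prompts, but a minimal sketch of this kind of metaphor-identification pass, written in Python against the OpenAI SDK with illustrative prompt wording and an illustrative model name (both assumptions, not details from the study), might look like this:

```python
# Minimal sketch of a metaphor-identification pass, assuming the OpenAI Python
# SDK is installed and OPENAI_API_KEY is set; the prompt text and model name
# are illustrative, not the ones used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are assisting with critical metaphor analysis. For the sentence "
    "below, state whether it contains a metaphor, name the likely source "
    "domain (e.g., movement, health, war), and briefly describe its probable "
    "emotional or ideological effect.\n\nSentence: {sentence}"
)

def analyze_sentence(sentence: str) -> str:
    """Ask the model to flag and categorize any metaphor in one sentence."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(sentence=sentence)}],
        temperature=0,  # keep outputs as repeatable as possible
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(analyze_sentence("We are going to drain the swamp."))
```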
However, a closer examination revealed several recurring problems in the model’s reasoning. ChatGPT-4 often confused metaphors with other figures of speech, such as similes, and tended to overanalyze straightforward expressions. It also misclassified proper nouns and technical terms, treating them as metaphors rather than as literal names or jargon.
These missteps highlight the limitations of AI in understanding meaning in context. Unlike humans, LLMs do not draw on lived experience, cultural knowledge, or emotional nuance to make sense of language, which becomes especially apparent when analyzing political rhetoric.
The study also tested ChatGPT-4’s ability to categorize metaphors based on shared themes or “source domains,” such as movement and direction or health and illness. While the model performed well with familiar, frequently used metaphor types, it was less reliable in less common or more abstract categories, such as cooking and food.
The researchers compared the AI-generated results with those produced by traditional metaphor analysis tools, finding that ChatGPT was faster and easier to use but less consistent in identifying metaphors across all categories. They also noted that the model’s performance heavily depends on how prompts are written, with even small changes in questioning affecting the output.
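The study’s prompt wordings are not quoted in the article, but the sensitivity the researchers describe is easy to picture: two near-identical requests about the same sentence can come back with different labels. A hedged sketch of such a check, again with made-up prompt text and an illustrative model name, follows.

```python
# Sketch of a prompt-sensitivity check: the same sentence is analyzed under two
# slightly different instructions and the replies are printed side by side.
# Prompt wording and model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()

SENTENCE = "Our economy is on life support."
PROMPTS = [
    "Does the following sentence contain a metaphor? Answer yes or no, then explain: ",
    "Identify any figurative language in the following sentence and classify its type: ",
]

for prompt in PROMPTS:
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt + SENTENCE}],
        temperature=0,
    )
    print(f"PROMPT: {prompt!r}")
    print(reply.choices[0].message.content, "\n")
```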
Furthermore, the study uncovered broader structural problems in how LLMs are trained. These models rely on enormous datasets scraped from the internet, which may lack exposure to metaphorical language in specific cultural, historical, or political contexts. As a result, LLMs may pick up and reproduce existing biases related to gender, race, or ideology.
The researchers conclude that while large language models show promise in analyzing metaphor, they are far from replacing human expertise. Their tendency to misinterpret, overreach, or miss subtleties makes them best suited for assisting researchers rather than conducting fully automated analysis.
Read more at PsyPost here.
Lucas Nolan is a reporter for Breitbart News covering issues of free speech and online censorship.