Red Teaming is a Critical Thinking Exercise: Part 2

May 13, 2025 · 11 min · Brian Pendleton, Abhishek Gupta, Subho Majumdar

red teaming ai vulnerabilities

Red teaming has evolved from its origins in military strategy to become a widely discussed methodology across multiple domains, including cybersecurity and AI. In Part 1, we outlined how, despite its current popularity in AI governance discussions, there appears to be a significant gap between the original intent of red teaming and its practical applications. In Part 2, we set the context for this discussion by tracing the evolution of red teaming across different fields, with a particular focus on how it has been adapted to AI systems and the limitations of red teaming in its most recent incarnation.

Military origins

Red teaming emerged from the tactical war games of the Prussian military in the early 19th century, evolving through Cold War simulations into today’s formalized methodology for challenging conventional wisdom and identifying strategic and exploitable hazards. The Prussian army adopted Kriegsspiel (literally “wargame” in German) in 1812, a tabletop simulation developed by Lieutenant Georg Leopold von Reisswitz and his son, where blue pieces represented the Prussian forces and red represented the enemy¹². This color-coding established the “red team” concept that persists to this day.

In the US, the practice of red teaming took shape during the Cold War, when the RAND Corporation conducted military simulations for the U.S. government. In these exercises, “red team” and the color red were used to represent the Soviet Union, while “blue team” and blue represented the United States³. Following the intelligence failures that led to the September 11 attacks in 2001, the U.S. Department of Defense established formal red team units to prevent similar catastrophic oversights. The 9/11 Commission identified a “failure to connect the dots” as a primary cause of the intelligence breakdown. This prompted systematic changes to prevent groupthink and foster alternative analysis⁴. As a result, the Pentagon created specialized training centers like the University of Foreign Military and Cultural Studies at Fort Leavenworth to institutionalize red teaming methodologies. The US Army’s University of Foreign Military and Cultural Studies at Fort Leavenworth, created after intelligence failures in Iraq, developed the modern framework for red teaming that transformed an ad-hoc practice into a systematic methodology for critical analysis⁵. This program teaches military officers and government officials techniques to challenge assumptions, consider alternative perspectives, and introduce contrarian thinking into planning processes.

The concept of the “10th man” from Israeli military doctrine, popularized in the film “World War Z”, illustrates another approach to institutionalized contrarian thinking. When everyone agrees on a particular outcome, it is the designated 10th man’s responsibility to disagree and explore alternative scenarios. This concept reportedly developed after intelligence failures during the 1973 Yom Kippur War, when analysts unanimously agreed that Arab troop movements weren’t a threat¹. In reality, while Israeli intelligence did establish a unit called Ipcha Mistabra (“the opposite side”) to challenge prevailing assumptions after the Yom Kippur War, the specific “10th man” concept as portrayed in the film is somewhat fictionalized but based on real adversarial thinking practices in military intelligence³.

Adoption in Cybersecurity

The National Security Agency (NSA) first recognized the need for proactive cybersecurity measures in the 1980s, and pioneered the concept of “red teams” tasked with assessing the security of classified systems⁶. These early efforts involved independent evaluators simulating potential attackers and identifying weaknesses that required remediation.

As digital threats evolved in the 1990s, so did cybersecurity red teaming. The term “tiger team” was initially used to describe specialized groups that performed many of the same functions as modern red teams⁷. These elite and highly specialized groups were hired to take on particular challenges against the security posture of organizations.

Following the 9/11 attacks, cybersecurity red teaming gained significant momentum as organizations recognized the need for more comprehensive security testing. The Central Intelligence Agency created a new “Red Cell” unit, and red teaming became increasingly common in various government agencies to model responses to asymmetric threats, including cyber attacks³. This period marked the transition from isolated penetration testing to more holistic security assessments that incorporated physical security, social engineering, and other non-technical aspects.

Modern cybersecurity red teaming encompasses several key methodologies working in concert: technical assessments that test digital defenses through vulnerability scanning, exploitation, and lateral movement⁸; physical security testing that evaluates access controls for facilities⁹; social engineering that targets the human element through phishing and impersonation¹⁰; and extended red team operations designed to achieve specific objectives while testing detection and response capabilities¹¹. These approaches are codified in frameworks such as NIST Special Publication 800-53, which includes specific controls for red team exercises designed to “simulate attempts by adversaries to compromise organizational information systems” and “provide comprehensive assessments that reflect real-world conditions”¹².

The field continues to evolve with several advanced approaches. Continuous Automated Red Teaming (CART) uses automation to assess security posture in real-time rather than through periodic manual assessments⁸. Adversary Emulation models tactics after specific threat actors that might target the organization, guided by frameworks like MITRE ATT&CK¹³. Purple Teaming fosters collaboration between red and blue teams to identify vulnerabilities and improve response strategies¹⁴. Integrated IT-OT Assessments expand scope to include industrial control systems and critical infrastructure¹⁵. AI-Enhanced Red Teaming that incorporates AI to improve the effectiveness of assessments¹⁶. And finally, specialized services from organizations like CISA provide comprehensive evaluations for critical infrastructure sectors and government agencies¹³.

This evolution of cybersecurity red teaming from isolated technical assessments to comprehensive, intelligence-driven simulations reflects the increasing sophistication of cyber threats and the growing recognition that effective security requires a holistic approach that addresses technical, physical, and human vulnerabilities.

Red Teaming AI

A working definition of the very new concept of AI red teaming is that it is structured testing to identify flaws and vulnerabilities in AI systems, typically conducted in controlled environments with developer collaboration. For LLMs specifically, red teaming is defined as “a process where participants interact with the LLM under test to help uncover incorrect or harmful behaviors”¹⁷. LLM developer companies have implemented various approaches to red teaming, ranging from comprehensive security assessments to narrower evaluations focused on specific genAI features¹⁸. Over the last two years, such approaches have emerged as “a critical practice in assessing the risks of AI models and systems”¹⁹.

Despite its growing popularity, researchers have identified significant challenges with current AI red teaming practices. There remains “a lack of consensus around the scope, structure, and assessment criteria for AI red-teaming”²⁰, raising concerns that red teaming may sometimes function more as “security theater” than substantive risk mitigation. Many current approaches focus too narrowly on the models themselves, neglecting how vulnerabilities might manifest in production systems where AI models are a part of broader systems¹⁸. Additionally, “AI experts are mostly not thinking about insider risk”²¹, and most testing processes remain limited to English-language evaluations¹⁷.

The process itself presents challenges for participants, who may experience negative psychological impacts when “required to think like adversaries and interact with harmful content, which can lead to decreased productivity or psychological harm”¹⁹. Many red-teamers also lack training in crucial disciplines outside their technical expertise, as “employee red-teamers typically have little training in any other relevant proficiencies whether linguistic, sociocultural, historical, legal, or ethical”²².

As the field matures, there’s growing recognition that “red teaming on its own is not a panacea for risk assessment”¹⁹. More effective approaches like violet teaming are emerging that integrate complementary methods, recognizing that “red teaming provides awareness of risks, while blue teaming responds with solutions”²³. Future success will likely depend on embracing greater diversity, and focusing more on practical red-teaming efforts that address “attacks that occur in practice, which are often less sophisticated than attacks present in academic papers”²⁴.

What’s Missing?

As such, the state-of-the-art of AI red teaming overindexes on model-specific behaviors rather than how these models interact with broader systems and social contexts, as well as the downstream consequences of their output. Current AI red teaming efforts tend to focus narrowly on harms rather than technical vulnerabilities while overlooking broader socio-technical considerations. The public interest dimension of red teaming remains underdeveloped²⁵.

To mitigate these flaws, AI red teams must embrace cybersecurity’s decades of experience with security testing and reporting. A lot of the testing done in the name of red teaming AI systems can be structured and automated through frameworks like MITRE ATT&CK for understanding adversarial behaviors, prioritizing vulnerabilities, and coordinating defensive responses²⁶, and established practices for continuous monitoring, automated testing, and incident response²⁷. In comparison, true red teaming requires “an alchemist mindset” that extends beyond purely technical approaches²⁸. Successful cyber red team engagements typically involve creating diverse, highly realistic scenarios producing “actionable insights for proactive remediation”²⁹—principles that are often missing from AI red teaming.

Any red team activity should be part of a larger, coordinated risk and security effort. This includes pre-mortems conducted before the development of a model and associated systems begins, and comprehensive risk assessments of the model and associated systems. Organizations should incorporate AI-specific security processes in the model development lifecycle, and include relevant security teams that can harden both the model and associated systems. Finally, a dedicated blue team should work with the red team to ensure security remains a priority throughout development, deployment, production, and retirement of the entire system. In the next blog post, we’ll dive deep into the strategies and mechanisms that can make this possible.

About the Authors

Brian Pendleton is an AI security researcher and Founding Director of ARVA. He is passionate about participatory approaches in mitigating AI security harms, and was one of the founding members of the OWASP LLM Top 10. Besides leading ARVA activities, Brian also leads community efforts in AI Village.

Abhishek Gupta was Founder and Principal Researcher of the Montreal AI Ethics Institute (MAIEI), Director for Responsible AI at the Boston Consulting Group (BCG), and a pioneering voice in the field of AI ethics. Abhishek’s research has been published in leading AI journals and presented at top-tier machine learning conferences such as NeurIPS, ICML, and IJCAI. Abhishek was also a Global Shaper with the World Economic Forum, a member of The Banff Forum, a Senior Fellow in Responsible AI at the United Nations Institute for Disarmament Research (UNIDIR), a Technical Committee Member at Accessibility Standards Canada, and more.

Subho Majumdar is a technical leader in AI ethics, security, and safety who believes in a community-centric approach to data-driven decision making. He has pioneered the use of trustworthy AI methods in multiple companies, wrote a book, and founded a number of nonprofit efforts in this area—Trustworthy ML Initiative, Bias Buccaneers, and AVID. Currently, Subho is Co-founder and Head of AI at Vijil, an AI software startup on a mission to help developers build and operate intelligent agents that people can trust.

References

The Tenth Man Rule: How to Take Devil’s Advocacy to a New Level. Meyer, 2025. ↩︎
Kriegsspiel – How a 19th Century Table-Top War Game Changed History. Kay, 2020. ↩︎
Red team. Wikipedia, 2025. ↩︎
9/11 Was A Terrible Tragedy: It Was Also The Birth Of Red Teaming. Red Team Thinking, 2021. ↩︎
The Psychology of Red Teaming with Bryce Hoffman. Dooley, 2017. ↩︎
The History of Red Team Exercises. TechRound, 2023. ↩︎
Red Team VS Blue Team: What’s The Difference? Firch, 2024. ↩︎
What is red teaming? Anderson, Holdsworth, and Kosinski, 2025. ↩︎
What is Red Teaming Cyber Security? How Does it Work? Sapphire, 2024. ↩︎
Penetration Testing vs. Red Teaming. Tomkiel, 2024. ↩︎
Red Teaming Operations. Redscan, 2024. ↩︎
CA-8(2): Red Team Exercises. CSF Tools, 2021. ↩︎
Enhancing Cyber Resilience: Insights from CISA Red Team Assessment of a US Critical Infrastructure Sector Organization. CISA, 2024. ↩︎
Penetration Testing: Understanding Red, Blue, & Purple Teams. DePalma, 2023. ↩︎
Integrated IT-OT Assessment and Governance Model for Improved Holistic Cybersecurity. Frumento, ResearchGate, 2021. ↩︎
What is red teaming? Kirvan, 2024. ↩︎
Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives. Romero-Arjona et al, arXiV, 2025. ↩︎
Lessons From Red Teaming 100 Generative AI Products. Bullwinkel et al, arXiV, 2025. ↩︎
OpenAI’s Approach to External Red Teaming for AI Models and Systems. Ahmad et al, arXiV, 2025. ↩︎
Red-Teaming for Generative AI: Silver Bullet or Security Theater? Feffer et al, arXiV, 2024. ↩︎
I’m Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk. Martin and Mercer, arXiV, 2025. ↩︎
AI red-teaming is a sociotechnical challenge: on values, labor, and harms. Gillespie et al, arXiV, 2025. ↩︎
The Promise and Peril of Artificial Intelligence – Violet Teaming Offers a Balanced Path Forward. Titus and Russell, arXiV, 2023. ↩︎
Attack Atlas: A Practitioner’s Perspective on Challenges and Pitfalls in Red Teaming GenAI, Rawat et al, arXiV, 2024. ↩︎
Red-Teaming in the Public Interest. Singh et al, 2025. ↩︎
MITRE ATT&CK, MITRE, 2023. ↩︎
Security and Privacy Controls for Information Systems and Organizations. NIST, 2020. ↩︎
Summon a demon and bind it: A grounded theory of LLM red teaming. Inie et al, PLOS One, 2025. ↩︎
Enhancing cybersecurity resilience through advanced red-teaming exercises and MITRE ATT&CK framework integration: A paradigm shift in cybersecurity assessment. Yulianto et al, Cyber Security and Applications, 2025. ↩︎

Red Teaming is a Critical Thinking Exercise: Part 2

Military origins#

Adoption in Cybersecurity#

Red Teaming AI#

What’s Missing?#

About the Authors#

References#