OpenAI’s Red Team plan: Make ChatGPT Agent an AI fortress

In case you missed it, OpenAI yesterday debuted a powerful new feature for ChatGPT and with it, a host of new security risks and ramifications.

Called the “ChatGPT agent,” this new feature is an optional mode that ChatGPT paying subscribers can engage by clicking “Tools” in the prompt entry box and selecting “agent mode,” at which point, they can ask ChatGPT to log into their email and other web accounts; write and respond to emails; download, modify, and create files; and do a host of other tasks on their behalf, autonomously, much like a real person using a computer with their login credentials.

Obviously, this also requires the user to trust the ChatGPT agent not to do anything problematic or nefarious, or to leak their data and sensitive information. It also poses greater risks for a user and their employer than the regular ChatGPT, which can’t log into web accounts or modify files directly.

Keren Gu, a member of the Safety Research team at OpenAI, commented on X that “we’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biology & chemistry under our Preparedness Framework. Here’s why that matters–and what we’re doing to keep it safe.”


So how did OpenAI handle all these security issues?

The red team’s mission

According to OpenAI’s ChatGPT agent system card, the “red team” the company enlisted to test the feature faced a challenging mission: 16 PhD security researchers were given 40 hours to test it out.

Through systematic testing, the red team discovered seven universal exploits that could compromise the system, revealing critical vulnerabilities in how AI agents handle real-world interactions.

What followed next was extensive security testing, much of it predicated on red teaming. The Red Teaming Network submitted 110 attacks, from prompt injections to biological information extraction attempts. Sixteen exceeded internal risk thresholds. Each finding gave OpenAI engineers the insights they needed to get fixes written and deployed before launch.

The results, published in the system card, speak for themselves. ChatGPT Agent emerged with significant security improvements, including 95% performance against visual browser irrelevant instruction attacks and robust biological and chemical safeguards.

Red teams exposed seven universal exploits

OpenAI’s Red Teaming Network comprised 16 researchers with biosafety-relevant PhDs who together submitted 110 attack attempts during the testing period. Sixteen exceeded internal risk thresholds, revealing fundamental vulnerabilities in how AI agents handle real-world interactions. But the real breakthrough came from UK AISI’s unprecedented access to ChatGPT Agent’s internal reasoning chains and policy text. Admittedly, that’s intelligence regular attackers would never possess.

Over four testing rounds, UK AISI identified seven universal exploits that had the potential to compromise any conversation, forcing OpenAI to address each of them:

Attack vectors that forced OpenAI’s hand

Attack Type | Success Rate (Pre-Fix) | Target | Impact
Visual Browser Hidden Instructions | 33% | Web pages | Active data exfiltration
Google Drive Connector Exploitation | Not disclosed | Cloud documents | Forced document leaks
Multi-Step Chain Attacks | Variable | Cross-site actions | Complete session compromise
Biological Information Extraction | 16 submissions exceeded thresholds | Dangerous knowledge | Potential weaponization

FAR.AI’s assessment was openly critical of OpenAI’s approach. Despite 40 hours of testing revealing only three partial vulnerabilities, they identified that current safety mechanisms relied heavily on monitoring during reasoning and tool-use processes, which the researchers considered a potential single point of failure if compromised.

How red teaming helped turn ChatGPT vulnerabilities into a fortress

OpenAI’s response to the red team’s results redefined entire segments of the ChatGPT agent architecture. One of the many initiatives undertaken was building a dual-layer inspection architecture that monitors 100% of production traffic in real time, achieving these measurable improvements:

Security improvements after red team discoveries

Defense Metric | Previous Models | ChatGPT Agent | Improvement
Irrelevant Instructions (Visual Browser) | 82% | 95% | +13%
In-Context Data Exfiltration | 75% | 78% | +3%
Active Data Exfiltration | 58% | 67% | +9%
System Reliability | Sampling-based | 100% coverage | Complete monitoring
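
The "Improvement" figures in the table above are differences in percentage points, not relative gains. The short sketch below recomputes both from the reported scores so the distinction is explicit; the function and variable names are illustrative and do not come from the system card.

```python
# Scores as reported in the table above (percent of attacks handled correctly).
scores = {
    "Irrelevant Instructions (Visual Browser)": (82, 95),
    "In-Context Data Exfiltration": (75, 78),
    "Active Data Exfiltration": (58, 67),
}


def gains(previous, current):
    """Return (percentage-point gain, relative gain in percent)."""
    points = current - previous
    relative = 100 * points / previous
    return points, relative


for metric, (previous, current) in scores.items():
    points, relative = gains(previous, current)
    print(f"{metric}: +{points} points ({relative:.1f}% relative)")
```

Read this way, the active data exfiltration defense improved by 9 points, which is roughly a 15.5% relative gain over the earlier 58% baseline.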

The architecture works like this:

  • First Tier: A fast classifier with 96% recall flags suspicious content
  • Second Tier: A reasoning model with 84% recall analyzes flagged interactions for actual threats
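
To make the two-tier flow concrete, here is a minimal sketch, not OpenAI's implementation. The class, the toy keyword checks, and the helper names are all hypothetical stand-ins. The end-to-end figure assumes the second tier's 84% recall applies to content the first tier has already flagged and that the tiers miss independently; the article does not state how the two recalls combine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TwoTierMonitor:
    """Hypothetical two-stage filter: a cheap screen, then a deeper review."""
    fast_classifier: Callable[[str], bool]    # tier 1: runs on every message
    reasoning_monitor: Callable[[str], bool]  # tier 2: runs only on flagged messages

    def is_blocked(self, message: str) -> bool:
        # Tier 2 is consulted only when tier 1 flags the message.
        if not self.fast_classifier(message):
            return False
        return self.reasoning_monitor(message)


def end_to_end_recall(tier1_recall: float, tier2_recall: float) -> float:
    """Recall of the chain if a threat must be caught by both tiers in sequence
    (assumes tier 2 recall is conditional on tier 1 having flagged the item)."""
    return tier1_recall * tier2_recall


# Toy stand-ins for the real classifiers (assumptions for illustration only).
def keyword_flag(message: str) -> bool:
    return any(word in message.lower() for word in ("pathogen", "toxin"))


def toy_reasoner(message: str) -> bool:
    return "synthesize" in message.lower()


monitor = TwoTierMonitor(keyword_flag, toy_reasoner)
print(monitor.is_blocked("How do I synthesize this toxin?"))  # True
print(monitor.is_blocked("What is a toxin?"))                 # False
print(round(end_to_end_recall(0.96, 0.84), 4))                # 0.8064
```

Under those assumptions, chaining the tiers would catch about 81% of threats end to end, which illustrates why a high-recall first tier matters: anything it misses never reaches the reasoning model.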

But the technical defenses tell only part of the story. OpenAI made difficult security choices that acknowledge some AI operations require significant restrictions for safe autonomous execution.

Based on the vulnerabilities discovered, OpenAI implemented the following countermeasures across their model:

  1. Watch Mode Activation: When ChatGPT Agent accesses sensitive contexts like banking or email accounts, the system freezes all activity if users navigate away. This is in direct response to data exfiltration attempts discovered during testing.
  2. Memory Features Disabled: Despite being a core functionality, memory is completely disabled at launch to prevent the incremental data leaking attacks red teamers demonstrated.
  3. Terminal Restrictions: Network access limited to GET requests only, blocking the command execution vulnerabilities researchers exploited.
  4. Rapid Remediation Protocol: A new system that patches vulnerabilities within hours of discovery—developed after red teamers showed how quickly exploits could spread.

During pre-launch testing alone, this system identified and resolved 16 critical vulnerabilities that red teamers had discovered.
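
The terminal restriction above amounts to an allowlist on HTTP methods. The sketch below shows one way such a check could look at the application level; it is an illustration under stated assumptions, not OpenAI's enforcement mechanism, which the article does not describe. The names `ALLOWED_METHODS`, `MethodNotAllowed`, and `check_request` are hypothetical.

```python
# Hypothetical allowlist: only read-style GET requests may leave the sandbox.
ALLOWED_METHODS = frozenset({"GET"})


class MethodNotAllowed(Exception):
    """Raised when the agent attempts a request with a blocked HTTP method."""


def check_request(method: str, url: str) -> None:
    """Raise MethodNotAllowed unless the method is on the allowlist."""
    normalized = method.strip().upper()
    if normalized not in ALLOWED_METHODS:
        raise MethodNotAllowed(f"{normalized} to {url} blocked: GET only")


# Usage: a GET passes silently, a POST is rejected.
check_request("get", "https://example.com/data.csv")
try:
    check_request("POST", "https://example.com/upload")
except MethodNotAllowed as error:
    print(error)  # POST to https://example.com/upload blocked: GET only
```

In practice a restriction like this would more likely be enforced at the network or proxy layer, where the agent's own code cannot bypass it.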

A biological risk wake-up call

Red teamers revealed that ChatGPT Agent could be compromised in ways that lead to greater biological risks. Sixteen experienced participants from the Red Teaming Network, each with biosafety-relevant PhDs, attempted to extract dangerous biological information. Their submissions revealed the model could synthesize published literature on modifying and creating biological threats.

In response to the red teamers’ findings, OpenAI classified ChatGPT Agent as “High capability” for biological and chemical risks, not because they found definitive evidence of weaponization potential, but as a precautionary measure based on red team findings. This triggered:

  • Always-on safety classifiers scanning 100% of traffic
  • A topical classifier achieving 96% recall for biology-related content
  • A reasoning monitor with 84% recall for weaponization content
  • A bio bug bounty program for ongoing vulnerability discovery

What red teams taught OpenAI about AI security

The 110 attack submissions revealed patterns that forced fundamental changes in OpenAI’s security philosophy. They include the following:

Persistence over power: Attackers don’t need sophisticated exploits; all they need is more time. Red teamers showed how patient, incremental attacks could eventually compromise systems.

Trust boundaries are fiction: When your AI agent can access Google Drive, browse the web, and execute code, traditional security perimeters dissolve. Red teamers exploited the gaps between these capabilities.

Monitoring isn’t optional: The discovery that sampling-based monitoring missed critical attacks led to the 100% coverage requirement.

Speed matters: Traditional patch cycles measured in weeks are worthless against prompt injection attacks that can spread instantly. The rapid remediation protocol patches vulnerabilities within hours.
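
The monitoring lesson above can be illustrated with simple probability. If a monitor inspects each interaction independently with some sampling rate, the chance that an attack spanning several interactions is never inspected is (1 - rate) raised to the number of malicious interactions. The sampling rates and attack length below are hypothetical; the article does not say what rate earlier sampling-based monitoring used.

```python
def miss_probability(sampling_rate: float, malicious_interactions: int) -> float:
    """Probability that none of the malicious interactions is sampled,
    assuming each interaction is sampled independently."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    return (1.0 - sampling_rate) ** malicious_interactions


# Hypothetical rates: 1% sampling, 10% sampling, and full coverage.
for rate in (0.01, 0.10, 1.0):
    missed = miss_probability(rate, malicious_interactions=5)
    print(f"sampling {rate:.0%}: attack goes unseen with probability {missed:.3f}")
```

With these illustrative numbers, a five-step attack slips past 10% sampling more than half the time, while full coverage leaves no unsampled path.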

OpenAI is helping to create a new security baseline for Enterprise AI

For CISOs evaluating AI deployment, the red team discoveries establish clear requirements:

  1. Quantifiable protection: ChatGPT Agent’s 95% defense rate against documented attack vectors sets the industry benchmark. The many tests and results detailed in the system card explain how OpenAI accomplished this, making it a must-read for anyone involved with model security.
  2. Complete visibility: 100% traffic monitoring isn’t aspirational anymore. OpenAI’s experiences illustrate why it’s mandatory given how easily red teams can hide attacks anywhere.
  3. Rapid response: Hours, not weeks, to patch discovered vulnerabilities.
  4. Enforced boundaries: Some operations (like memory access during sensitive tasks) must be disabled until proven safe.

UK AISI’s testing proved particularly instructive. All seven universal attacks they identified were patched before launch, but their privileged access to internal systems revealed vulnerabilities that would eventually be discoverable by determined adversaries.

“This is a pivotal moment for our Preparedness work,” Gu wrote on X. “Before we reached High capability, Preparedness was about analyzing capabilities and planning safeguards. Now, for Agent and future more capable models, Preparedness safeguards have become an operational requirement.”

Red teams are core to building safer, more secure AI models

The seven universal exploits discovered by researchers and the 110 attacks from OpenAI’s red team network became the crucible that forged ChatGPT Agent.

By revealing exactly how AI agents could be weaponized, red teams forced the creation of the first AI system where security isn’t just a feature. It’s the foundation.

ChatGPT Agent’s results prove red teaming’s effectiveness: blocking 95% of visual browser attacks, catching 78% of data exfiltration attempts, monitoring every single interaction.

In the accelerating AI arms race, the companies that survive and thrive will be those that treat their red teams as core architects of the platform, pushing it to the limits of safety and security.

