The Pre-Launch AI Audit: Why 23 Critical Vulnerabilities Should Never Reach Production

There’s a moment every team building with AI eventually faces: the demo works, the stakeholders are happy, and the launch date is on the calendar. The assistant answers questions, handles requests, and sounds great. The pressure now is to ship.

That’s exactly the moment to stop and ask a different question: not “does it work?” but “what happens when someone tries to break it?” Because a customer-facing AI system isn’t just software that can have bugs. It’s software that takes instructions in plain language, often touches real customer data, and frequently has the authority to take actions. When it fails, it doesn’t throw a clean error. It does something it was never supposed to do, confidently, in front of a user.

This is the story of one pre-launch red-team engagement where asking that question early caught 23 critical and high-severity issues before a single one reached a real customer.

Results at a glance

Critical / high-severity vulnerabilities found | 23
Reached production | 0
Stage caught | Pre-launch (before public release)
Headline risk categories | Prompt injection · Sensitive-data exposure · Guardrail bypass
Reference framework | OWASP Top 10 for LLM Applications (2025)

Engagement details anonymized at the client’s request. Figures from the Gigaflop engagement record. [[EDITOR: confirm exact figures + naming/anonymization approval per CS-REDTEAM]]

The context

The client was a SaaS company preparing to launch an AI assistant inside their product, the kind of conversational feature that has quietly become table stakes. It could answer customer questions, pull information from the user’s account, and trigger a handful of in-app actions on request. It was built well by a capable team. It had been tested the way software is normally tested: does it do what we asked it to do?

What it hadn’t had was an adversarial review: anyone deliberately trying to make it do what it shouldn’t. The team’s instinct, a good one, was to get that review done before launch rather than after. They brought us in for a pre-launch red-team.

The challenge

The gap here is subtle, and it’s the reason so many AI features ship with latent risk. Functional testing confirms the system behaves correctly for cooperative users. A red-team assumes the user is not cooperative: that someone will paste strange instructions into the chat box, probe for other people’s data, or try to talk the assistant out of its own rules.

Two things make this higher-stakes for AI than for ordinary software:

  • AI systems take instructions and data through the same channel. A language model can’t always tell the difference between content it’s supposed to process and an instruction hidden inside that content. This is why prompt injection sits at the top of the OWASP Top 10 for LLM Applications and has held that position across editions (OWASP, 2025).
  • The cost of catching it late is brutal. An often-cited figure attributed to the IBM Systems Sciences Institute is that a defect caught in design costs a small fraction of one caught after release: fixing it early is on the order of up to 100× cheaper than fixing it once it’s live (IBM Systems Sciences Institute). For a security flaw in a customer-facing feature, the “cost” isn’t only engineering hours; it’s exposed data, eroded trust, and incident response.
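To make the first point concrete, here is a minimal sketch of why the shared channel matters. Everything in it is an illustrative assumption: the rule text, the function names, and the message layout are hypothetical, not the client’s code or any vendor’s API. The first builder folds untrusted text into the same string as the instructions; the second keeps the two in separate, labelled records so that logging, filtering, and policy checks can at least tell them apart.

```python
from dataclasses import dataclass

# Hypothetical rule text, used only for this illustration.
SYSTEM_RULES = "Answer questions about the signed-in user's own account only."


def build_prompt_naive(untrusted_text: str) -> str:
    # Instructions and untrusted content share one string: nothing marks
    # where the trusted rules end and the untrusted text begins.
    return SYSTEM_RULES + "\n" + untrusted_text


@dataclass(frozen=True)
class Message:
    role: str      # "system" for trusted rules, "user" for untrusted input
    content: str
    trusted: bool  # set by code, never inferred from the text itself


def build_messages(untrusted_text: str) -> list[Message]:
    # Trusted rules and untrusted content travel as separate records, so
    # downstream checks can treat them differently.
    return [
        Message(role="system", content=SYSTEM_RULES, trusted=True),
        Message(role="user", content=untrusted_text, trusted=False),
    ]


if __name__ == "__main__":
    for message in build_messages("Please summarize my last invoice."):
        print(message.role, message.trusted)
```

Separation like this reduces ambiguity for the surrounding code, but it does not by itself stop a model from following instructions hidden in the untrusted record. That is why the rest of the audit looks at what the assistant is able to read and do, not only at what it is told.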

The challenge, in one sentence: prove how this assistant could be abused before real users had the chance to find out for us.

Our approach

We ran the engagement as a structured adversarial audit, organized against the OWASP Top 10 for LLM Applications (2025), one of the most widely referenced frameworks for LLM-specific risk, rather than as ad-hoc poking. Working from a recognized taxonomy is what separates a red-team from a vibe check: it makes coverage measurable and findings reproducible.

The audit moved through four layers:

  1. Map the attack surface. Every place untrusted input enters the system, every data source the assistant can read, and every action it’s authorized to take. You can’t test authority you haven’t mapped.
  2. Probe each OWASP category systematically. Prompt injection and the model’s ability to be steered off its instructions; whether sensitive information could be surfaced to the wrong user; whether guardrails could be bypassed; whether the assistant had more authority (excessive agency) than the task required; and the newer 2025 concerns: system-prompt leakage and weaknesses in the retrieval/RAG layer.
  3. Chain findings. Real risk often isn’t a single flaw; it’s two minor ones combined. We tested combinations, not just isolated cases.
  4. Rate by impact, then hand over a remediation path. Each finding got a severity, a plain-English explanation of the business consequence, and a concrete fix, not just a “this is bad” flag.
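The fourth step describes a record with a fixed shape: a severity, a business consequence, and a fix. A minimal sketch of that record follows, assuming a four-level severity scale and a policy that anything rated high or critical blocks the launch. The class names, the scale, and the policy are our illustrative assumptions, and the three entries are placeholders, not findings from the engagement.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    title: str
    owasp_category: str        # for example "LLM01 Prompt Injection"
    severity: Severity
    business_consequence: str  # one line a non-engineer can repeat
    remediation: str           # the concrete fix, not just a flag


def triage(findings: list[Finding]) -> list[Finding]:
    # Highest severity first. sorted() is stable, so findings with equal
    # severity keep the order in which they were recorded.
    return sorted(findings, key=lambda f: f.severity, reverse=True)


def launch_blockers(findings: list[Finding]) -> list[Finding]:
    # Assumed policy: anything rated HIGH or CRITICAL blocks the launch.
    return [f for f in findings if f.severity >= Severity.HIGH]


# Placeholder entries for illustration only; not findings from the engagement.
EXAMPLE_FINDINGS = [
    Finding("Example A", "LLM06 Excessive Agency", Severity.MEDIUM,
            "The assistant can do more than the task needs.",
            "Remove unused actions from the assistant's tool list."),
    Finding("Example B", "LLM01 Prompt Injection", Severity.CRITICAL,
            "Crafted input can override the assistant's rules.",
            "Enforce the rules in code outside the prompt."),
    Finding("Example C", "LLM02 Sensitive Information Disclosure", Severity.HIGH,
            "One user's data could be shown to another user.",
            "Filter records by owner before they reach the model."),
]

if __name__ == "__main__":
    for finding in triage(EXAMPLE_FINDINGS):
        print(finding.severity.name, finding.owasp_category, finding.title)
```

Making the business consequence and the remediation required fields is the design choice that matters here: a finding cannot be filed as a bare “this is bad” flag.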

(Throughout this piece we describe what the categories are and why they matter, and deliberately not how to execute them. The point of a red-team is to close doors, not publish a map of them.)

The results

Across the engagement, the audit surfaced 23 critical and high-severity vulnerabilities, none of which had been visible through standard functional testing. The three headline categories:

  • Prompt injection (OWASP LLM01). The assistant could be steered by crafted input into ignoring its own instructions: the foundational LLM weakness, and the one with the widest blast radius.
  • Sensitive-information disclosure. Under the right conditions, the assistant could be induced to surface information it should never have exposed to that user, the failure mode with the clearest compliance and trust consequences.
  • Guardrail bypass (jailbreak-class). The safety boundaries the team had put in place could, in specific cases, be talked around, meaning the system would produce outputs it was explicitly designed to refuse.

The full set of 23 spanned several OWASP categories beyond these three, including excessive agency (the assistant being able to do more than the task required) and weaknesses in how retrieved content was handled.
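On the defensive side, the usual answer to sensitive-information disclosure is to scope data in code before it ever reaches the model, instead of asking the model not to reveal it. The sketch below assumes each record carries an owner ID; the `Record` type and `records_for_user` function are hypothetical names for illustration and do not describe the client’s system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    owner_id: str
    text: str


def records_for_user(records: list[Record], user_id: str) -> list[Record]:
    # Filter by ownership in code, before anything is placed in the prompt.
    # The model never sees rows it would have to be told not to reveal.
    return [record for record in records if record.owner_id == user_id]


if __name__ == "__main__":
    store = [
        Record(owner_id="user-1", text="Invoice for user-1"),
        Record(owner_id="user-2", text="Invoice for user-2"),
    ]
    for record in records_for_user(store, "user-1"):
        print(record.text)
```

The same idea applies to a retrieval layer: the ownership filter belongs in the query that fetches candidate documents, not in instructions given to the model afterwards.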

Category breakdown | Findings (representative; confirm against report)
Prompt injection | several [[confirm count]]
Sensitive-information disclosure | several [[confirm count]]
Guardrail / safety bypass | several [[confirm count]]
Excessive agency & output handling | remainder [[confirm count]]
Total critical / high | 23

[[EDITOR: the per-category split above is illustrative of where the 23 fell; replace with exact counts from the engagement report. The total (23) and the three headline categories are confirmed.]]

Before vs after:

Measure | At planned launch (before audit) | After audit + remediation
Known adversarial coverage | Functional testing only | Full OWASP LLM Top 10 sweep
Critical/high vulnerabilities | 23 latent, undetected | 23 identified and remediated
Reached production | Would have | 0
Cost profile of fixes | Post-incident (highest) | Pre-launch (lowest)

What made it work

Three things turned this from a checklist exercise into a launch the team could stand behind:

  • A framework, not improvisation. Anchoring to the OWASP LLM Top 10 made the coverage defensible. The team could see exactly what was tested and what wasn’t, which matters when a board or a customer asks “how do you know it’s safe?”
  • Findings written for two audiences. Each issue had an engineer-level fix and a one-line business consequence. Security findings that only engineers understand don’t get prioritized; findings a CTO can explain to a CEO do.
  • Timing. Every one of these was cheaper, faster, and quieter to fix before launch. There was no incident, no disclosure, no customer email, because there were no customers yet. That is the entire argument for a pre-launch audit.
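The first point, being able to see what was tested and what wasn’t, comes down to a simple set difference against the framework’s category list. The sketch below uses the category IDs and names from the OWASP Top 10 for LLM Applications (2025); the `coverage_gaps` function and the idea of tracking tested categories as a set of IDs are our illustrative assumptions, not a description of the engagement tooling.

```python
# Category IDs and names from the OWASP Top 10 for LLM Applications (2025).
OWASP_LLM_TOP_10_2025 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM03": "Supply Chain",
    "LLM04": "Data and Model Poisoning",
    "LLM05": "Improper Output Handling",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM08": "Vector and Embedding Weaknesses",
    "LLM09": "Misinformation",
    "LLM10": "Unbounded Consumption",
}


def coverage_gaps(tested_ids: set[str]) -> list[str]:
    # Reject IDs that are not in the framework, so a typo cannot be
    # mistaken for coverage.
    unknown = tested_ids - set(OWASP_LLM_TOP_10_2025)
    if unknown:
        raise ValueError(f"Not in the 2025 list: {sorted(unknown)}")
    # Report untested categories in the framework's own order.
    return [
        f"{category_id} {name}"
        for category_id, name in OWASP_LLM_TOP_10_2025.items()
        if category_id not in tested_ids
    ]


if __name__ == "__main__":
    # Hypothetical example: only two categories tested so far.
    for gap in coverage_gaps({"LLM01", "LLM02"}):
        print("untested:", gap)
```

An empty result is the answer to “how do you know it’s safe?” that a team can actually show: every category in the list has at least been examined.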

Could this work for you?

If you’re shipping anything customer-facing that takes natural-language input, reads user data, or can take actions, the pattern in this story applies to you, regardless of how well-built the system is. The 23 findings here weren’t the result of a careless team. They were the result of a capable team being tested adversarially for the first time. That’s the norm, not the exception.

A useful gut check before any AI launch:

  • Has anyone deliberately tried to break this, not just confirm it works?
  • Do we know every data source it can read and every action it can take?
  • If a user pasted hostile instructions into it right now, what’s the worst case, and have we tested it?
  • Are our guardrails enforced in auditable code, or do they live only in the prompt? (OWASP’s 2025 guidance is blunt that a system prompt is not a security control.)

If any of those is a shrug, the issues are already there. They just haven’t been found yet, and the only question is whether you find them, or a user does.
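The last gut-check question, whether guardrails are enforced in auditable code or only in the prompt, is the easiest one to illustrate. Below is a minimal sketch of a deny-by-default check that runs outside the model, before any action the assistant requests is executed. The action names, the `ActionRequest` type, and the ownership rule are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical allow-list: the only actions the assistant may trigger.
ALLOWED_ACTIONS = frozenset({"view_invoice", "update_email_preferences"})


@dataclass(frozen=True)
class ActionRequest:
    requesting_user_id: str
    action: str
    resource_owner_id: str


def authorize(request: ActionRequest) -> bool:
    # Deny by default: the action must be on the allow-list...
    if request.action not in ALLOWED_ACTIONS:
        return False
    # ...and the resource must belong to the user making the request.
    return request.requesting_user_id == request.resource_owner_id


if __name__ == "__main__":
    request = ActionRequest("user-1", "view_invoice", "user-1")
    print(authorize(request))
```

Because the check is ordinary code, it can be reviewed, tested, and logged, and it gives the same answer no matter what the model was persuaded to ask for. It also keeps the assistant’s authority limited to what the task requires.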

Conclusion

A pre-launch AI audit isn’t a tax on shipping. It’s the difference between finding 23 critical and high-severity vulnerabilities in a quiet engagement and finding them in a production incident report. The technology is new; the principle is old. Catch it early, pay a fraction of the cost, and launch something you can defend. Catch it late, and you pay the maximum price in the most public way.

Twenty-three issues. Zero in production. That’s what “before” buys you.


FAQs

What is a pre-launch AI audit?

A pre-launch AI audit is an adversarial security review of an AI system before it ships. Rather than confirming the system works for cooperative users, it deliberately tries to make the system misbehave, surfacing issues like prompt injection, data exposure, and guardrail bypass while they’re still cheap and quiet to fix.

How is red-teaming different from QA?

QA confirms the system does what it’s supposed to for normal use. A red-team assumes a hostile user and tries to make it do what it shouldn’t. The two are complementary: QA validates intended behavior; red-teaming finds the unintended behavior that functional tests never trigger.

What is prompt injection?

Prompt injection is when crafted input causes a language model to follow instructions it shouldn’t, because the model can’t reliably separate instructions from content. It sits at the top of the OWASP Top 10 for LLM Applications (2025) because it underlies many of the most serious AI failures, from data leakage to unauthorized actions.

How long does a pre-launch AI audit take?

It depends on the system’s complexity: how many data sources it touches and how much it’s authorized to do. A focused assistant is a shorter engagement than an agent with broad tool access. The scoping conversation establishes the timeline; the goal is always to finish before launch, not after.

Is a pre-launch AI audit only for large companies?

No. Any team shipping a customer-facing AI feature carries this risk, and smaller teams often have more exposure because security review hasn’t been formalized yet. The depth scales to the system; the need doesn’t depend on company size.

What does the audit report contain?

We document what each vulnerability is, its business impact, and exactly how to fix it, not a how-to for attacking it. The deliverable is a remediation path, not an attack manual. The goal is to close the doors, not catalog how to open them.
