Vibe coding security
AI-generated code has 2.74x as many security flaws as human-written code. Here is the research, and what it means for your launch.
Peer-reviewed studies from 2025 and 2026 put the exploitable vulnerability rate in AI code between 40 and 62 percent. In our Q1 2026 assessment of more than 200 vibe-coded apps, 91.5 percent had at least one hallucination flaw. The data is now hard to ignore.
For most of 2024 and the early part of 2025, the security risk of AI-generated code was an argument. Researchers had hunches. Auditors had stories. Founders had reasons to believe their own apps were the exception. The data was thin enough that reasonable people could disagree about how serious the problem was.
That window is closed now. The 2025 and 2026 research consistently shows a much higher security flaw rate in AI-generated code than in human-written code. Three numbers in particular have stabilized across multiple studies. They are not edge cases. They are the default output.
2.74 times as many security flaws as human-written code
The 2.74x figure comes from a 2026 meta-analysis combining results from several peer-reviewed studies. The studies looked at functionally equivalent code written by AI tools and by experienced human developers, then ran each through static analysis and manual review. The AI code, on average, had 2.74 times as many flaws that a security reviewer would mark as exploitable.
The multiplier is not constant across categories. It is higher in security-sensitive code, like authentication handlers and database queries, and lower in business logic that does not touch user data. The takeaway: the more important the code is to the safety of your users, the worse the AI is at writing it without supervision.
Why the gap? The AI is trained on the open code corpus, which is dominated by tutorials, example projects, and Stack Overflow answers. Those sources optimize for clarity and brevity, not security. They show you the happy path. They rarely show you the input validation, the rate limit, the authorization check, the careful sanitization. The AI learned what code that solves a problem looks like. It did not learn what production-grade code looks like, because production-grade code is mostly behind closed doors.
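To make the tutorial-versus-production gap concrete, here is a minimal sketch in Python. The function names and the dict-as-database are hypothetical, invented for illustration; the point is the checks the second version adds that the first one, typical of tutorial code, leaves out.

```python
import re

# In-memory stand-in for a user table (illustrative only).
db = {}

# Happy-path version, the shape tutorials teach: trusts every input.
def update_email_tutorial(db, user_id, new_email):
    db[user_id] = new_email

# Production-minded version: the same operation with the checks
# tutorial code rarely shows.
def update_email(db, session_user_id, target_user_id, new_email):
    # Authorization check: a user may only change their own email.
    if session_user_id != target_user_id:
        raise PermissionError("cannot modify another user's account")
    # Input validation: reject obviously malformed addresses.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", new_email):
        raise ValueError("invalid email address")
    db[target_user_id] = new_email
```

Both versions "work" on the happy path, which is why the first one survives review by anyone who only tests the happy path.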
40 to 62 percent of AI code has exploitable vulnerabilities
Different studies report different rates depending on the language, the prompt style, and the type of application. The lower bound across the studies sits around 40 percent. The upper bound reaches 62 percent. The midpoint is about 50 percent.
Half. Half of the code the AI hands you, on average, has a vulnerability a security reviewer would call exploitable. Not "could theoretically be exploited under unusual conditions." Exploitable in the sense that a script kiddie with a few hours could probably use it to take user data.
The most common vulnerability categories are the ones founders rarely think about. SQL injection, even though every framework has parameterized queries built in. Missing authorization checks, where the code verifies who is logged in but not what they are allowed to see. Hardcoded secrets in the source. Race conditions in payment flows. Insufficient input validation. None of these are exotic. All of them are decades old. The AI knows the safe pattern. The AI also knows the unsafe pattern. Which one shows up in the output is essentially a coin flip.
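The SQL injection case is the easiest to show end to end. This is a self-contained sketch using Python's built-in sqlite3 module; the table and function names are made up for the example, but the two patterns are exactly the safe and unsafe versions described above, and the unsafe one is the coin flip that keeps landing in AI output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

# Unsafe pattern: user input spliced directly into the query string.
def find_user_unsafe(name):
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{name}'"
    ).fetchall()

# Safe pattern: a placeholder plus a bound parameter. The driver
# handles escaping; the input can never change the query's structure.
def find_user(name):
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
# The classic payload dumps every row from the unsafe version...
assert find_user_unsafe(payload) == [(1,), (2,)]
# ...while the parameterized version treats it as a literal name.
assert find_user(payload) == []
```

Every mainstream framework ships the safe pattern by default, which is what makes this category so frustrating to keep finding in 2026.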
91.5 percent of vibe-coded apps have at least one hallucination flaw
The 91.5 percent is from Kingbird's own Q1 2026 audit work. The sample was over 200 vibe-coded apps built with a mix of AI tools, brought to us by founders for either pre-launch review or post-launch incident response. The number is the percentage of those apps that had at least one hallucination flaw.
A "hallucination flaw" in security is different from a hallucination in chat. It is the AI generating code that references a function, a permission, or a security model that does not exist or does not behave the way the AI assumed. Examples we see weekly: code that calls a "verifyUser" function the AI invented; code that assumes a framework auto-checks ownership when it does not; code that uses a deprecated security library because the AI's training data is six months stale.
These flaws are particularly insidious because the code looks plausible. A human reviewer reading it for the first time has to verify each non-trivial call, because some of them refer to things that are not real or that exist but do not do what the code expects them to do.
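A stripped-down sketch of the second pattern from the list above, the assumed ownership check. Everything here is hypothetical (the DOCUMENTS store, the function names); the structure is the flaw as we find it in audits: the AI writes a fetch-by-id handler on the assumption that the framework verifies ownership, when no such check exists anywhere.

```python
# In-memory stand-in for a documents table (illustrative only).
DOCUMENTS = {42: {"owner_id": 1, "body": "alice's private notes"}}

# What the AI writes: fetch by id, trusting an ownership check
# that it assumed the framework performs. It does not.
def get_document_hallucinated(session_user_id, doc_id):
    return DOCUMENTS[doc_id]["body"]  # any logged-in user reads any doc

# What the code needed: verify ownership explicitly before returning.
def get_document(session_user_id, doc_id):
    doc = DOCUMENTS[doc_id]
    if doc["owner_id"] != session_user_id:
        raise PermissionError("not your document")
    return doc["body"]
```

Note that the flawed version is one line shorter and reads perfectly naturally, which is the whole problem: nothing in it looks wrong until you ask where the ownership check actually happens.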
91.5 percent had at least one. About 60 percent had three or more. Roughly 15 percent had what we would call a critical hallucination flaw, where the security model the AI assumed was different from the security model the platform actually enforces, and the gap was wide enough to expose user data.
What this means for your launch
There are two parts to the takeaway, and they are easy to mix up.
The first part: vibe coding is still a perfectly good way to build software. The flaw rates above do not mean the AI cannot ship a working application. It can. Most of the apps in the studies and our audit set are functional applications with real users.
The second part: security is a separate step that the AI does not take and the founder does not know to take. The flaw rates are the flaw rates of the raw AI output, before any review, hardening, or test. They are also the flaw rates of most vibe-coded apps in production, because most of them never had a review.
The gap between "the AI built me a working app" and "the app is safe for real users" is real, large, and trivially addressable with a few hours of review by someone who knows where to look. The reason most founders do not close the gap is not that the work is hard. It is that they do not know the gap is there.
The diagnostic question
If you have shipped an app built with AI tools in the last twelve months and have not had a security review, the relevant question is which of the three numbers above applies to your app. The 2.74x figure is statistical, so it applies in expectation. The 40 to 62 percent range applies to the code itself. The 91.5 percent is specifically about hallucination flaws in production vibe-coded apps.
The free 5-point diagnostic at Kingbird Solutions runs through the five most common exposure patterns we see in production apps and tells you which of them apply to yours. It takes about ten minutes. If we find something, the next step is either a written audit or a hardening sprint where we close the gaps and hand you a clean codebase.
The data is no longer fuzzy. The question is whether you have looked at your own app.
If this helped
You can put this thinking to work directly. Run the diagnostic on a stuck product, or book a 30-minute call to talk through your situation.