Kingbird Solutions

Vibe coding security

What April's AI Coding Breaches Mean for Your Launch

In one week, three AI coding platforms leaked source code and credentials. Here is the pattern, and the check every founder should run before launch.

By Chris King · June 2, 2026 · 7 min read

For two years the security risk of AI-built software was mostly a forecast. Researchers ran studies, auditors told war stories, and founders had every reason to assume their own app was the exception. Forecasts are easy to ignore. Named incidents are harder.

In one week this April, three separate AI coding platforms had public security failures. They were not small, and they were not obscure tools. The pattern they form is the thing founders should pay attention to, because it explains why the app you shipped last quarter may be carrying a risk you never saw. We have walked through the underlying research before. This is what it looks like when the research turns into headlines.

Three failures in one week

The largest involved Lovable, a vibe coding platform valued at 6.6 billion dollars with roughly eight million users. Security researchers disclosed that for 48 days, between early February and April 20, an account on the platform could reach source code, database credentials, AI chat history, and customer data belonging to other projects. The flaw was reported through the company's responsible disclosure program in late February and, by the company's own account of the timeline, the early reports were closed without reaching the internal security team. Lovable has pushed back on the framing of a single mass leak and points to how public projects are meant to behave, but the core fact is not in dispute: credentials and code that should have been private were reachable, and the window stayed open for weeks.

The second involved Vercel, a hosting platform many AI-built apps deploy to. Attackers reached internal systems through a third-party AI evaluation tool wired into Vercel's stack. The breach did not come through the front door. It came through a connected service, which is a common way for modern systems to fail.

The third hit Bitwarden's command-line tool through a supply chain attack. The malware in that case was specific in a way worth noticing: it hunted for credentials to Claude, Cursor, and Codex CLI. The attackers were not after generic secrets. They were after the keys to AI coding tools, because those keys now unlock real infrastructure.

Different companies, different root causes, one week. That is the part that matters.

Why AI keeps failing the security basics

You could read three incidents as three unlucky companies. The numbers argue against that. Veracode tested more than a hundred large language models on security-sensitive coding tasks and found that about 45 percent of the code they produced introduced an OWASP Top 10 vulnerability. In that testing, 86 percent of samples failed to defend against cross-site scripting and 88 percent were open to log injection. These are not exotic or far-future problems. They are the basics, and AI gets them wrong nearly half the time.
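To make those two failures concrete, here is a minimal Python sketch of the defenses that tend to be missing. It is an illustration, not code from the Veracode testing. The function names render_greeting, sanitize_for_log, and record_login are hypothetical, and a real app would normally rely on its framework's templating and logging configuration rather than hand-rolled helpers.

```python
import html
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("app")


def render_greeting(user_name: str) -> str:
    """Cross-site scripting defense: escape user input before it reaches HTML."""
    # html.escape turns <, >, & and quotes into entities, so a submitted
    # <script> tag is displayed as text instead of being run by the browser.
    return f"<p>Hello, {html.escape(user_name)}</p>"


def sanitize_for_log(value: str) -> str:
    """Log injection defense: neutralize line breaks in user input."""
    # Without this, a value containing a newline can forge what looks like
    # a separate, legitimate log entry.
    return value.replace("\r", "\\r").replace("\n", "\\n")


def record_login(user_name: str) -> None:
    # Hypothetical call site: the user-supplied name is sanitized before logging.
    log.info("login attempt for user=%s", sanitize_for_log(user_name))


if __name__ == "__main__":
    print(render_greeting("<script>alert(1)</script>"))
    record_login("bob\nINFO admin logged in")
```

Neither fix is long or clever. The issue in the studies is that generated code often leaves out steps like these unless someone asks for them or reviews for them.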

A newer failure mode shows how the risk compounds. Roughly one in five AI-generated code samples references a software package that does not exist. The model invents a plausible name. Attackers watch for the common inventions, register those names in public package repositories, and wait. When a developer or an automated tool installs the dependency, the install runs attacker code. Researchers named this slopsquatting, and it has moved from theory to confirmed practice: one malicious package that models tend to suggest in place of a legitimate one was still live earlier this year and pulling hundreds of installs a week.
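One simple guard against this is to compare every declared dependency with a list a human has actually reviewed, before anything gets installed. The sketch below assumes a requirements.txt-style file and a hand-maintained approved set. Both are assumptions for illustration, and the package name fastapi-auth-helperz is invented. The check deliberately does not ask a public registry whether the name exists, because in a slopsquatting attack the fake name does exist: the attacker registered it.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    """Normalize a package name the way Python packaging does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> Optional[str]:
    """Extract the package name from one requirements-style line."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line or line.startswith("-"):  # skip blanks and options like "-r"
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return normalize(match.group(0)) if match else None


def unreviewed_packages(requirements_text: str, approved: set[str]) -> list[str]:
    """Return declared packages that nobody has reviewed yet."""
    approved_normalized = {normalize(name) for name in approved}
    flagged: list[str] = []
    for line in requirements_text.splitlines():
        name = requirement_name(line)
        if name and name not in approved_normalized and name not in flagged:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical inputs: the approved set and file contents are made up.
    approved = {"requests", "flask-login"}
    requirements = "requests==2.31.0\nFlask_Login>=0.6\nfastapi-auth-helperz\n"
    for name in unreviewed_packages(requirements, approved):
        print(f"needs review before install: {name}")
```

Run as a pre-install step, a check like this puts back the pause that slopsquatting depends on people skipping: a new name has to be looked at by a person once before it can reach the build.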

Independent tracking backs up the trend. A university research project that attributes published vulnerabilities to AI coding tools counted six such CVEs in January, fifteen in February, and thirty-five in March, and estimated the real number across open source is five to ten times higher. The line is climbing.

The reason is the same one we break down in the four categories of vibe-coding security gaps. AI learned to write code that solves a problem from a training set full of tutorials and example projects. Those sources optimize for clarity, not safety. The model picked up the happy path and skipped the input validation, the authorization check, and the careful handling of secrets, because production-grade code mostly lives behind closed doors where the model never saw it.
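Here is a rough Python sketch of the three steps that paragraph says get skipped: input validation, an authorization check, and keeping secrets out of the source. Everything in it is hypothetical, including the Project record, the Forbidden error, and the DATABASE_URL variable name. It shows the general shape of the checks, not how any particular platform implements them.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class Project:
    id: int
    owner_id: int
    name: str


class Forbidden(Exception):
    """Raised when the caller may not see the requested project."""


def load_database_url() -> str:
    """Secrets handling: read the credential from the environment, never from code."""
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return url


def parse_project_id(raw: str) -> int:
    """Input validation: accept only a short positive integer."""
    if not re.fullmatch(r"[1-9][0-9]{0,8}", raw):
        raise ValueError("project id must be a positive integer")
    return int(raw)


def get_project(projects: dict[int, Project], raw_id: str, current_user_id: int) -> Project:
    """Authorization check: a valid id is not enough, the caller must own the project."""
    project = projects.get(parse_project_id(raw_id))
    # Same error for "missing" and "not yours", so ids cannot be probed.
    if project is None or project.owner_id != current_user_id:
        raise Forbidden("project not found for this user")
    return project


if __name__ == "__main__":
    projects = {1: Project(1, owner_id=10, name="mine"), 2: Project(2, owner_id=20, name="theirs")}
    print(get_project(projects, "1", current_user_id=10).name)
    try:
        get_project(projects, "2", current_user_id=10)
    except Forbidden as err:
        print(f"blocked: {err}")
```

The happy-path version of get_project is one line that looks the project up by id and returns it. It works in every demo, which is why it ends up in tutorials and, from there, in generated code.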

The agents widen the gap

Through 2025 the standard setup was a developer using AI as a fast assistant and reviewing the output. That buffer is thinning. Autonomous coding agents now plan, write, run, and revise across long sessions with far less human oversight at each step. The market is voting with its wallet. One agent-first company grew revenue from 37 million dollars in May 2025 to 492 million a year later, and its customers include banks, carmakers, and government units.

For founders, the speed is the appeal and the exposure at once. Slopsquatting is a clean example. The old version needed a human to read the AI's suggestion, notice an unfamiliar package, and choose to install it. An autonomous agent resolving its own dependencies removes that pause. The same logic runs across the board: every review step an agent automates is a step where a security problem can pass through untouched. More code, shipped faster, with fewer eyes on it, is a great way to build and a great way to ship a vulnerability nobody catches until a stranger does.

What this means for your launch

Two ideas sit close together here and get mixed up, so it is worth separating them.

The first: building with AI is still a good way to build. Nothing above says the tools cannot ship a working product. They can, and most of the apps in the studies are real applications with real users. If you used AI to get your idea live, that was a reasonable call.

The second: security is a distinct step the tools do not perform and most founders never think to ask for. The vulnerability rates above describe raw AI output before any review. They also describe most AI-built apps in production, because most of them never got reviewed. The distance between "the app works" and "the app is safe for real users" is real and wide. The good news is that it is usually cheap to close. A few focused hours from someone who knows where these tools fail will catch the large majority of common exposures, which is the core of a software rescue or hardening sprint. The reason founders skip the review is almost never cost. It is that nobody told them the gap was there.

The three April incidents are useful because they are concrete. A platform with eight million users left credentials reachable for 48 days. A breach traveled in through a connected tool. Malware went looking specifically for AI coding keys. When billion-dollar companies with security teams get caught by this, a solo-founder app built on the same tools deserves a look.

Common questions

Does using AI to build my app mean it is insecure?
Not on its own. It means the raw output carries a high baseline rate of common vulnerabilities, and that almost no app ships secure without a deliberate review step after the build.

I am not technical. How would I even know if my app has a problem?
You would not from the outside, which is the trap. The fastest starting point is a structured review against the patterns that fail most often, rather than guessing.

What is slopsquatting in plain terms?
AI sometimes invents a software package name that does not exist. Attackers register that fake name and load it with malicious code, so when the package gets installed, their code runs inside your app.

My app already launched. Is it too late?
No. Most of the apps we review are already in production. A hardening pass closes the gaps and hands you back a clean codebase without a rebuild.

The ten-minute starting point

If you shipped something built with AI tools in the last year and have not had it reviewed, the honest question is not whether it has an exposure. It is which kind. The free five-point diagnostic walks through the most common exposure patterns we see in production AI-built apps and tells you which ones apply to yours. It takes about ten minutes. If it surfaces something, the next step is either a written audit or a hardening sprint where we close the gaps and hand back a clean codebase.

The forecast era is over. The named-incident era started this spring. The only question that matters now is whether you have looked at your own app before someone else does.

If this helped

You can put this thinking to work directly. Run the diagnostic on a stuck product, or book a 30-minute call to talk through your situation.