Andrej Karpathy posted the tweet that named the era in February 2025: "There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible to build an app with no technical background."
The response was ecstatic. Two million views. Thousands of founders celebrating the democratization of software. Non-technical builders discovering they could ship products they'd imagined for years but couldn't execute.
Karpathy added a caveat that most people skipped: he wasn't recommending vibe coding for production systems. He was describing a new mode of human-computer interaction, with implicit limits. The internet kept the first part and discarded the second.
Thirteen months later, litellm — one of the most widely used AI infrastructure libraries in production — disclosed a supply chain attack that exploited exactly the security posture that vibe coding normalizes. The attack vector was not novel. The scale of exposure was.
What Happened at litellm
In March 2026, security researchers disclosed that a malicious package on PyPI had been impersonating litellm — the Python library used by thousands of production AI applications to route requests between OpenAI, Anthropic, Google, and other model providers.
The attack used typosquatting (a subtly misspelled package name) and dependency confusion (exploiting how Python resolves package dependencies in certain configurations) to get malicious code installed alongside legitimate litellm instances. The malicious code exfiltrated API keys — OpenAI API keys, Anthropic API keys, internal authentication tokens — from the host environment.
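Typosquats of this sort can be caught mechanically. A minimal sketch, assuming a project keeps an allowlist of trusted package names (the allowlist, threshold, and example names below are illustrative, not from the actual incident):

```python
from difflib import SequenceMatcher

# Illustrative allowlist of packages this project intends to depend on.
TRUSTED = {"litellm", "requests", "openai"}

def suspicious(name: str, threshold: float = 0.85) -> list[str]:
    """Return trusted names that `name` closely resembles but does not match."""
    return [
        t for t in TRUSTED
        if name != t and SequenceMatcher(None, name, t).ratio() >= threshold
    ]

print(suspicious("litelllm"))  # near-miss of "litellm" -> flagged
print(suspicious("litellm"))   # exact trusted match -> nothing flagged
```

Real tools apply far richer heuristics (download-count anomalies, maintainer history, publish timing), but even this edit-distance check would have separated a one-character impersonation from the genuine package.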
The exposure window was approximately six weeks before discovery. Affected organizations ranged from startups to mid-size enterprises. The attack was not sophisticated. It was standard supply chain exploitation applied to a high-value target in an environment where security review is systematically absent.
Why was security review absent? Because the organizations most exposed to litellm supply chain risk are the ones that adopted AI infrastructure through vibe coding — companies where an engineer (or a non-engineer) ran `pip install litellm` and dropped it into a requirements file because a Claude-generated tutorial said to, without running a dependency audit, without pinning versions, and without reviewing what they were actually adding to their production environment.
The Vibe Coding Security Model (Such As It Is)
Traditional software development has a security model, however imperfect. Engineers are trained — through formal education, code review culture, security tooling, and painful experience — to think about attack surface when writing code. Not perfectly, not always, but systematically enough that common vulnerability classes are caught before production.
The model breaks down at several points: time pressure, skill gaps, inadequate tooling, insufficient review. But the underlying expectation — that the engineer writing the code thinks about what an attacker would do with it — is at least present.
Vibe coding has a fundamentally different model. The developer's interaction with the code is mediated through natural language. "Build me a user authentication system" becomes a Claude prompt, and the returned code is tested against "does this work?" rather than "is this secure?" When the login works, the feature ships.
The AI systems generating this code are not reliably security-aware. Studies from Stanford and UC Berkeley in 2025 found that GitHub Copilot generates code with security vulnerabilities at rates comparable to average human developers — which is itself a damning benchmark. More specifically, AI code generators routinely produce:
- SQL injection vulnerabilities when constructing database queries without parameterization
- Insecure direct object reference patterns when generating API endpoints
- Hardcoded credentials in configuration examples that get copied to production
- Weak session management implementations in authentication flows
- Missing authorization checks in CRUD operations
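The first pattern in the list above is easy to demonstrate. A minimal sqlite3 sketch (the table and attacker input are illustrative) shows why the concatenated query AI generators often emit is exploitable while the parameterized form is not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

# Classic injection payload supplied as "user input".
user_input = "' OR '1'='1"

# Vulnerable pattern AI generators routinely produce: string concatenation.
# The payload rewrites the WHERE clause and matches every row.
query = "SELECT * FROM users WHERE name = '" + user_input + "'"
print(conn.execute(query).fetchall())

# Parameterized version: the input is bound as data, never parsed as SQL.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,))
print(safe.fetchall())  # no rows match the literal string
```

The fix is one character of API discipline, which is exactly why it is so often skipped when the only acceptance test is "does the login work?"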
None of this is surprising. AI systems generate code based on patterns in training data. Training data contains a lot of insecure code — Stack Overflow examples, tutorial repos, legacy codebases. The output reflects the distribution.
The problem is not that AI generates insecure code at a higher rate than human developers. The problem is that AI-assisted development removes the review and testing steps that catch those vulnerabilities before production.
The Scale of the Exposure
GitHub's 2025 State of the Octoverse report estimated that AI-assisted coding contributed to 40% of new commits on the platform. That figure has almost certainly grown. If we assume 40% of production code deployed in 2025-2026 was written with meaningful AI assistance, and if AI-generated code has vulnerability rates comparable to unreviewed human code, the implication is significant: we have added years of security debt to the global codebase in a compressed timeframe.
Security debt is the accumulation of vulnerabilities that exist in production systems but haven't been exploited yet. Like financial debt, it compounds. A SQL injection vulnerability introduced in 2024 may not be exploited until 2027 — but the exposure accrues from the moment it ships.
The litellm attack is one data point. It won't be the last. The conditions that made it possible — AI-generated code in production without security review, dependency management handled by AI prompts rather than security tooling, API credentials exposed in environments without proper secret management — are not limited to that codebase. They describe the production environment of thousands of AI-native startups.
Who Is Liable When AI Code Fails?
This is where the legal and financial analysis gets genuinely unsettled.
Software liability in the United States has historically been limited by end-user license agreements that disclaim warranties and limit damages to the purchase price of the software. These disclaimers have been largely upheld because software development is inherently complex and difficult to make defect-free.
The vibe coding context complicates this framework in several ways:
Developer liability. If a developer deploys code they did not write, do not understand, and cannot audit — but presents to customers as production software — what is the standard of care? Courts have not extensively addressed AI-generated code failures, but the common law negligence framework applies: did the developer exercise the care that a reasonable person with their skills and knowledge would exercise?
A non-technical founder who ships AI-generated authentication code without security review is arguably not exercising reasonable care when they know (or should know) that AI-generated code requires review for security vulnerabilities. The "I didn't know" defense is increasingly unavailable as the security community's warnings about AI code quality become widely documented.
AI tool provider liability. Can a developer sue OpenAI or Anthropic if their model generates insecure code that leads to a breach? The current terms of service for every major AI provider explicitly disclaim liability for outputs used in production. Courts will eventually test whether these disclaimers are enforceable, but for now, the legal framework leaves the developer holding the bag.
Customer liability. A startup that ships a vibe-coded product and experiences a breach exposing customer data has liability exposure to those customers under state breach notification laws, CCPA (in California), and a patchwork of sector-specific regulations. The fact that the breach resulted from AI-generated code is not a defense.
The Insurance Gap
Cybersecurity insurance is the risk transfer mechanism that's supposed to bridge the gap between security incidents and existential financial exposure. It is not working the way it needs to for AI-generated code environments.
Most cyber insurance policies underwrite risk based on an assessment of security controls: Does the organization have MFA? Endpoint detection? Patch management? Incident response procedures? These questionnaires have been refined over a decade to assess the security posture of organizations running conventional software development practices.
None of them adequately assess the security posture of a vibe-coded codebase. There's no checkbox for "Did you have a security engineer review your AI-generated authentication code?" There's no question about the source of the code in production. Insurers are underwriting AI-heavy organizations using questionnaires designed for organizations with traditional development practices.
The mismatch is systemic. When claims start rolling in from AI-generated code failures — and they will — insurers will respond by adding exclusions, raising premiums, and tightening underwriting requirements for organizations where AI generates a significant portion of production code. Organizations that thought they were covered will discover they have gaps.
This happened with ransomware. In 2019, cyber insurance policies broadly covered ransomware losses. By 2022, policies had extensive ransomware exclusions, sublimits, and specific security control requirements before coverage was available. The industry responded to concentrated losses by repricing and restricting coverage.
The AI code liability event is coming. Insurers are not ready. Neither are their policyholders.
The Compliance Framework Doesn't Know Vibe Coding Exists
SOC 2 Type II certification is the enterprise SaaS standard for demonstrating security controls to customers. The common criteria include change management controls — procedures for testing, reviewing, and approving code before it goes to production.
Most SOC 2 frameworks require that code changes be reviewed by someone other than the author before deployment. In a vibe coding environment, the "author" of the code is an AI system. Does a human reading AI-generated code and verifying it "works" constitute an adequate review?
The answer from most SOC 2 auditors is currently yes — if a human reviewed and approved the change, the change management control is satisfied. This is technically compliant with the control as written and substantively hollow as a security measure: a reviewer who did not write the code, does not fully understand it, and evaluates it against functional rather than security criteria is not providing meaningful security review.
SOC 2 will need to evolve. So will ISO 27001, FedRAMP, and PCI DSS. Every compliance framework that includes code review controls needs to address what "review" means for AI-generated code, and what security requirements apply to organizations where AI generates most of the production codebase.
None of them have done this yet. The vibe coding revolution is approximately 14 months old. Compliance frameworks operate on 3-5 year revision cycles. There is a gap.
What Security-Aware Organizations Are Actually Doing
The organizations doing this well are not avoiding AI code generation. They're adding the security layer that vibe coding removes.
Static analysis on everything, automated. Tools like Semgrep, Snyk, and Checkmarx can scan AI-generated code for common vulnerability patterns before it reaches production. Running these tools in CI/CD pipelines — not as optional checks but as required gates — catches a significant portion of the vulnerability surface.
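Wiring a scanner in as a required gate is a few lines of CI configuration. A hedged sketch as a GitHub Actions job — the tool names are real, but the exact flags and versions may differ in your setup:

```yaml
# Illustrative CI job: static analysis as a required gate, not an optional check.
security-scan:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Static analysis (non-zero exit on findings fails the build)
      run: |
        pip install semgrep
        semgrep scan --config auto --error
```

The important property is not which scanner you pick but that a finding blocks the merge, so AI-generated code cannot reach production without passing the same bar as everything else.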
Dependency pinning and auditing. The litellm attack was enabled by unpinned dependencies. Organizations that pin their dependencies to specific hashes and run automated dependency audits (`pip-audit`, `npm audit`) before deployment are structurally protected against most supply chain attacks. This is basic hygiene that many vibe-coded projects skip.
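In the Python ecosystem this looks like hash-pinned requirements plus an audit step. A sketch — the version number is illustrative and the hash is a placeholder, not a real digest:

```
# requirements.txt — hash pinning: pip rejects any artifact whose digest
# does not match, which blocks a substituted or typosquatted package.
litellm==1.40.0 \
    --hash=sha256:<expected-artifact-digest>
```

```shell
# Install only if every artifact matches its pinned hash.
pip install --require-hashes -r requirements.txt

# Check pinned dependencies against known-vulnerability databases.
pip-audit -r requirements.txt
```

With `--require-hashes`, a malicious package that merely shares a name (or a near-miss of one) fails at install time instead of running in production.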
Secret scanning. AI-generated code commonly includes placeholder credentials that get replaced by real credentials and accidentally committed. GitHub's secret scanning, GitGuardian, and similar tools catch credential exposure in repositories before it becomes an incident. Running these on every commit costs almost nothing and prevents a significant class of exposure.
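The core of a secret scanner is pattern matching over text before it is committed. A minimal sketch — the patterns below are illustrative and far from exhaustive; production tools ship hundreds of tuned detectors:

```python
import re

# Illustrative detectors for common credential shapes.
PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_secret": re.compile(r"(?i)(api_key|secret)\s*=\s*['\"][^'\"]{8,}['\"]"),
}

def scan(text: str) -> list[str]:
    """Return the names of secret patterns found in `text`."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

# A hardcoded key of the kind AI-generated examples often contain.
print(scan('OPENAI_API_KEY = "sk-abcdefghijklmnopqrstuvwx"'))
```

Hooked into a pre-commit check, even a crude scanner like this turns "credential accidentally committed" from an incident into a rejected commit.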
Security-aware prompting. Simply adding security requirements to AI prompts — "generate this authentication system, explicitly using parameterized queries, with input validation, and without hardcoded credentials" — materially improves output quality. AI systems can generate secure code; they require explicit instruction to do so consistently.
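One way to make that instruction consistent rather than ad hoc is to prepend a standing preamble to every code-generation prompt. A sketch — the requirement list and function are illustrative, not any provider's API:

```python
# Standing security requirements prepended to every code-generation prompt.
SECURITY_PREAMBLE = (
    "Requirements for all generated code:\n"
    "- Use parameterized queries for every database access.\n"
    "- Validate and bound all external input.\n"
    "- Never hardcode credentials; read secrets from the environment.\n"
    "- Include authorization checks on every state-changing operation.\n"
)

def secure_prompt(task: str) -> str:
    """Wrap a task description with the standing security requirements."""
    return f"{SECURITY_PREAMBLE}\nTask: {task}"

print(secure_prompt("Build a login endpoint for a Flask app"))
```

This does not replace review — the model can still ignore or fumble a requirement — but it shifts the default output toward the secure pattern instead of the most common one in the training data.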
Architectural separation. Security-sensitive components — authentication, authorization, payment processing, data encryption — should be built with human security expertise, not vibe coding. Using well-audited libraries (Passport.js, Auth0, Stripe) for these components and reserving AI assistance for lower-risk business logic is a principled approach to risk management.
Karpathy's vibe coding observation was descriptive, not prescriptive. The builders who understood that built remarkable things with AI assistance while maintaining the security review layer that professional software development requires. The ones who missed it shipped faster and will pay for it later.
The security debt is real. The litellm attack was a preview. The question is whether the industry absorbs the lesson before the compounding interest comes due.