In July of last year a women's safety app called Tea ended up with about 72,000 user images sitting in a Google Firebase bucket that anyone could open, and roughly 13,000 of those were selfies and photos of driver's licenses and passports that people had handed over to prove they were real. There was no break-in. Someone posted the bucket's location on 4chan, other people clicked the link, and because the bucket had no authentication and even allowed directory listing, it took about as long to download all of it as it takes to write a short script. The files were on torrent sites within hours.
What gets lost in the retelling is that Tea's own API, the part their team actually built and reviewed, was fine. What leaked was a storage bucket left over from an old data migration, still sitting on a permissive default that nobody circled back to. They did the hard part correctly and got destroyed by a setting. And if you are building with Cursor or Lovable or Claude Code right now, that should bother you more than the breach itself, because the setting that sank Tea is exactly the kind of thing your tools will never warn you about.
The reason is simple once you sit with it. An AI coding tool works from your code and your prompt. It does not visit your deployed app the way a stranger does, walk every route, read your bucket policies, or check what ended up in the JavaScript you shipped to the browser. It is very good at getting the demo to run, and an open bucket does not stop a demo from running.
We ran Probe across 27 recently shipped vibe-coded apps, and I want to be careful about what that does and does not prove. It was a small sample. It was taken at a particular moment. It was not a census of the whole market, and nobody should pretend that it was. But 27 out of 27 had findings, and that is the part worth sitting with, because the result was not that a few messy apps had weird edge cases. The result was that every app we looked at had something a normal user would never see and a normal attacker would absolutely try.
Across those 27 apps, Probe found 55 issues: 8 critical, 18 high, and 29 medium. The ranked list is the shape of the problem in miniature: 25 missing security headers; 13 cases of permissive CORS on sensitive endpoints; 4 exposed secret-key findings across 3 apps, including a Resend key in client-side JavaScript; 4 exposed debug endpoints; 3 default admin routes; 2 unauthenticated LLM endpoints; 2 LLM endpoints with no rate-limit signal; 1 public source map; and 1 Stripe webhook accepting unsigned events.
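To make the two most common findings concrete, here is a minimal sketch of what checking for them can look like. It is not Probe's implementation. The list of expected headers, the function name `audit_headers`, and the `probe_origin` parameter are all assumptions chosen for illustration, and the function only inspects a dictionary of response headers you have already collected.

```python
from typing import Dict, List, Optional

# Assumed baseline for illustration; the article does not say which
# headers Probe actually checks for.
EXPECTED_HEADERS = (
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
)


def audit_headers(
    headers: Dict[str, str],
    sensitive: bool = False,
    probe_origin: Optional[str] = None,
) -> List[str]:
    """Return findings for one HTTP response.

    headers:      response headers as received from the deployed app.
    sensitive:    True if the endpoint returns user data or triggers paid work.
    probe_origin: the made-up Origin value sent with the request, if any
                  (hypothetical parameter, e.g. "https://evil.example").
    """
    # HTTP header names are case-insensitive, so normalize before comparing.
    seen = {name.lower(): value.strip() for name, value in headers.items()}

    findings = [
        f"missing header: {name}" for name in EXPECTED_HEADERS if name not in seen
    ]

    allow_origin = seen.get("access-control-allow-origin")
    allow_creds = seen.get("access-control-allow-credentials", "").lower() == "true"

    if sensitive and allow_origin == "*":
        # Any website can read this response from a visitor's browser.
        findings.append("permissive CORS: wildcard origin on sensitive endpoint")
    elif sensitive and probe_origin and allow_origin == probe_origin and allow_creds:
        # The server echoed back an origin we invented and allows cookies with it.
        findings.append("permissive CORS: reflects arbitrary origin with credentials")

    return findings
```

Calling `audit_headers(response_headers, sensitive=True)` on the headers of a live endpoint returns a plain list of strings, which is roughly the level of effort an outsider needs to spend to notice the same thing.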
None of that means the builders were careless. It means the tools have made one gap much smaller while leaving another gap exactly where it was. AI collapses the distance between an idea and working software, which is genuinely useful. It does not collapse the distance between working software and reviewed software, because whether software works is visible in the browser, while whether it has been reviewed is only visible from outside the app, from the place where headers, buckets, routes, keys, signatures, and abuse paths become real.
The checks that matter are not glamorous, but they are the ones that hurt when missed:
- secret keys in browser JavaScript, especially email, LLM, payment, and service-role keys
- real server-side auth on admin, debug, billing, webhook, and LLM routes
- CORS on sensitive endpoints that return user data or trigger paid work
- storage bucket and database policies for public read, public write, and directory listing
- rate limits and signature checks on paid APIs, Stripe webhooks, and LLM calls
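The first item on that list is the easiest to check yourself. Below is a rough sketch of scanning the JavaScript you actually shipped for strings that look like secret keys. The patterns are assumptions for illustration: Stripe secret keys do start with `sk_live_` or `sk_test_`, but the `re_` prefix for Resend keys and the length thresholds are my guesses, and `scan_bundle` is a hypothetical helper, not a real tool's API. Expect false positives and misses.

```python
import re
from typing import List, Tuple

# Illustrative patterns only. The Stripe prefixes are documented; the
# Resend prefix and all length thresholds are assumptions.
SECRET_PATTERNS = {
    "Stripe secret key": re.compile(r"\bsk_(?:live|test)_[0-9A-Za-z]{16,}"),
    "Resend API key (assumed re_ prefix)": re.compile(r"\bre_[0-9A-Za-z_]{16,}"),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan_bundle(text: str) -> List[Tuple[str, str]]:
    """Return (label, redacted match) pairs for suspected secrets in text.

    text is the contents of a built JavaScript file, exactly as the
    browser would download it.
    """
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            token = match.group(0)
            # Keep only the first 8 characters so the report itself
            # does not become a second leak.
            findings.append((label, token[:8] + "..."))
    return findings
```

Run it over every file in your build output directory, not over your source tree. The point is to look at what the browser receives, because that is what a stranger receives too. Publishable keys such as Stripe's `pk_live_` ones are meant to be public and are deliberately not matched.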
The uncomfortable part is that these checks do not make the app feel more finished while you are building it. They usually do not change the demo at all, which is why they slip past the exact workflow that made the app possible in the first place. Tea did the careful part, and it still was not enough.