
150,000 Lines of Vibe Coded Elixir: The Good, The Bad, and The Ugly

John

TL;DR:

  • Good: AI is great at Elixir. It gets better as your codebase grows.
  • Bad: It defaults to defensive, imperative code. You need to be strict about what good Elixir looks like.
  • Ugly: It can’t debug concurrent test failures. It doesn’t understand that each test runs in an isolated transaction, or that processes have independent lifecycles. It spirals until you step in.
  • Bottom Line: Even with the drawbacks, the productivity gains are off the charts. I expect it will only get better.

BoothIQ is a universal badge scanner for trade shows. AI writes 100% of our code. We have 150,000 lines of vibe coded Elixir running in production. Here’s what worked and what didn’t.

The Good

Elixir is Small: It Gets It Right the First Time

Elixir is a small language. Few operators. Small standard library. Only so many ways to control flow. It hasn’t been around for decades. It hasn’t piled up paradigms like .NET or Java, where functional and OOP fight for space.

This matters. AI is bad at decisions. If you want your agent to succeed, have it make fewer decisions. With Elixir, Claude doesn’t need to pick between OOP and functional. It doesn’t need to navigate old syntax next to new patterns. There’s one way to skin the cat. Claude finds it.

This matters more if you’re adding AI to an existing codebase. In languages where paradigms came and went—often arriving with whichever developer championed them—Claude tries to match the existing code. The existing code is inconsistent. So Claude is inconsistent.

Elixir is Terse: Longer Sessions, Fewer Compactions

Small and terse are related but different. Small means few concepts. Terse means fewer tokens to express the same thing. Go is small but not terse—few concepts, but verbose syntax and explicit error handling everywhere. Elixir is both. We got lucky.

Context windows are a real constraint. Elixir uses fewer tokens than most languages. No braces. No semicolons. No verbose boilerplate. I can stay in a working session longer. More iterations. Fewer compactions—those moments when the AI summarizes and forgets earlier context. More context in memory.

When I built the React Native version of our app, I hit compactions constantly. JavaScript is small-ish, but it’s not terse. It burns tokens to do what Elixir does with fewer.
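To give a feel for the density, here’s a toy pipeline (not from our codebase): parse a string, keep the valid integers, sum them. No braces, no semicolons, no intermediate declarations.

```elixir
# Parse a comma-separated string, drop anything that isn't an integer, sum.
total =
  "10,25,abc,40"
  |> String.split(",")
  |> Enum.flat_map(fn s ->
    case Integer.parse(s) do
      {n, ""} -> [n]
      _ -> []
    end
  end)
  |> Enum.sum()

# total == 75
```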

I also see more compactions when working on heavy HTML and Tailwind in LiveView. Adding, updating, or editing large sections of markup at once. HTML and HEEx templates are token-heavy. But even then, it’s less painful than JavaScript-heavy work.

Tidewave: Longer Unassisted Runs

Tidewave supercharges Elixir-specific context. It lets the agent read logs from the running app—debug, info, error, warning—so you don’t copy/paste logs around. It can query the dev database, see Ecto schemas, and view package documentation. Fewer hallucinations. Longer unassisted runs. The agent can check and validate its own assumptions without human intervention.

Immutability: Fewer Decisions, Less Code

If a variable gets mutated by a function call, AI now has three problems instead of one. The actual feature you want implemented. Whether to work around the mutation or update other call sites to stop mutating. And the mutated data itself—what is it, what was it, what will it be, what can it be?

AI ponders all of this and contorts itself into an overly defensive mess. It writes nonsense validation checks and if-statements on mutated data. Defensive code that wouldn’t exist in an immutable language.

In Elixir, the data is what it is. It’s not going to change. Fewer decisions. Less code.
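A toy illustration of the point (the module and fields are made up, not from BoothIQ): a function can only return new data, never mutate the caller’s.

```elixir
# Hypothetical module: apply a discount by returning a *new* map.
defmodule Pricing do
  def with_discount(order, pct) do
    Map.update!(order, :total, &(&1 * (1 - pct)))
  end
end

order = %{id: 1, total: 100.0}
discounted = Pricing.with_discount(order, 0.1)

# The original binding is untouched; no defensive re-validation needed.
# order.total == 100.0, discounted.total == 90.0
```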

Frontend: Higher Quality, Less Time

I prompt high-level changes—“give the top section more padding”—and Claude does it faster than I could. It’s especially good at modifying or moving large chunks of page structure. Mobile-first views? Easy. Way faster than me, and it’s a better designer than me too.

The quality floor has gone way up. You can’t hide behind “I’m not a designer” anymore.

Git Worktrees: Build Multiple Features in Parallel

I use three git worktrees, so I can work on up to three features at any given time. Typically a main feature, a slightly less important one, and a third reserved for quick fixes, low-priority stuff, or experiments.

Three is about the limit. Any more and context switching between features becomes the bottleneck.
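The setup is plain `git worktree`. The repo name and branches below are placeholders, and the sketch builds a throwaway repo so the commands run end to end:

```shell
# Demo in a throwaway repo (paths and branch names are placeholders).
cd "$(mktemp -d)" && git init -q boothiq && cd boothiq
git -c user.name=demo -c user.email=demo@example.com \
  commit --allow-empty -m "init" -q

# One worktree per in-flight feature; each directory has its own branch.
git branch feature-main
git branch quick-fixes
git worktree add ../boothiq-feature-main feature-main
git worktree add ../boothiq-quick-fixes quick-fixes

git worktree list   # main checkout plus the two feature directories
```

Each agent session then runs in its own directory, so parallel features never fight over a checkout.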

The Bad

AI Can’t Organize: Architecture Is Still On You

AI is exceptional at churning out lines of code. It’s significantly less exceptional at deciding where those lines should go. It defaults to creating new files everywhere. It repeats code it’s already written. It introduces inconsistencies.

This is the “mess” people describe in vibe code projects as they grow. You still need a human making structural decisions.

Trained on Imperative: It Writes Defensive Code

AI was trained mostly on imperative code. Ruby, Python, JavaScript, C#. Elixir looks like Ruby. So Claude writes Ruby-style Elixir—if/then/else chains, defensive nil-checking, early returns that don’t make sense in a functional context.

Elixir wants you to be assertive. Pattern match on what you expect. Let it crash if something’s wrong. The process restarts in a good state. This is foreign to most code Claude trained on.

This gets better as the codebase grows. Claude sees more assertive patterns. It starts to infer the style. But it still defaults to defensive. I still correct it regularly. Be strict about what good Elixir looks like.
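A contrived before/after (module and field names are made up): the defensive version is what Claude tends to emit; the assertive version is what idiomatic Elixir wants.

```elixir
# Defensive, Ruby-flavored Elixir: checks for states that shouldn't exist.
defmodule Badges.Defensive do
  def scan(attendee) do
    if attendee != nil and Map.has_key?(attendee, :badge_id) do
      {:ok, attendee.badge_id}
    else
      {:error, :invalid_attendee}
    end
  end
end

# Assertive Elixir: pattern match on the shape you expect.
# Anything else raises FunctionClauseError, the process crashes,
# and the supervisor restarts it in a known-good state.
defmodule Badges.Assertive do
  def scan(%{badge_id: badge_id}), do: {:ok, badge_id}
end
```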

Git Operations: Keep It Out of Context

Every git operation takes context window space. Checking status. Writing commit messages. Describing PRs. That space could go to actual work. Git context goes stale fast—a commit message from 20 minutes ago is worthless after three more changes.

When I’m babysitting a feature, I commit manually. Every point I’m happy with. It’s fast. It’s cheap version control. It doesn’t burn context.

Claude Code has “checkpoints” now. Internal version control that protects vibe coders without explicit commits. That’s better than AI managing git directly.

The Ugly

OTP and Async: It Chases Ghosts

Claude is useless for debugging OTP, Task, or async issues. It doesn’t understand how processes, the actor model, and GenServers work together. When it tries to introspect the running system, it feeds itself bad data. It gets very lost.

It can course correct when you point out where it went wrong. But on its own, it chases ghosts.

Ecto Sandbox: It Chases Red Herrings

In Elixir tests, each test runs in a database transaction that rolls back at the end. Tests run async without hitting each other. No test data persists.
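For reference, this is the standard Ecto SQL sandbox wiring that makes that happen (module names here use a hypothetical `MyApp`; these are config fragments, not a runnable script):

```elixir
# config/test.exs — the sandbox pool wraps each test's connection
# in a transaction that rolls back when the test exits.
config :my_app, MyApp.Repo, pool: Ecto.Adapters.SQL.Sandbox

# test/test_helper.exs
Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, :manual)

# In the shared test case template (e.g. MyApp.DataCase):
setup tags do
  pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
  on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
  :ok
end
```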

Claude doesn’t understand this. It uses Tidewave’s dev DB connection and thinks it’s looking at the test DB—which is always empty. A test fails. Claude queries the database. Finds nothing. Thinks there’s a data problem.

I’ve watched Claude try to seed the test database so a test will pass. That’s clearly wrong.

Other times, two tests insert or query the same schema. Claude doesn’t understand transaction isolation—tests can’t see each other’s data. It confuses itself and recommends disabling async tests altogether. Manageable once you watch for it. But ugly.

Bottom Line

AI writing all the code has been a massive win. The friction exists, but it’s manageable and doesn’t interfere much with day-to-day work. By far the most important thing: have a consistent, coherent codebase architecture. Without it, you’ll quickly end up with spaghetti code.

The goal for this year: automate myself out of a job. That means giving Claude more control over the entire software development lifecycle—from a simple problem statement to a fully tested, working PR that only needs a quick glance before it’s merged and deployed.

Want to learn more?

See how BoothIQ can transform your event lead capture and follow-up process.

Get in touch