We Built an AI Habit App and Used AI in Only Two Places. Here's Why.

Another first-hand build log (see the rest of the series). This one’s a little contrarian: it’s about where we chose not to use AI, and why that was the right call.

When you ship an “AI app,” there’s a gravitational pull to make everything AI. Investors want it, landing pages reward it, and it feels like falling behind to leave a screen un-sprinkled with magic. We built a habit-tracking app and felt all of that — then deliberately used AI in only two places, and kept the entire core manual. It made the product better, cheaper, and more trustworthy. Here’s the reasoning, because “where NOT to use AI” is the most underrated product skill of this era.

The default should be: no AI

Start from the opposite of the hype. A feature has to earn its AI, not have it imposed. For each screen we asked one question:

Does an LLM here remove user effort or add real judgment — or is it just decoration that adds latency, cost, and a new way to be wrong?

Most habit features failed that test instantly. Checking off “did I meditate today” is a tap. Showing a streak is arithmetic. A weekly bar chart is a GROUP BY. Putting a language model anywhere near those would add a spinner, a token cost, and a chance of being confidently wrong — in exchange for nothing. Manual won, easily, almost everywhere.
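To make "arithmetic" and "GROUP BY" concrete, here is a minimal Python sketch. It is not the app's actual code. The function names are hypothetical, and the streak rule (a streak still counts if today isn't logged yet but yesterday was) is an assumption chosen for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def current_streak(done_days: set[date], today: date) -> int:
    """Count consecutive completed days ending today.

    Assumption: if today is not logged yet, the streak is counted
    from yesterday so it does not reset before the day is over.
    """
    day = today if today in done_days else today - timedelta(days=1)
    streak = 0
    while day in done_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


def weekly_counts(done_days: set[date]) -> dict[tuple[int, int], int]:
    """Group completions by ISO (year, week): an in-memory GROUP BY."""
    return dict(Counter(tuple(d.isocalendar()[:2]) for d in done_days))
```

Both functions are pure, run locally, and need no network, which is the point: nothing here can be slow, cost tokens, or be confidently wrong.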

The two places AI actually earned its spot

AI survived the test in exactly two spots, and they share a trait: both turn something fuzzy into something useful, which is the kind of work where an LLM genuinely beats hand-written rules.

Breaking a vague goal into doable steps. “Get fit” or “write more” is where people stall. A model that turns a fuzzy intention into a short, concrete, starter-sized checklist does real cognitive work a switch statement can’t. That’s judgment, not decoration — it earns the call.
Summarizing a period into a human reflection. Raw stats (“4/7 days”) are cheap; a short, encouraging, honest read of the week — what went well, where it slipped, one suggestion — is the kind of synthesis people actually read. The data is mechanical; the narrative is where a model adds warmth a chart can’t.

Two features. Everything else — logging, reminders, stats, streaks, calendars — stayed boring, instant, and offline-friendly.
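Even where AI earns its spot, the model's output is worth treating as untrusted input. Below is a hedged sketch of the goal-breakdown call, assuming a hypothetical `ask_model` function that takes a prompt and returns raw text. The prompt wording, the step cap, and the length limit are all illustrative assumptions, not our production values.

```python
import json
from typing import Callable, Optional

MAX_STEPS = 5        # assumed cap to keep the checklist starter-sized
MAX_STEP_CHARS = 80  # assumed limit so each step stays short


def parse_steps(raw: str) -> Optional[list[str]]:
    """Accept only a JSON array of short, non-empty strings."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list) or not data:
        return None
    steps = []
    for item in data[:MAX_STEPS]:  # extra steps are dropped, not rejected
        if not isinstance(item, str):
            return None
        text = item.strip()
        if not text or len(text) > MAX_STEP_CHARS:
            return None
        steps.append(text)
    return steps


def break_down_goal(goal: str, ask_model: Callable[[str], str]) -> Optional[list[str]]:
    """Return a validated checklist, or None so the UI can fall back to manual entry."""
    prompt = (
        f"Break the goal '{goal}' into at most {MAX_STEPS} small first steps. "
        "Reply with a JSON array of strings only."
    )
    try:
        return parse_steps(ask_model(prompt))
    except Exception:  # network error, timeout, provider outage
        return None
```

Returning `None` instead of raising keeps the manual path as the default: if the model misbehaves, the user simply types their own steps.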

Why restraint made the product better, not just cheaper

This wasn’t only about the bill (though it helps — see our cost lessons). Restraint improved the actual product:

Speed and trust. The core loop — open app, tap done — is instant and works offline. No spinner, no “thinking…”, no failure state for the thing people do every single day. Reliability is the feature in a habit app.
No social pressure, no gimmicks. Our users specifically wanted a calm, no-nonsense tracker. Stuffing AI into every corner would have betrayed exactly the audience we were building for. Knowing who it’s not for is product design.
A smaller surface to break. Every AI call is something that can be slow, cost money, or hallucinate. Two calls is two things to monitor and harden. Twenty would be twenty. For a solo maker, attention is the budget, and restraint protects it.

How to decide for your own app

A simple rubric we now use before adding AI anywhere:

Effort test — does it remove steps for the user? (Best reason to add it.)
Judgment test — does it turn something fuzzy into something useful that code can’t? (Second-best reason.)
Failure test: if the model is slow, down, or wrong here, how bad is it? (If the answer is “bad,” build a fallback or skip the feature.)
Decoration test — be honest: is this AI for the user, or for the pitch? (If the pitch, cut it.)

If decoration is the only reason a feature has AI in it, it’s not a feature. It’s a liability with a marketing budget.
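The failure test is the easiest one to turn into code. Here is a minimal sketch for the weekly reflection, assuming a hypothetical `summarize` function that wraps the model call and raises on timeout or error. The names and the fallback wording are illustrative assumptions.

```python
from typing import Callable


def plain_summary(done: int, total: int) -> str:
    """The mechanical fallback: always available, never wrong."""
    return f"You completed {done} of {total} days this week."


def weekly_reflection(done: int, total: int, summarize: Callable[[str], str]) -> str:
    """Prefer the model's narrative; fall back to plain stats on any failure."""
    stats = plain_summary(done, total)
    try:
        text = summarize(stats).strip()
    except Exception:  # slow, down, or otherwise failing model
        return stats
    # An empty reply is treated the same as a failure.
    return text if text else stats
```

With a fallback like this, a model outage degrades the feature to "4 of 7 days" instead of an error screen, so the answer to "how bad is it?" becomes "not very."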

The takeaway

The skill everyone’s racing to demonstrate is “I added AI.” The skill that actually ships good products is knowing where not to. We built an AI app whose best parts are mostly not AI — and the two places we did use it are better because they’re not competing with twenty gimmicks for the user’s trust. Restraint isn’t falling behind. On a small team, it’s the whole game.

More from our build logs: the cheap stack, prompt + guardrails, the $0 backend, AI planner constraints, and voice → structured data.