We Run a Real AI App on DeepSeek for Pennies: The Honest Stack, Costs, and Mistakes
Unlike our research roundups, this one is first-hand. We designed, shipped, and still run the app described below. The numbers are our real setup, not a press release — your mileage will vary, and model pricing changes, so confirm current rates before you budget.
Everyone’s talking about how cheap DeepSeek is. Fewer people have actually shipped a real, running product on it and watched the bill. We did. This is the honest version: what we built, the exact stack, what it costs to keep the lights on, and the mistakes that cost us the most time and money.
What we built
A small AI triage assistant: a user describes symptoms in plain language, and the assistant suggests which type of clinic/department might be relevant and surfaces related information. Crucially, it’s an information tool, not a diagnosis — more on why that distinction shaped the whole build below.
It’s live, it handles real users, and it runs on infrastructure that costs less than a couple of coffees a month. Here’s how.
The stack (and why each piece is cheap)
- LLM: DeepSeek API. Pay-per-token, and the per-token price is low enough that a full back-and-forth triage conversation costs a fraction of a cent. This is the part everyone fixates on — and honestly, it’s the least of your costs.
- Backend: Python + Flask + gunicorn. Boring, stable, free. One process, one small server.
- Storage: SQLite (WAL mode). No managed database, no Redis, no extra monthly bill. For rate-limiting, quotas, and audit logs, a single file is genuinely enough until you’re at serious scale.
- Frontend: a lightweight H5 web app. Built once, served as static files. No app-store gatekeeping, no native build farm.
- Server: a basic cloud VPS. A few dollars a month buys plenty for an app like this.
- TLS: Let’s Encrypt. Free certificates, auto-renewed. We never paid for SSL.
- No Docker, no Kubernetes. systemd services + nginx + a couple of bash scripts. We deliberately refused the heavyweight stack, and we’ve never regretted it.
The whole philosophy: buy compute by the token, not by the always-on GPU. The model is rented per request; everything else is a small fixed cost you fully control.
What it actually costs
Let’s be honest about the shape of the bill rather than pretend a precise figure applies to you:
- The VPS — a few dollars a month, fixed.
- TLS — free.
- The AI — effectively pennies, scaling with usage, not a flat subscription. Because it’s per-token, a quiet week costs almost nothing.
The counterintuitive lesson: at small scale, the AI is not your main expense — your own time is. Which leads to the part that actually matters.
Five lessons that cost us the most
1. The model is cheap; the guardrails are the work. Wiring up DeepSeek took an afternoon. Making it behave responsibly — sensible prompts, refusing to overstep, consistent output structure — took far longer. Budget your hours, not just your tokens.
2. Never let an AI gate an emergency. Our single most important design rule: the assistant flags potential red-flag situations and tells the user to seek real help immediately — it never blocks, never replaces a professional, and every output carries a plain “this is information only, not advice.” If you build anything in a health, legal, or financial space, this isn’t optional; it’s the whole ballgame.
3. Use token discipline and caching to keep costs near zero. Trim system prompts, cap context, and lean on the provider’s prompt-cache pricing where you repeat the same instructions. The difference between a careless and a careful integration is large in percentage terms — even if both are cheap in absolute dollars.
4. SQLite + simple rate-limiting beats premature scaling. We resisted the urge to add Redis, a managed DB, and a container orchestrator “for when we’re big.” We aren’t big yet, and the simple stack has never been the bottleneck. Add complexity when a real limit forces you to, not before.
5. Ship AI output as a draft, not a verdict. Every suggestion is framed as “here’s a starting point to check,” never “here’s the answer.” It sets correct user expectations and dramatically lowers your risk surface.
Who this is for (and an honest caveat)
If you’re a solo builder wondering whether you can ship a real AI product without burning cash on GPUs or a fragile micro-services stack — yes, you can, and the bill is smaller than you think. The catch isn’t cost; it’s the unglamorous work of prompts, guardrails, and responsible framing.
Pricing and model capabilities move fast. Treat our numbers as a snapshot, check DeepSeek’s current pricing before you commit, and remember the real lesson under all of it: rent intelligence by the token, keep everything else boring and cheap, and spend your saved time on safety and UX.
Want the next build log? We’ll break down the prompt + guardrail layer in detail — the part that actually took the time.