Speech-to-Text (STT) — user speaks, you get text
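Record audio in the browser with the MediaRecorder API, send the audio file to your backend, and have the backend forward it to OpenAI's Whisper API. Going through the backend keeps your API key out of the browser. For short clips, the text transcript typically comes back in about a second.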
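The backend half of that flow can be sketched as follows. This is a minimal illustration, assuming OpenAI's `/v1/audio/transcriptions` endpoint and the `whisper-1` model; the helper names `buildTranscriptionRequest` and `extractTranscript` are hypothetical and only describe the request rather than sending it.

```typescript
// Sketch only: describes the multipart request a backend would send to
// OpenAI's transcription endpoint. Helper and type names are hypothetical.
interface TranscriptionRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  // Text fields of the multipart form; the audio goes in a separate "file" field.
  fields: Record<string, string>;
}

function buildTranscriptionRequest(apiKey: string, language?: string): TranscriptionRequest {
  const fields: Record<string, string> = { model: "whisper-1" };
  if (language) fields["language"] = language; // optional ISO-639-1 hint, e.g. "en"
  return {
    url: "https://api.openai.com/v1/audio/transcriptions",
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    fields,
  };
}

// With the default response format, the JSON body looks like { "text": "..." }.
function extractTranscript(responseBody: unknown): string {
  const text = (responseBody as { text?: unknown } | null)?.text;
  if (typeof text !== "string") throw new Error("No transcript in response");
  return text.trim();
}
```

In a real handler, these fields plus the uploaded audio (as a `file` field) would be appended to a `FormData` object and sent with `fetch`. The `Content-Type` header is left unset so the multipart boundary is generated automatically.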
Text-to-Speech (TTS) — your app speaks
Send a text string to OpenAI's TTS endpoint, choose a voice (alloy, echo, fable, onyx, nova, shimmer), and get an MP3 audio file back that you can play directly in the browser.
ElevenLabs offers 100+ pre-built voices with emotional range and a free tier generous enough for demos. It tends to sound more natural than OpenAI TTS on longer passages.
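A minimal sketch of the text-to-speech request, assuming OpenAI's `/v1/audio/speech` endpoint and the `tts-1` model. The helper names `isVoice` and `buildSpeechRequest` are hypothetical; the voice list is the one given above.

```typescript
// Sketch only: builds (but does not send) a text-to-speech request.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

function isVoice(name: string): name is Voice {
  return (VOICES as readonly string[]).indexOf(name) !== -1;
}

function buildSpeechRequest(apiKey: string, text: string, voice: string) {
  if (!isVoice(voice)) throw new Error(`Unknown voice: ${voice}`);
  if (text.trim() === "") throw new Error("Text must not be empty");
  return {
    url: "https://api.openai.com/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // The response body is binary audio (MP3 unless another format is requested).
      body: JSON.stringify({ model: "tts-1", input: text, voice }),
    },
  };
}
```

On the backend this could be sent with `fetch(req.url, req.init)`, and the returned bytes streamed to the browser for playback in an `<audio>` element.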
OpenAI DALL-E 3 — best quality, easiest to use
You send a prompt and optionally specify size (1024×1024, 1792×1024, 1024×1792) and quality (standard / hd). You get back a URL to the generated image — valid for 1 hour. Save it to your storage if you need it longer.
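The request described above might look like the sketch below, assuming OpenAI's `/v1/images/generations` endpoint. The helper names `buildImageRequest` and `extractImageUrl` are hypothetical. Note that the API expects sizes written with a plain letter `x`, such as `1024x1024`.

```typescript
// Sketch only: builds an image-generation request and reads the URL from the reply.
type ImageSize = "1024x1024" | "1792x1024" | "1024x1792";
type ImageQuality = "standard" | "hd";

function buildImageRequest(
  apiKey: string,
  prompt: string,
  size: ImageSize = "1024x1024",
  quality: ImageQuality = "standard",
) {
  return {
    url: "https://api.openai.com/v1/images/generations",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "dall-e-3", prompt, size, quality, n: 1 }),
    },
  };
}

// Expected response shape: { data: [{ url: "https://..." }] }
function extractImageUrl(responseBody: { data?: Array<{ url?: string }> }): string {
  const url = responseBody.data?.[0]?.url;
  if (!url) throw new Error("Response contained no image URL");
  return url;
}
```

Because the returned URL expires, a backend would typically download the image right after `extractImageUrl` and copy it into its own storage.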
Alternatives — more control, lower cost
DALL-E 3 has better prompt understanding and photorealism. Replicate is cheaper per image and supports hundreds of specialised styles — illustrative, anime, logo design, interior design, etc.
Start with DALL-E 3 for simplicity. Move to Replicate if you need lower cost at scale or specialised artistic styles.
OpenAI GPT-4o — the industry standard
The API takes an array of messages (with roles: system, user, assistant). The system message defines the AI's persona and task. Your user's input goes in a user message. The AI responds as assistant.
The system message shapes everything. It's where you give the AI its role, rules, tone, and domain. Examples:
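As one illustration, here is a minimal sketch of how the message array might be assembled for OpenAI's Chat Completions endpoint (`POST https://api.openai.com/v1/chat/completions`). The helper names `buildChatBody` and `extractReply` and the study-coach system prompt are hypothetical, chosen only for this example.

```typescript
// Sketch only: assembles the JSON body for a chat request and reads the reply.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

function buildChatBody(systemPrompt: string, history: ChatMessage[], userInput: string) {
  const messages: ChatMessage[] = [
    { role: "system", content: systemPrompt }, // persona, rules, tone, domain
    ...history,                                // earlier user/assistant turns, oldest first
    { role: "user", content: userInput },      // the new question
  ];
  return { model: "gpt-4o", messages };
}

// Expected response shape: { choices: [{ message: { content: "..." } }] }
function extractReply(responseBody: {
  choices?: Array<{ message?: { content?: string | null } }>;
}): string {
  const content = responseBody.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("No assistant reply in response");
  return content;
}

// Hypothetical system prompt for a study-coach product.
const exampleBody = buildChatBody(
  "You are a friendly study coach. Answer in under 100 words and end with one practice question.",
  [],
  "Explain recursion to a first-year student.",
);
```

Keeping the system message first and appending each new turn to `history` is what lets the model follow a multi-turn conversation, since the API itself is stateless.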
Anthropic Claude — great for analysis & long documents
Google Gemini — free tier, multimodal, and Google AI Studio
Gemini 2.0 Flash is one of Google's fastest and most cost-effective models. Unlike the OpenAI and Anthropic APIs, it has a free tier that is usable for real student projects: at the time of writing, roughly 15 requests per minute and 1 million tokens per minute at no cost (limits change, so check Google's current rate-limit page). It is also natively multimodal: you can send text, images, PDFs, and audio in the same request.
Before writing a single line of code, visit aistudio.google.com. It's a free browser tool where you can test prompts, experiment with multimodal inputs, and — crucially — click "Get code" to export a ready-to-run JavaScript or Python snippet for exactly what you just tested. Think of it as a live scratchpad that hands you working code.
🖼️ Multimodal — analyse an image with text
Gemini's standout feature is native multimodal support. You can send an image (uploaded by the user, a screenshot, a product photo) alongside a text question — all in one API call, with no separate vision endpoint needed.
Receipt scanner → extract total and items. Product photo → suggest description and price. Sketch/wireframe → describe the UI to a developer. Business card → extract contact info. Screenshot of a competitor app → list features and gaps.
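The receipt-scanner case could be sketched like this, assuming the Gemini REST endpoint `models/gemini-2.0-flash:generateContent` and an image that has already been base64-encoded. The helper names `buildGeminiImageRequest` and `extractGeminiText` are hypothetical.

```typescript
// Sketch only: one request carrying both a text question and an inline image.
function buildGeminiImageRequest(
  apiKey: string,
  question: string,
  imageBase64: string, // raw base64, without a "data:image/...;base64," prefix
  mimeType: string,    // e.g. "image/jpeg" or "image/png"
) {
  const model = "gemini-2.0-flash";
  return {
    url:
      `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent` +
      `?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contents: [
          {
            parts: [
              { text: question },
              { inline_data: { mime_type: mimeType, data: imageBase64 } },
            ],
          },
        ],
      }),
    },
  };
}

// Expected response shape: { candidates: [{ content: { parts: [{ text: "..." }] } }] }
function extractGeminiText(responseBody: {
  candidates?: Array<{ content?: { parts?: Array<{ text?: string }> } }>;
}): string {
  const parts = responseBody.candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("");
}
```

A call such as `buildGeminiImageRequest(key, "List the items and the total on this receipt.", receiptBase64, "image/jpeg")` covers the first use case above. As with the other providers, the request is best sent from your backend so the key never reaches the browser.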
Which model to choose — GPT vs Claude vs Gemini
GPT (OpenAI): Widest ecosystem, best function-calling support, most community examples. GPT-4o-mini is very cheap. Use for: general chat, tool use, code generation, image understanding.
Claude (Anthropic): Best for long documents (200k context), careful structured output, and reasoning tasks. Tends to be more cautious and less prone to hallucination. Use for: analysis, summarisation, document Q&A.
Gemini (Google): Best free tier (no billing needed), native multimodal (text + image in one call), and Google AI Studio for instant prototyping. Use for: student prototypes, image analysis, and any project where budget is zero.
Useful AI prompt patterns for MVPs
Drop these into the system field of any LLM call and customise for your product:
A database is persistent storage — it keeps data even when the server restarts or the user closes the browser. You need one when your app must:
For a one-user demo that only runs in one browser, localStorage may be enough. For anything real: use a database.
Supabase — recommended for most student projects
Firebase / Firestore — Google's real-time database
Firestore stores data as documents inside collections. Think of a collection like a folder, and a document like a JSON file inside it. There's no fixed schema — each document can have different fields.
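To picture the collection/document model, here is a small sketch in plain TypeScript. It deliberately does not use the Firebase SDK; the type names, the `users` collection, and its contents are hypothetical and only mirror the folder-of-JSON-files analogy.

```typescript
// Sketch only: an in-memory picture of a Firestore-style collection.
type Doc = Record<string, unknown>;        // one "JSON file"
type Collection = Record<string, Doc>;     // document ID -> document

const users: Collection = {
  alice: { name: "Alice", email: "alice@example.com" },
  bob: { name: "Bob", plan: "pro", tags: ["beta"] }, // different fields, same collection
};

// Lists the field names of one document, or [] if the ID does not exist.
function fieldNames(collection: Collection, docId: string): string[] {
  const doc = collection[docId];
  return doc ? Object.keys(doc).sort() : [];
}
```

The two documents share only the `name` field, which is exactly what "no fixed schema" allows. The trade-off is that your own code has to cope with fields that may be missing.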
Firebase is NoSQL (flexible documents, great for real-time). Supabase is SQL (structured tables, great for relational data with joins). For most startup prototypes, Supabase is easier to start with. Choose Firebase if real-time collaboration is a core feature.
Airtable — spreadsheet as a database
Airtable is a spreadsheet where each sheet is a table and each row is a record. You can interact with it using a simple REST API — no database client or SDK required. Perfect for quick prototypes, content management, and admin-friendly data entry.
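A minimal sketch of reading rows over that REST API, assuming the `https://api.airtable.com/v0/{baseId}/{tableName}` list endpoint with a bearer token. The helper names `buildListRecordsRequest` and `rowsFromResponse`, and the base ID and table name in the usage note, are hypothetical.

```typescript
// Sketch only: builds a "list records" request and flattens the response.
function buildListRecordsRequest(
  token: string,
  baseId: string,
  tableName: string,
  maxRecords: number = 10,
) {
  const url =
    `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(tableName)}` +
    `?maxRecords=${maxRecords}`;
  return {
    url,
    init: { method: "GET", headers: { Authorization: `Bearer ${token}` } },
  };
}

interface AirtableRecord {
  id: string;
  fields: Record<string, unknown>; // column name -> cell value
}

// Expected response shape: { records: [{ id, fields: { ... } }, ...] }
function rowsFromResponse(body: { records?: AirtableRecord[] }): Record<string, unknown>[] {
  return (body.records ?? []).map((r) => ({ ...r.fields, id: r.id }));
}
```

For example, `buildListRecordsRequest(token, "appXXXXXXXXXXXXXX", "Event listings")` describes a request that could be sent with `fetch(req.url, req.init)` from a serverless function, so the token stays on the server.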
| Feature | 🟢 Supabase | 🔥 Firebase | 📊 Airtable |
|---|---|---|---|
| Data model | SQL tables (rows + columns) | NoSQL documents (flexible JSON) | Spreadsheet (rows + named columns) |
| Free tier | 500 MB database, 50k monthly active users | 1 GB storage, 50k reads/day | 1,000 records per base |
| Auth built-in | ✓ Email, OAuth, magic link | ✓ Email, Google, Apple, SMS | ✗ Not included |
| Real-time updates | Partial (via Realtime channels) | ✓ Native (onSnapshot) | ✗ Polling only |
| SQL queries | ✓ Full SQL support | ✗ No SQL | ✗ No SQL |
| Non-technical admin UI | Partial (table editor) | Partial (console) | ✓ Excellent spreadsheet UI |
| File storage | ✓ Built-in (images, PDFs) | ✓ Firebase Storage | ✓ Attachments in records |
| JS SDK quality | ✓ Excellent | ✓ Excellent (Google-maintained) | Basic (official airtable.js client, or plain REST) |
| Best for | Structured data, auth, SQL joins | Real-time collab, mobile apps | Content management, no-code CMS |
| Learning curve | Low (SQL is widely known) | Medium (NoSQL concepts) | Very low (like a spreadsheet) |
Quick decision guide
Choose Supabase if: Your app has users, posts, or orders with relationships between them (e.g. a user has many posts). You want login/signup included. You're comfortable thinking in rows and columns. This is the default choice for most MVPs.
Choose Firebase if: Real-time updates are a core feature (live chat, collaborative editing, live dashboards). You're building a mobile app. Your data structure changes often and you don't want to run database migrations.
Choose Airtable if: Non-technical teammates need to add or edit content without any code. You're building a content-driven app (product catalogue, event listings, job board) where the "database" acts as a CMS. You want to be up and running in 5 minutes with zero backend setup.
Asking AI to connect your database — prompt templates
You don't need to write database code from scratch. Give Claude the prompt above inside your Project — it generates the complete code, including the Vercel serverless functions. Your job is to understand which database to use and what data to store. Push everything to GitHub and Vercel handles the rest.