Leo: My Zero-Cost, Privacy-First AI Assistant on WhatsApp
$0/month. Runs on your hardware. Lives in WhatsApp.
That's Leo. An AI assistant I built that handles queries, searches the web, manages your calendar, reads your email, tracks your fitness, and delivers a personalized briefing every morning. All from the app you're probably already using to text family and friends.
Why I Built This
WhatsApp is already on everyone's phone. It's the most popular messaging app on the planet, and I was already using it to stay connected with family and friends. The question was: what if it could also manage my digital life?
I wanted four things:
- Privacy first: My data never leaves my machine
- Control: I own the logic for workflows, system prompts, and model choice
- Zero recurring cost: No API subscriptions, no token metering
- Learn by building: A real project to deepen my understanding of local LLMs and agents
Leo is the result.
What Leo Can Do
Intelligent Conversations
Leo handles the full range of AI assistant tasks: answering questions, brainstorming, deep research, explaining concepts. Each conversation maintains its own memory via SQLite-backed sessions, so Leo remembers what you discussed earlier.
Web Search
Need current information? Leo connects to Brave Search for real-time data. Ask about news, look up facts, research any topic.
What's the latest supreme court ruling on Tariffs?
Do a deep research and summarize if Tariffs are good or bad for US economy
Google Workspace Integration
Leo becomes a productivity layer across your entire Google account:
| Service | Capabilities |
|---|---|
| Google Calendar | View events, create meetings, find free time slots |
| Gmail | Search threads, draft and send emails |
| Google Docs | Create, read, find, update documents |
| Google Drive | Search files, create folders, download content |
| Google Sheets | Read data, get ranges |
| Google Slides | Read presentations |
@leo, am I free this Sat 5pm? if so add 2 hr block for Tom's bday
Health & Fitness
Leo connects to Garmin Connect to pull your fitness data: sleep patterns, training schedule, workout history, performance trends. This feeds directly into your morning briefings.
One-Time Reminders
Use natural language:
#remindme in 30 minutes to call mom
#remindme tomorrow at 9am to check emails
#remindme at 12pm Feb 25, 2026 to complete taxes
Leo parses your request and messages you at the right time.
Recurring Reminders
Build habits:
#reminder add "9pm Sun to Thu" Review and adjust tomorrow's calendar
#reminder add "12:30 pm Thursdays" Readup Weekly Review Doc
#reminder help
#reminder list
#reminder remove <id>
Scheduled Briefings
This is the feature I use most. You define a prompt and a schedule; Leo runs it and delivers the results to your WhatsApp:
#briefing add "Morning Brief" "6:00am everyday" Get today's scheduled training from Garmin, today's calendar events, and unread emails summary
#briefing add "Evening Brief" "5:00pm everyday" Get unread emails summary and top 2 news from today
#briefing help
#briefing list
#briefing remove <id>
Wake up to a personalized digest built from your actual calendar, email, and fitness data.
Hooks: Bridge to External Programs
Leo can route messages to any program on your machine through bidirectional named pipes. Each hook creates two FIFOs: one for sending messages to the program, one for receiving responses back.
Trigger with #hook-name message or @hook-name message. I have hooks for claude and codex, so I can type #claude explain quantum computing in WhatsApp and get a Claude response routed right back into the chat.
This turns Leo into a message router that can bridge WhatsApp to virtually anything running on your machine.
Test Mode
Leo includes a local Gradio UI at http://127.0.0.1:7860 that bypasses the WhatsApp bridge entirely. It has a model selector to hot-swap Ollama models at runtime and a live system log panel. All background schedulers still run, so you can iterate on prompts without needing your phone.
Why Zero Cost Actually Works
| Component | Cost |
|---|---|
| LLM (Ollama + local model) | $0 |
| WhatsApp messaging | $0 (uses WhatsApp Web protocol) |
| Brave Search (free tier) | $0 |
| Google APIs | Free |
| Garmin data access | Free |
| Hosting | $0 (runs locally) |
| Total | $0/month |
Electricity is the only real cost. My estimates put it well under $10/year:
- The service draws ~60W at idle
- Inference spikes 100-300W for a few seconds on the 5070 Ti
- The 5060 Ti is slower but even more efficient
The key insight: modern open-source LLMs are good enough for most assistant tasks. Models like GLM-4.7-Flash, gpt-oss:20b, and deepseek-r1:8b run on consumer hardware and deliver strong results without per-token costs.
You already own the hardware. Make it work for you.
The Privacy Advantage
Leo runs entirely on your local machine:
- Local LLM: Powered by Ollama. Inference happens on your GPUs.
- Local storage: Messages, reminders, and sessions live in SQLite databases on your device.
- No cloud dependency: Your conversations never travel to external servers beyond WhatsApp.
- Your credentials, your machine: Google, WhatsApp, and Garmin tokens stay on your hardware.
Technical Architecture
The system splits into two processes that communicate over Unix domain sockets (paths configurable via INSTANCE_GUID), which allows multiple Leo instances on the same machine. The Go bridge handles the WhatsApp protocol; the Python server handles AI reasoning. Neither exposes a network port.
Go WhatsApp Bridge (whatsapp-mcp/whatsapp-bridge/)
- Built on
whatsmeow, a Go library implementing WhatsApp's multi-device protocol - Heavily modified for performance and to support all Leo use cases
- Handles authentication via QR code scanning
- Manages message storage in SQLite
- Processes media: images, videos, audio, documents
- Includes a custom Ogg Opus parser for voice message duration detection
Python Agent Server (agent/)
- Uses OpenAI Agents SDK for orchestration
- Connects to Ollama via OpenAI-compatible API (
http://localhost:11434/v1) - Agent factory with LRU cache (max 20 agents, 30-minute TTL) for multi-conversation support
- Natural language time parsing for reminders via an LLM agent
- Cron-based scheduling for briefings and recurring reminders
MCP Servers
Three servers launched as child processes using stdio-based MCP:
brave-search-mcp: Web searchworkspace-mcp: Google Workspace (Docs, Calendar, Gmail, Drive, Sheets, Slides)garmin-mcp: Fitness and health data
All three communicate with the agent server over stdin/stdout, not HTTP.
Operating Modes
- Dedicated Number Mode (
IS_DEDICATED_NUMBER=true): Responds to all DMs and group mentions. Good for a dedicated Leo phone number. - Mention Mode: Only responds when explicitly mentioned (
@leoor#leo). Works with your existing WhatsApp account.
Access Control
- Privileged whitelist (
ALLOWED_SENDERS): Only listed phone numbers get Google Workspace, Garmin, reminders, and briefings access. Non-privileged users can still chat and search the web. - Unix domain sockets for inter-process communication; no exposed network ports.
- Thread-local SQLite connections to avoid concurrency issues.
- Environment-based configuration for all sensitive credentials.
Interesting Technical Details
Natural Language Time Parsing
Instead of rigid regex patterns, the reminder system uses an LLM agent to parse times. It handles:
- "in 30 minutes"
- "tomorrow at 9am"
- "at 5pm Feb 14, 2026"
- "next Monday morning"
Cron-Based Scheduling
Briefings and recurring reminders use croniter for flexible scheduling:
| Input | Cron Expression |
|---|---|
| "9am everyday" | 0 9 * * * |
| "Monday 8am" | 0 8 * * 1 |
| "5pm friday" | 0 17 * * 5 |
WhatsApp LID Resolution
WhatsApp uses LID (Linked ID) for privacy, a format that can't be used for sending messages. The Go bridge automatically resolves LID to actual phone numbers for outbound messages.
Performance Optimizations
- Agent caching: LRU eviction prevents memory bloat from idle conversations
- Pre-built MCP parameters: Avoids per-message object creation overhead
- Shared environment copy: Avoids copying 100+ environment variables per request
- Singleton OpenAI client: Reused across all messages
Getting Started
Built on Ubuntu with NVIDIA GPUs, but it should work the same on Mac and WSL.
Prerequisites: Python >= 3.13, uv, Go, Ollama, and Node.js/npm.
- Clone the repository
- Install Ollama and pull a model:
ollama pull glm-4.7-flash - Copy
.env_exampleto.envand fill in your Brave Search API key, allowed senders, and other settings - Run the services:
./start_services.sh - Scan the QR code to connect WhatsApp
- Start messaging Leo
Want to try it without a phone? Run in test mode:
IS_TEST_MODE=true ./start_services.sh
Then open http://127.0.0.1:7860 for the Gradio UI.
What's Next
- Long-term memory: Remember preferences, recall past conversations
- Multi-modal capabilities: Image analysis, document understanding
- Voice improvements: Better TTS/STT for seamless voice conversations
- RAG on personal data: Index and search through your own documents
- Family/shared mode: Multiple users with separate contexts
The Bottom Line
- Own, don't rent: Your hardware, your model, your rules
- Privacy by design: Data never leaves your machine
- Zero marginal cost: Chat all day, run 50 briefings. Nothing extra.
- Meet users where they are: WhatsApp is already in everyone's pocket
You don't have to choose between convenience and privacy. With Leo, you get both.
Leo is open source. Your assistant, your data, your control. This post was updated on 2/28/2026