Text your AI before bed. Read the answers at breakfast.
An async relay for local ollama. Queue tasks from your phone, your GPU processes them overnight, results wait for you in the morning. Zero cloud. Nothing leaves your network.
coming soon — get notified
The problem
You run ollama on your home GPU. Great. But you don't leave it on 24/7 — electricity costs money and GPUs run hot. So when you think of something at 11pm, you either walk to your desk or forget it by morning.
Open WebUI, Reins, ngrok — they all need the server running when you send the message. If your box is off, your message goes nowhere.
The difference: an async queue
11pm [phone] ──text──▶ [relay on your box — always on, tiny]
3am (GPU box is off. message waits.)
7am [GPU wakes on schedule] ──▶ [CLI polls relay] ──▶ [ollama]
7:01 [ollama] ──reply──▶ [relay] ◀──check── [phone over coffee]
The relay is a 120-line Node server that runs on anything — a Raspberry Pi, an old laptop, any low-power box on your LAN. It holds messages until your GPU box wakes up and the CLI processes them. Your GPU doesn't need to be on when you think. It needs to be on when it works.
What people actually do with this
- Overnight article summaries — save 5 links before bed, wake up to a digest written by your local model.
- Morning briefings — "summarize my RSS feeds" queued at midnight, ready at 7am.
- Async code review — paste a diff from your phone, read the analysis before standup.
- Background research — "find me 3 papers on X" sent from the train, processed while you commute.
- Journaling prompts — text a thought, your agent reflects on it overnight, you read the reflection in the morning.
What's in the box
- relay-server.js — ~120 lines. Holds messages in a queue. Runs on anything with Node.
- ollama-relay.js — ~150 lines. Polls the queue, calls ollama, posts results back. Rolling conversation history.
- Systemd units — copy-paste ready. Relay always on, CLI wakes with your GPU.
- Android shortcut recipe — one tap from your home screen to queue a task.
- Cron examples — schedule your GPU to wake, process, and sleep automatically.
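The CLI side is the same kind of small. Here's a hedged sketch of the loop — not the shipped ollama-relay.js; the relay routes, key header, and history-trimming details are assumptions for illustration. Poll the queue, hand the task to ollama's /api/chat, post the reply back:

```javascript
// Sketch of the GPU-side loop: poll the relay, run each task through
// a local ollama model, return the reply. Runs while the GPU is awake.
const RELAY = process.env.RELAY_URL || "http://relay.local:8080";
const KEY = process.env.RELAY_KEY || "change-me";
const MODEL = process.env.MODEL || "llama3";
const MAX_TURNS = 10;

// Rolling conversation history: keep only the most recent turns so the
// prompt stays inside the model's context window.
function trimHistory(history, maxTurns = MAX_TURNS) {
  return history.slice(-maxTurns);
}

let history = [];

async function processOne() {
  const res = await fetch(`${RELAY}/next`, {
    headers: { "x-relay-key": KEY },
  });
  const task = await res.json();
  if (!task) return false; // queue empty — caller sleeps

  history = trimHistory([...history, { role: "user", content: task.prompt }]);
  const ollama = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    body: JSON.stringify({ model: MODEL, messages: history, stream: false }),
  });
  const { message } = await ollama.json();
  history = trimHistory([...history, message]);

  await fetch(`${RELAY}/result`, {
    method: "POST",
    headers: { "x-relay-key": KEY },
    body: JSON.stringify({ id: task.id, reply: message.content }),
  });
  return true;
}

async function main() {
  for (;;) {
    if (!(await processOne())) {
      await new Promise((r) => setTimeout(r, 30_000)); // idle poll interval
    }
  }
}

if (process.env.RELAY_POLL) main();
```

Pair it with a cron entry that wakes the box, and the loop drains whatever piled up overnight.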
Who it's for
- People who turn off their GPU at night but still want to send it work.
- Self-hosters who refuse to route through anyone else's server.
- Anyone building async local AI workflows — overnight batch jobs, scheduled digests, background agents.
- Tinkerers who want 400 lines they can read in 10 minutes, not a framework.
Get notified when it ships
I'll send one email when the v0.1.0 release is out. No newsletter. No marketing. One email, then nothing.
◇ $5 — full source, CLI + server + systemd units + README
◇ source-available — read every line, can't redistribute
Honest limitations
- Home-LAN first. If you expose the relay to the internet, put it behind a reverse proxy with TLS.
- In-memory queue by default. Relay restart = pending messages gone. SQLite persistence is a 20-line patch.
- Auth is a single shared key. Good enough for your LAN.
- Not a chat UI. It's async plumbing — send a task, come back later for the result.
- Your GPU still needs to turn on eventually. This queues the work, it doesn't do the work.