Running AI on My Own Server?
ChatGPT, Gemini, Claude… everyone uses cloud AI. But have you ever thought:
“If I run AI on my own computer, it’s free AND my data stays private?”
That’s exactly right. Running a local LLM (Large Language Model) means no subscription fees and zero data leaving your machine. Perfect privacy.
But reality is… a bit different. I installed AI on my SER9 MAX mini PC from Episode 1, and the honest verdict? “It works. But it’s slow.”

Ollama — The Local LLM Engine
Ollama is a tool that lets you run AI models on your own hardware. Sounds complicated? I had AI install it for me. A few terminal commands and done.
Once installed, one command — `ollama run qwen3:14b` — and the AI starts responding. The model downloads automatically, no configuration needed.
There are dozens of open-source models available: Llama, Qwen, Gemma, Mistral, DeepSeek… all free. Pick whichever fits your needs.
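Besides the terminal, Ollama also exposes a local REST API on port 11434 (the same interface tools like Open WebUI use under the hood). Here's a minimal Python sketch against Ollama's documented /api/generate endpoint — the model name just mirrors the one above, swap in whatever you pulled:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt: str, model: str = "qwen3:14b") -> dict:
    # stream=False asks for one complete JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "qwen3:14b") -> str:
    # Sends the prompt to the local Ollama server; requires Ollama to be running
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# ask("Summarize this paragraph: ...")  # only works with Ollama running locally
```

This is what "connecting AI to other apps" looks like in practice: any script on your network can call that endpoint, no API key, no per-token billing.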
Open WebUI — ChatGPT Interface in Your Browser
Chatting in a terminal is honestly uncomfortable. So I installed Open WebUI — a program that gives you a ChatGPT-style chat interface in your browser, running entirely on your server.
Again, AI handled the installation. One Docker container and it’s running.
The best part? My wife uses it too. Anyone on the same network can open a browser on their phone or tablet and start chatting. You can create separate accounts, so conversation history stays private for each person. With Tailscale from Episode 2, it’s accessible from anywhere.

Specs vs. Reality — This Is What Matters
The most important question in local AI is “Can my hardware actually handle it?” Here are my real-world numbers.
My Server Specs
| Component | Specification |
|---|---|
| CPU | AMD Ryzen 7 255 (8 cores, 16 threads) |
| RAM | DDR5 32GB |
| GPU | Integrated (AMD Radeon 780M) — effectively none |
| Storage | NVMe SSD 1TB |
| OS | Windows 11 + WSL2 (Linux) |
Real Benchmarks (Qwen3 14B Model)
| Metric | Value |
|---|---|
| Generation Speed | 5.5 tokens/sec |
| Simple Question Response | ~25 seconds |
| RAM Usage | ~10GB |
| Quantization | Q4_K_M (9.3GB file) |
What ChatGPT answers in 1 second takes my server 25 seconds. That’s roughly 5-10x slower in real usage. Watching characters appear one by one is… a patience test.
Why So Slow?
No dedicated GPU. AI inference is optimized for GPU computing, but my mini PC only has integrated graphics. I’ve confirmed that the AMD 780M iGPU can’t be used for AI acceleration under WSL2. Everything runs on CPU only — hence the speed.
With an NVIDIA GPU? The same model runs 5-10x faster. An RTX 4060 can push 30+ tokens/second. But you can’t put a discrete GPU in a mini PC — that’s desktop or gaming laptop territory.
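The arithmetic behind that 25-second wait is simple: answer length divided by generation speed. A quick sanity check — the ~140-token answer length is my assumption for a "simple question," and 30 tok/s is the RTX 4060 figure above:

```python
def generation_time(tokens: int, tokens_per_sec: float) -> float:
    # Time to generate a response, ignoring prompt-processing overhead
    return tokens / tokens_per_sec

answer_tokens = 140  # assumed length of a short answer
print(round(generation_time(answer_tokens, 5.5), 1))   # CPU-only at 5.5 tok/s: ~25.5 s
print(round(generation_time(answer_tokens, 30.0), 1))  # a GPU at 30 tok/s: ~4.7 s
```

Same model, same answer — the only variable is tokens per second, which is why the GPU question dominates everything else.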
RAM Determines Model Size
The most important spec for local AI is RAM. The entire model loads into memory.
| RAM | Model Size | Quality |
|---|---|---|
| 8GB | 7B (7 billion parameters) | Basic chat OK, struggles with complexity |
| 16GB | 14B (14 billion parameters) | Decent conversation, handles general tasks |
| 32GB | 14B + headroom / can try 30B | Comfortable 14B + other services running |
| 64GB+ | 70B (70 billion parameters) | Approaching ChatGPT quality |
7B vs 14B vs 70B — bigger means better. 7B handles simple chat but frequently hallucinates on complex questions. 14B is the minimum threshold where it feels “actually usable.” 70B jumps in quality but needs 40GB+ RAM.
That’s why I have 32GB. Running a 14B model while also keeping other Docker services (Immich, WordPress, n8n, etc.) alive requires the headroom.
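The RAM figures in the table fall straight out of parameter count times bits per weight. A rough estimator — the ~4.85 bits/weight average for Q4_K_M quantization and the 14.8B parameter count are approximations, not exact values:

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    # Approximate model footprint: parameters × bits per weight, converted to GB
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Q4_K_M averages roughly 4.85 bits per weight (approximation)
print(round(model_size_gb(14.8, 4.85), 1))  # ~9.0 GB, close to the 9.3 GB file above
print(round(model_size_gb(70.0, 4.85), 1))  # ~42 GB, which is why 70B needs 40GB+ RAM
```

Add a few GB for the OS, the inference runtime, and the KV cache, and you can see why 14B on a 16GB machine is tight while 32GB leaves room for everything else.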

So Is It Worth It?
Here’s my honest summary:
Worth it for:
- Simple conversations, translation, summarization — slow but delivers results
- Privacy-sensitive content — analyzing confidential work documents
- Offline use — on a plane, in areas with no internet
- Connecting AI to other apps — unlimited API calls, zero cost
Not worth it for:
- Coding, complex analysis — cloud AI is overwhelmingly better
- When you need fast responses — if you can’t wait 25 seconds
- When you need current information — local models know nothing past their training cutoff
The core value of local AI is “free” and “privacy.” If you’re expecting performance, you’ll be disappointed. But if those two things matter to you, it’s absolutely worthwhile.
Next Episode Preview
So far we’ve covered building the server, remote access, photo backup, and local AI. Next up is the piece that ties everything together — an AI agent and Telegram bot. Send a message on Telegram, and AI handles the rest. Building your own digital assistant.
EP.5 — AI Agent + Telegram: Putting a Secretary on Your Server. Stay tuned.


