<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LLM Archives - Prsm Studio</title>
	<atom:link href="https://prsm-studio.com/en/tag/llm-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://prsm-studio.com/en/tag/llm-en/</link>
	<description>automation · homeserver · side projects · game · gadgets · play</description>
	<lastBuildDate>Mon, 09 Mar 2026 07:57:06 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	

<image>
	<url>https://prsm-studio.com/wp-content/uploads/2026/03/ic_launcher-playstore-150x150.png</url>
	<title>LLM Archives - Prsm Studio</title>
	<link>https://prsm-studio.com/en/tag/llm-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Even a Code-Illiterate Built It! Home Server Journey (4) — Running AI Locally with Ollama</title>
		<link>https://prsm-studio.com/en/code-illiterate-home-server-build-4-ollama-local-ai-en/</link>
					<comments>https://prsm-studio.com/en/code-illiterate-home-server-build-4-ollama-local-ai-en/#respond</comments>
		
		<dc:creator><![CDATA[Toaster]]></dc:creator>
		<pubDate>Mon, 09 Mar 2026 03:15:56 +0000</pubDate>
				<category><![CDATA[Computer Play]]></category>
		<category><![CDATA[Home Server]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[local AI]]></category>
		<category><![CDATA[Mini PC]]></category>
		<category><![CDATA[Ollama]]></category>
		<category><![CDATA[Open WebUI]]></category>
		<category><![CDATA[self-hosted AI]]></category>
		<category><![CDATA[SER9 MAX]]></category>
		<guid isPermaLink="false">https://prsm-studio.com/code-illiterate-home-server-build-4-ollama-local-ai-en/</guid>

					<description><![CDATA[<p>I installed Ollama and Open WebUI on my home server to run free local AI. Real benchmarks from my SER9 MAX mini PC, RAM-based model guide, and honest conclusions.</p>
<p>The post <a href="https://prsm-studio.com/en/code-illiterate-home-server-build-4-ollama-local-ai-en/">Even a Code-Illiterate Built It! Home Server Journey (4) — Running AI Locally with Ollama</a> appeared first on <a href="https://prsm-studio.com/en">Prsm Studio</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Running AI on My Own Server?</h2>
<p>ChatGPT, Gemini, Claude… everyone uses cloud AI. But have you ever thought:</p>
<p><strong>&#8220;If I run AI on my own computer, it&#8217;s free AND my data stays private?&#8221;</strong></p>
<p>That&#8217;s exactly right. Running a local LLM (Large Language Model) means no subscription fees and zero data leaving your machine. Perfect privacy.</p>
<p>But reality is… a bit different. I installed AI on <a href="/en/code-illiterate-home-server-build-1-ser9max-windows11-wsl2-docker-en/">my SER9 MAX mini PC from Episode 1</a>, and the honest verdict? <strong>&#8220;It works. But it&#8217;s slow.&#8221;</strong></p>
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="940" height="627" src="https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530416-1.jpg" alt="DeepSeek AI 인터페이스를 보여주는 MacBook으로 디지털 혁신을 선보입니다." class="wp-image-225" srcset="https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530416-1.jpg 940w, https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530416-1-300x200.jpg 300w, https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530416-1-768x512.jpg 768w" sizes="(max-width: 940px) 100vw, 940px" /><figcaption>Photo by Matheus Bertelli / Pexels</figcaption></figure>
<h2>Ollama — The Local LLM Engine</h2>
<p>Ollama is a tool that lets you run AI models on your own hardware. Sounds complicated? I had AI install it for me. A few terminal commands and done.</p>
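<p>For the record, this is roughly what that looked like. The one-liner below is the install method documented on ollama.com for Linux, and it works inside WSL2 as well; treat it as a sketch and check the official site for current instructions:</p>
<pre><code># Install Ollama (official install script, runs under WSL2 too)
curl -fsSL https://ollama.com/install.sh | sh

# Confirm it installed
ollama --version</code></pre>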
<p>Once installed, one command — <code>ollama run qwen3:14b</code> — and the AI starts responding. The model downloads automatically, no configuration needed.</p>
<p>There are dozens of open-source models available: Llama, Qwen, Gemma, Mistral, DeepSeek… all free. Pick whichever fits your needs.</p>
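<p>A few everyday commands, in case you want to poke around (the model names here are just examples; browse the Ollama library for the full catalog):</p>
<pre><code>ollama pull gemma2      # download a model without starting a chat
ollama list             # list downloaded models and their file sizes
ollama run qwen3:14b    # interactive chat; downloads the model on first run
ollama ps               # show loaded models and their memory usage</code></pre>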
<h2>Open WebUI — ChatGPT Interface in Your Browser</h2>
<p>Chatting in a terminal is honestly uncomfortable. So I installed <strong>Open WebUI</strong> — a program that gives you a ChatGPT-style interface, running entirely on your server.</p>
<p>Again, AI handled the installation. One Docker container and it&#8217;s running.</p>
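<p>For anyone replicating this: the command below is close to the single-container setup in the Open WebUI documentation. The port mapping and volume name are my choices, so adjust to taste:</p>
<pre><code>docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main</code></pre>
<p>Once it starts, open <code>http://localhost:3000</code> in a browser and create the first account, which becomes the admin.</p>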
<p>The best part? <strong>My wife uses it too.</strong> Anyone on the same network can open a browser on their phone or tablet and start chatting. You can create separate accounts, so conversation history stays private for each person. With <a href="/en/code-illiterate-home-server-build-2-tailscale-remote-access-en/">Tailscale from Episode 2</a>, it&#8217;s accessible from anywhere.</p>
<figure class="wp-block-image size-large"><img decoding="async" width="433" height="650" src="https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530413-1.jpg" alt="DeepSeek 애플리케이션이 있는 대화형 AI 인터페이스를 보여주는 노트북 이미지." class="wp-image-226" srcset="https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530413-1.jpg 433w, https://prsm-studio.com/wp-content/uploads/2026/03/stock-30530413-1-200x300.jpg 200w" sizes="(max-width: 433px) 100vw, 433px" /><figcaption>Photo by Matheus Bertelli / Pexels</figcaption></figure>
<h2>Specs vs. Reality — This Is What Matters</h2>
<p>The most important question in local AI is <strong>&#8220;Can my hardware actually handle it?&#8221;</strong> Here are my real-world numbers.</p>
<h3>My Server Specs</h3>
<table>
<tr>
<th>Component</th>
<th>Specification</th>
</tr>
<tr>
<td>CPU</td>
<td>AMD Ryzen 7 255 (8 cores, 16 threads)</td>
</tr>
<tr>
<td>RAM</td>
<td>DDR5 32GB</td>
</tr>
<tr>
<td>GPU</td>
<td>Integrated (AMD Radeon 780M) — <strong>effectively none</strong></td>
</tr>
<tr>
<td>Storage</td>
<td>NVMe SSD 1TB</td>
</tr>
<tr>
<td>OS</td>
<td>Windows 11 + WSL2 (Linux)</td>
</tr>
</table>
<h3>Real Benchmarks (Qwen3 14B Model)</h3>
<table>
<tr>
<th>Metric</th>
<th>Value</th>
</tr>
<tr>
<td>Generation Speed</td>
<td><strong>5.5 tokens/sec</strong></td>
</tr>
<tr>
<td>Simple Question Response</td>
<td>~25 seconds</td>
</tr>
<tr>
<td>RAM Usage</td>
<td>~10GB</td>
</tr>
<tr>
<td>Quantization</td>
<td>Q4_K_M (9.3GB file)</td>
</tr>
</table>
<p>What ChatGPT answers in 1 second takes <strong>my server 25 seconds.</strong> That&#8217;s easily 10x slower in real usage. Watching characters appear one by one is… a patience test.</p>
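<p>If you want to measure your own machine, Ollama can print timing statistics after each response; the eval rate line is where my 5.5 tokens/sec figure comes from:</p>
<pre><code># Ask anything, then read the stats printed after the reply
# ("eval rate" is the generation speed in tokens/sec)
ollama run qwen3:14b --verbose</code></pre>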
<h3>Why So Slow?</h3>
<p><strong>No dedicated GPU.</strong> AI inference is optimized for GPU computing, but my mini PC only has integrated graphics. I&#8217;ve confirmed that the AMD 780M iGPU can&#8217;t be used for AI acceleration under WSL2. Everything runs on <strong>CPU only</strong> — hence the speed.</p>
<p>With an NVIDIA GPU? The same model runs <strong>5-10x faster.</strong> An RTX 4060 can push 30+ tokens/second. But you can&#8217;t put a discrete GPU in a mini PC — that&#8217;s desktop or gaming laptop territory.</p>
<h3>RAM Determines Model Size</h3>
<p>The most important spec for local AI is <strong>RAM</strong>. The entire model loads into memory.</p>
<table>
<tr>
<th>RAM</th>
<th>Model Size</th>
<th>Quality</th>
</tr>
<tr>
<td>8GB</td>
<td>7B (7 billion parameters)</td>
<td>Basic chat OK, struggles with complexity</td>
</tr>
<tr>
<td>16GB</td>
<td>14B (14 billion parameters)</td>
<td>Decent conversation, handles general tasks</td>
</tr>
<tr>
<td>32GB</td>
<td>14B + headroom / can try 30B</td>
<td>Comfortable 14B + other services running</td>
</tr>
<tr>
<td>64GB+</td>
<td>70B (70 billion parameters)</td>
<td>Approaching ChatGPT quality</td>
</tr>
</table>
<p><strong>7B vs 14B vs 70B — bigger means better.</strong> 7B handles simple chat but frequently hallucinates on complex questions. 14B is the minimum threshold where it feels &#8220;actually usable.&#8221; 70B jumps in quality but needs 40GB+ RAM.</p>
<p>That&#8217;s why I have 32GB. Running a 14B model while also keeping other Docker services (<a href="/en/code-illiterate-home-server-build-3-immich-photo-backup-en/">Immich</a>, WordPress, n8n, etc.) alive requires the headroom.</p>
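<p>If you want to sanity-check the table above, here&#8217;s the rough rule of thumb I use (the ~4.8 bits per weight for Q4_K_M is an approximation, not an exact figure):</p>
<pre><code>model file size ≈ parameters × bits per weight ÷ 8
14B × ~4.8 bits (Q4_K_M) ÷ 8 ≈ 8.4 GB, plus embeddings and overhead → my 9.3 GB file
RAM while running ≈ file size + context cache → the ~10 GB I measured</code></pre>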
<figure class="wp-block-image size-large"><img decoding="async" width="867" height="650" src="https://prsm-studio.com/wp-content/uploads/2026/03/stock-31993524-1.jpg" alt="선명한 노란색 표면의 T-Force Delta RGB DDR5 메모리 모듈." class="wp-image-227" srcset="https://prsm-studio.com/wp-content/uploads/2026/03/stock-31993524-1.jpg 867w, https://prsm-studio.com/wp-content/uploads/2026/03/stock-31993524-1-300x225.jpg 300w, https://prsm-studio.com/wp-content/uploads/2026/03/stock-31993524-1-768x576.jpg 768w" sizes="(max-width: 867px) 100vw, 867px" /><figcaption>Photo by Andrey Matveev / Pexels</figcaption></figure>
<h2>So Is It Worth It?</h2>
<p>Here&#8217;s my honest summary:</p>
<p><strong>Worth it for:</strong></p>
<ul>
<li>Simple conversations, translation, summarization — slow but delivers results</li>
<li>Privacy-sensitive content — analyzing confidential work documents</li>
<li>Offline use — on a plane, in areas with no internet</li>
<li>Connecting AI to other apps — unlimited API calls, zero cost (see the sketch after this list)</li>
</ul>
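<p>That last point deserves a concrete example. Ollama exposes a local REST API on port 11434, so anything that can make an HTTP request can use your models for free; the prompt below is just an illustration:</p>
<pre><code># Generate a completion via Ollama&#8217;s local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:14b",
  "prompt": "Summarize the benefits of local AI in one sentence.",
  "stream": false
}'</code></pre>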
<p><strong>Not worth it for:</strong></p>
<ul>
<li>Coding, complex analysis — cloud AI is overwhelmingly better</li>
<li>When you need fast responses — if you can&#8217;t wait 25 seconds</li>
<li>When you need current information — local models don&#8217;t know anything after their training date</li>
</ul>
<p>The core value of local AI is <strong>&#8220;free&#8221;</strong> and <strong>&#8220;privacy.&#8221;</strong> If you&#8217;re expecting performance, you&#8217;ll be disappointed. But if those two things matter to you, it&#8217;s absolutely worthwhile.</p>
<h2>Next Episode Preview</h2>
<p>So far we&#8217;ve covered building the server, remote access, photo backup, and local AI. Next up is the piece that ties everything together — <strong>an AI agent and Telegram bot.</strong> Send a message on Telegram, and AI handles the rest. Building your own digital assistant.</p>
<p><strong>EP.5 — AI Agent + Telegram: Putting a Secretary on Your Server.</strong> Stay tuned.</p>
<p><a class="a2a_button_facebook" href="https://www.addtoany.com/add_to/facebook?linkurl=https%3A%2F%2Fprsm-studio.com%2Fen%2Fcode-illiterate-home-server-build-4-ollama-local-ai-en%2F&amp;linkname=Even%20a%20Code-Illiterate%20Built%20It%21%20Home%20Server%20Journey%20%284%29%20%E2%80%94%20Running%20AI%20Locally%20with%20Ollama" title="Facebook" rel="nofollow noopener" target="_blank"></a><a class="a2a_button_mastodon" href="https://www.addtoany.com/add_to/mastodon?linkurl=https%3A%2F%2Fprsm-studio.com%2Fen%2Fcode-illiterate-home-server-build-4-ollama-local-ai-en%2F&amp;linkname=Even%20a%20Code-Illiterate%20Built%20It%21%20Home%20Server%20Journey%20%284%29%20%E2%80%94%20Running%20AI%20Locally%20with%20Ollama" title="Mastodon" rel="nofollow noopener" target="_blank"></a><a class="a2a_button_email" href="https://www.addtoany.com/add_to/email?linkurl=https%3A%2F%2Fprsm-studio.com%2Fen%2Fcode-illiterate-home-server-build-4-ollama-local-ai-en%2F&amp;linkname=Even%20a%20Code-Illiterate%20Built%20It%21%20Home%20Server%20Journey%20%284%29%20%E2%80%94%20Running%20AI%20Locally%20with%20Ollama" title="Email" rel="nofollow noopener" target="_blank"></a><a class="a2a_dd addtoany_share_save addtoany_share" href="https://www.addtoany.com/share#url=https%3A%2F%2Fprsm-studio.com%2Fen%2Fcode-illiterate-home-server-build-4-ollama-local-ai-en%2F&#038;title=Even%20a%20Code-Illiterate%20Built%20It%21%20Home%20Server%20Journey%20%284%29%20%E2%80%94%20Running%20AI%20Locally%20with%20Ollama" data-a2a-url="https://prsm-studio.com/en/code-illiterate-home-server-build-4-ollama-local-ai-en/" data-a2a-title="Even a Code-Illiterate Built It! Home Server Journey (4) — Running AI Locally with Ollama"></a></p><p>The post <a href="https://prsm-studio.com/en/code-illiterate-home-server-build-4-ollama-local-ai-en/">Even a Code-Illiterate Built It! Home Server Journey (4) — Running AI Locally with Ollama</a> appeared first on <a href="https://prsm-studio.com/en">Prsm Studio</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://prsm-studio.com/en/code-illiterate-home-server-build-4-ollama-local-ai-en/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
