Skip to main content

Running a model locally with Ollama: fully private AI

Time: 10:50 AM to 11:40 AM
Everything you have done this week sent data to the cloud. In this section, you will run a language model entirely on your own laptop. Nothing leaves your machine.

Install Ollama

Install Ollama on your laptop (not the Pi, your laptop has more compute):
Visit ollama.com and download the installer, or run:
curl -fsSL https://ollama.com/install.sh | sh

Pull a model

Download a small model that fits in memory:
ollama pull llama3.2:3b
This is about 2 GB. Once downloaded, it runs entirely offline.
If your laptop has 8 GB of RAM or less, use a smaller quantized model instead:
ollama pull llama3.2:3b-instruct-q4_K_M
This runs on as little as 4 GB of RAM.

Chat locally

ollama run llama3.2:3b
Type questions and get answers. Nothing leaves your machine. No internet required after the initial download.

Connect it to the robot

Update the chatbot from Day 3 to point at your local Ollama instance instead of the OpenAI API. This is a two-line code change: swap the API endpoint to localhost and update the model name.

When to use local vs cloud

Use caseRecommendation
Personal projectsLocal is fine
Medical or legal dataLocal required
Battlefield or air-gapped systemsLocal, audited, air-gapped
Cutting-edge quality neededCloud (newest models)
Cost-sensitive high volumeLocal (no per-token cost)
The capability is nearly identical. What you lose with local models is cutting-edge quality. What you gain is privacy, zero cost per token, no rate limits, and no third-party dependency.