# Ollama Setup Guide
Ollama is the easiest way to run AI models locally. It's a single binary — no Docker, no Python, no GPU drivers needed (though it uses your GPU automatically if available).
## Step 1: Install Ollama
**macOS**

```bash
brew install ollama
```

Or download from ollama.com/download.

**Linux**

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Ollama installs as a systemd service and starts automatically.

**Windows**

```bash
winget install Ollama.Ollama
```

Or download the installer from ollama.com/download.
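On any platform, a quick version check confirms the install worked and the CLI is on your `PATH`:

```bash
ollama --version
```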
## Step 2: Pull a Model
```bash
# Recommended: best quality-to-size ratio
ollama pull qwen3.5

# Or a lighter model for machines with less RAM
ollama pull qwen3:4b

# Or the smallest model for testing
ollama pull smollm2:1.7b
```
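After pulling, it can be worth a quick smoke test from the terminal before connecting Vibe; `ollama run` loads the model and answers a one-off prompt (the model name here is just an example):

```bash
ollama run qwen3:4b "Reply with one short sentence."
```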
### Available Models
Browse all models at ollama.com/library. Popular choices:
| Model | Download Size | RAM Needed | Description |
|---|---|---|---|
| `qwen3.5` | 6.6 GB | 8–10 GB | Default recommendation — strong reasoning and tool use |
| `qwen3:14b` | 9.3 GB | 12–16 GB | Best quality for machines with 16 GB+ RAM |
| `llama3.1:8b` | 4.9 GB | 8 GB | Meta's general-purpose model |
| `deepseek-r1:8b` | 5.0 GB | 8 GB | Strong reasoning and math (Q4 quantized) |
| `gemma3:4b` | 3.3 GB | 6 GB | Google's efficient model |
| `phi4-mini:3.8b` | 2.5 GB | 4–6 GB | Microsoft's compact model |
| `smollm2:1.7b` | 1.0 GB | 2–4 GB | Smallest, runs on anything |
**RAM vs. Download Size:** The download size is the model file on disk. In memory, Ollama needs roughly 1.5–2× the download size due to KV cache and runtime overhead. For example, `llama3.1:8b` (4.9 GB on disk) needs ~8 GB of RAM/VRAM when running.
## Step 3: Connect Vibe Browser
- Open Vibe Browser and click the gear icon ⚙️ to open **Settings**
- Click the provider dropdown (shows "Vibe GenAI Gateway" by default)
- Select **Ollama (Self-Hosted)**
- Vibe automatically detects Ollama on `localhost:11434`
- The model selector shows both installed models and installable models from the Ollama library
- The default Ollama preference is `qwen3.5`
- If you pick an uninstalled model, Vibe auto-installs it via `POST /api/pull` (see the example request after this list)
- Click **Save Settings**
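The auto-install step uses Ollama's standard pull endpoint, so you can trigger the same download yourself with curl if you prefer (the model name here is illustrative):

```bash
curl http://localhost:11434/api/pull -d '{"model": "qwen3:4b"}'
```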
That's it! Start chatting with your local model.
Ollama runs locally — no API key, no account, no payment required.
## Managing Models
```bash
# List installed models
ollama list

# Pull a new model
ollama pull mistral:7b

# Remove a model
ollama rm mistral:7b

# Check what's currently loaded in memory
ollama ps
```
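For more detail than `ollama list` provides, `ollama show` prints a model's metadata, such as its parameter count, context window, and license:

```bash
ollama show qwen3:4b
```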
## Ollama API Reference
Ollama serves an HTTP API at `http://localhost:11434`, including an OpenAI-compatible chat endpoint alongside its native endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat (OpenAI-compatible) |
| `/api/tags` | GET | List installed models |
| `/api/ps` | GET | List running models |
| `/api/pull` | POST | Download a model |
| `/api/show` | POST | Model details |
| `/api/delete` | DELETE | Remove a model |
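As a quick test of the OpenAI-compatible endpoint, you can send a minimal chat request with curl; substitute any installed model:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```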
## Model Lifecycle
Ollama automatically loads models into memory on first use and unloads them after 5 minutes of inactivity. You can control this with the `keep_alive` parameter:
```bash
# Keep model loaded for 30 minutes
curl http://localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"","keep_alive":"30m"}'

# Unload model immediately
curl http://localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"","keep_alive":0}'
```
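To change the default for every request instead of per call, Ollama also reads the `OLLAMA_KEEP_ALIVE` environment variable at server startup:

```bash
OLLAMA_KEEP_ALIVE=30m ollama serve
```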
## Troubleshooting
### Ollama not detected by Vibe
- **Is Ollama running?** Check with:

  ```bash
  curl http://localhost:11434
  # Should return: "Ollama is running"
  ```

- **Start Ollama manually:**

  ```bash
  ollama serve
  ```

- **CORS issues?** Set the origins environment variable (if Ollama runs under systemd, see the note after this list):

  ```bash
  OLLAMA_ORIGINS="*" ollama serve
  ```
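When Ollama runs as a systemd service (the Linux default), set the variable in a service override rather than on the command line:

```bash
sudo systemctl edit ollama.service
# In the editor that opens, add under [Service]:
#   Environment="OLLAMA_ORIGINS=*"
sudo systemctl restart ollama
```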
### Model runs slowly
- **Check RAM usage:** Models need 1.5–2× their download size in RAM (e.g., a 5 GB model file needs ~8 GB of RAM at runtime due to KV cache and overhead); see the `ollama ps` check after this list
- **Use a smaller model:** Try `qwen3:4b` or `smollm2:1.7b`
- **Close other apps:** Free up RAM for the model
- **Use quantized models:** Lower quantization (Q4) uses less RAM than F16
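One quick check for the RAM point above: `ollama ps` reports how a loaded model is split between GPU and CPU, and anything running partially on CPU will be noticeably slower (the sample output below is illustrative):

```bash
ollama ps
# NAME       SIZE     PROCESSOR    UNTIL
# qwen3.5    8.2 GB   100% GPU     4 minutes from now
```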
### Model gives poor results
- **Use a larger model:** `qwen3.5` is significantly better than `smollm2:1.7b`
- **Increase context length:** Some tasks need more context (see the `num_ctx` example after this list)
- **Try a different model family:** Different models excel at different tasks
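On the context-length point, one way to raise it per request is Ollama's `num_ctx` option on the native API; 8192 here is an arbitrary example, and larger contexts use more RAM:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.5",
  "prompt": "Summarize this long document...",
  "options": {"num_ctx": 8192}
}'
```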
## Auto-Start Ollama

**macOS**

```bash
brew services start ollama
```

**Linux**

Ollama installs as a systemd service and starts automatically:

```bash
sudo systemctl enable ollama
sudo systemctl start ollama
```

**Windows**

Ollama starts automatically after installation. You can find it in the system tray.