# Ollama Setup Guide
Ollama is the easiest way to run AI models locally. It's a single binary — no Docker, no Python, no GPU drivers needed (though it uses your GPU automatically if available).
## Step 1: Install Ollama
**macOS**

```bash
brew install ollama
```

Or download from ollama.com/download.

**Linux**

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Ollama installs as a systemd service and starts automatically.

**Windows**

```bash
winget install Ollama.Ollama
```

Or download the installer from ollama.com/download.
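On any platform, a quick version check confirms the install worked and the CLI is on your `PATH`:

```bash
ollama --version
```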
## Step 2: Pull a Model
```bash
# Recommended: best quality-to-size ratio
ollama pull qwen3.5

# Or a lighter model for machines with less RAM
ollama pull qwen3:4b

# Or the smallest model for testing
ollama pull smollm2:1.7b
```
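After pulling, it can be worth a quick smoke test from the terminal before connecting Vibe; `ollama run` loads the model and answers a one-off prompt (the model name here is just an example):

```bash
ollama run qwen3:4b "Reply with one short sentence."
```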
### Available Models
Browse all models at ollama.com/library. Popular choices:
| Model | Download Size | RAM Needed | Description |
|---|---|---|---|
| `qwen3.5` | 6.6 GB | 8–10 GB | Default recommendation — strong reasoning and tool use |
| `qwen3:14b` | 9.3 GB | 12–16 GB | Best quality for machines with 16 GB+ RAM |
| `llama3.1:8b` | 4.9 GB | 8 GB | Meta's general-purpose model |
| `deepseek-r1:8b` | 5.0 GB | 8 GB | Strong reasoning and math (Q4 quantized) |
| `gemma3:4b` | 3.3 GB | 6 GB | Google's efficient model |
| `phi4-mini:3.8b` | 2.5 GB | 4–6 GB | Microsoft's compact model |
| `smollm2:1.7b` | 1.0 GB | 2–4 GB | Smallest, runs on anything |
**RAM vs. Download Size:** The download size is the model file on disk. In memory, Ollama needs roughly 1.5–2× the download size due to KV cache and runtime overhead. For example, `llama3.1:8b` (4.9 GB on disk) needs ~8 GB of RAM/VRAM when running.
## Step 3: Connect Vibe Browser
- Open Vibe Browser and click the gear icon ⚙️ to open **Settings**
- Click the provider dropdown (shows "Vibe GenAI Gateway" by default)
- Select **Ollama (Self-Hosted)**
- Vibe automatically detects Ollama on `localhost:11434`
- The model selector shows both installed models and installable models from the Ollama library
- The default Ollama preference is `qwen3.5`
- If you pick an uninstalled model, Vibe auto-installs it via `POST /api/pull` (see the example request after this list)
- Click **Save Settings**
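The auto-install step uses Ollama's standard pull endpoint, so you can trigger the same download yourself with curl if you prefer (the model name here is illustrative):

```bash
curl http://localhost:11434/api/pull -d '{"model": "qwen3:4b"}'
```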
That's it! Start chatting with your local model.
Ollama runs locally — no API key, no account, no payment required.
## Managing Models
```bash
# List installed models
ollama list

# Pull a new model
ollama pull mistral:7b

# Remove a model
ollama rm mistral:7b

# Check what's currently loaded in memory
ollama ps
```
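For more detail than `ollama list` provides, `ollama show` prints a model's metadata, such as its parameter count, context window, and license:

```bash
ollama show qwen3:4b
```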
## Ollama API Reference
Ollama serves an HTTP API at `http://localhost:11434`, including an OpenAI-compatible chat endpoint alongside its native endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat (OpenAI-compatible) |
| `/api/tags` | GET | List installed models |
| `/api/ps` | GET | List running models |
| `/api/pull` | POST | Download a model |
| `/api/show` | POST | Model details |
| `/api/delete` | DELETE | Remove a model |
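As a quick test of the OpenAI-compatible endpoint, you can send a minimal chat request with curl; substitute any installed model:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```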
## Model Lifecycle
Ollama automatically loads models into memory on first use and unloads them after 5 minutes of inactivity. You can control this with the `keep_alive` parameter:
```bash
# Keep model loaded for 30 minutes
curl http://localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"","keep_alive":"30m"}'

# Unload model immediately
curl http://localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"","keep_alive":0}'
```
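To change the default for every request instead of per call, Ollama also reads the `OLLAMA_KEEP_ALIVE` environment variable at server startup:

```bash
OLLAMA_KEEP_ALIVE=30m ollama serve
```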
## Troubleshooting
### Ollama not detected by Vibe
- **Is Ollama running?** Check with:

  ```bash
  curl http://localhost:11434
  # Should return: "Ollama is running"
  ```

- **Start Ollama manually:**

  ```bash
  ollama serve
  ```

- **CORS issues?** Set the origins environment variable (if Ollama runs under systemd, see the note after this list):

  ```bash
  OLLAMA_ORIGINS="*" ollama serve
  ```
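When Ollama runs as a systemd service (the Linux default), set the variable in a service override rather than on the command line:

```bash
sudo systemctl edit ollama.service
# In the editor that opens, add under [Service]:
#   Environment="OLLAMA_ORIGINS=*"
sudo systemctl restart ollama
```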
### Model runs slowly
- **Check RAM usage:** Models need 1.5–2× their download size in RAM (e.g., a 5 GB model file needs ~8 GB of RAM at runtime due to KV cache and overhead); see the `ollama ps` check after this list
- **Use a smaller model:** Try `qwen3:4b` or `smollm2:1.7b`
- **Close other apps:** Free up RAM for the model
- **Use quantized models:** Lower quantization (Q4) uses less RAM than F16
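One quick check for the RAM point above: `ollama ps` reports how a loaded model is split between GPU and CPU, and anything running partially on CPU will be noticeably slower (the sample output below is illustrative):

```bash
ollama ps
# NAME       SIZE     PROCESSOR    UNTIL
# qwen3.5    8.2 GB   100% GPU     4 minutes from now
```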
### Model gives poor results
- **Use a larger model:** `qwen3.5` is significantly better than `smollm2:1.7b`
- **Increase context length:** Some tasks need more context (see the `num_ctx` example after this list)
- **Try a different model family:** Different models excel at different tasks
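On the context-length point, one way to raise it per request is Ollama's `num_ctx` option on the native API; 8192 here is an arbitrary example, and larger contexts use more RAM:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3.5",
  "prompt": "Summarize this long document...",
  "options": {"num_ctx": 8192}
}'
```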
## Auto-Start Ollama

**macOS**

```bash
brew services start ollama
```

**Linux**

Ollama installs as a systemd service and starts automatically:

```bash
sudo systemctl enable ollama
sudo systemctl start ollama
```

**Windows**

Ollama starts automatically after installation. You can find it in the system tray.