
Ollama Setup Guide

Ollama is the easiest way to run AI models locally. It's a single binary — no Docker, no Python, no GPU drivers needed (though it uses your GPU automatically if available).

Step 1: Install Ollama

macOS

brew install ollama

Or download from ollama.com/download.

Linux

curl -fsSL https://ollama.com/install.sh | sh

Ollama installs as a systemd service and starts automatically.

Windows

winget install Ollama.Ollama

Or download the installer from ollama.com/download.
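
Whichever installer you use, you can confirm the CLI is on your PATH before moving on:

ollama --version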

Step 2: Pull a Model

# Recommended: best quality-to-size ratio
ollama pull qwen3.5

# Or a lighter model for machines with less RAM
ollama pull qwen3:4b

# Or the smallest model for testing
ollama pull smollm2:1.7b
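
To sanity-check a model before connecting Vibe, you can chat with it straight from the terminal (using the qwen3.5 tag pulled above):

# Interactive chat session (type /bye to exit)
ollama run qwen3.5

# One-off prompt
ollama run qwen3.5 "Explain what a context window is in one sentence."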

Available Models

Browse all models at ollama.com/library. Popular choices:

| Model | Download Size | RAM Needed | Description |
|---|---|---|---|
| qwen3.5 | 6.6 GB | 8–10 GB | Default recommendation — strong reasoning and tool use |
| qwen3:14b | 9.3 GB | 12–16 GB | Best quality for machines with 16 GB+ RAM |
| llama3.1:8b | 4.9 GB | 8 GB | Meta's general-purpose model |
| deepseek-r1:8b | 5.0 GB | 8 GB | Strong reasoning and math (Q4 quantized) |
| gemma3:4b | 3.3 GB | 6 GB | Google's efficient model |
| phi4-mini:3.8b | 2.5 GB | 4–6 GB | Microsoft's compact model |
| smollm2:1.7b | 1.0 GB | 2–4 GB | Smallest, runs on anything |

RAM vs Download Size: The download size is the model file on disk. In memory, Ollama needs roughly 1.5–2× the download size due to KV cache and runtime overhead. For example, llama3.1:8b (4.9 GB on disk) needs ~8 GB of RAM/VRAM when running.

Step 3: Connect Vibe Browser

  1. Open Vibe Browser and click the gear icon ⚙️ to open Settings
  2. Click the provider dropdown (shows "Vibe GenAI Gateway" by default)
  3. Select Ollama (Self-Hosted)
  4. Vibe automatically detects Ollama on localhost:11434
  5. The model selector lists both the models already installed and models that can be installed from the Ollama library
  6. The default model for Ollama is qwen3.5
  7. If you pick a model that isn't installed yet, Vibe installs it automatically via POST /api/pull (see the sketch below)
  8. Click Save Settings

That's it! Start chatting with your local model.
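
Under the hood, these steps map to plain HTTP calls against the local Ollama daemon. A rough equivalent of the detection and auto-install requests looks like the sketch below (the endpoints are Ollama's standard API; the exact payloads Vibe sends are an assumption):

# Detection: list the models installed locally
curl http://localhost:11434/api/tags

# Auto-install: pull a model that isn't present yet (streams progress as JSON)
curl http://localhost:11434/api/pull -d '{"model":"qwen3.5"}'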

No API Key Needed

Ollama runs locally — no API key, no account, no payment required.

Managing Models

List installed models

ollama list

Pull a new model

ollama pull mistral:7b

Remove a model

ollama rm mistral:7b

Check what's currently loaded in memory

ollama ps
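
Show a model's details

If you want to check a model's parameter count, context length, and quantization from the CLI, ollama show prints them (shown here for the qwen3.5 tag used throughout this guide):

ollama show qwen3.5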

Ollama API Reference

Ollama serves its HTTP API at http://localhost:11434, including an OpenAI-compatible chat endpoint under /v1:

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat (OpenAI-compatible) |
| /api/tags | GET | List installed models |
| /api/ps | GET | List running models |
| /api/pull | POST | Download a model |
| /api/show | POST | Model details |
| /api/delete | DELETE | Remove a model |
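
As a quick smoke test of the OpenAI-compatible endpoint, you can send a chat completion with curl (the model name assumes the qwen3.5 tag pulled earlier; use whatever ollama list shows on your machine):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'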

Model Lifecycle

Ollama automatically loads models into memory on first use and unloads them after 5 minutes of inactivity. You can control this with the keep_alive parameter:

# Keep model loaded for 30 minutes
curl http://localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"","keep_alive":"30m"}'

# Unload model immediately
curl http://localhost:11434/api/generate -d '{"model":"qwen3.5","prompt":"","keep_alive":0}'
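
The keep_alive parameter applies per request. To change the default for every model, Ollama also reads an OLLAMA_KEEP_ALIVE environment variable when the server starts (shown here for a manually started server; a systemd install needs the variable set on the service instead):

# Keep models loaded for 30 minutes by default
OLLAMA_KEEP_ALIVE=30m ollama serve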

Troubleshooting

Ollama not detected by Vibe

  1. Is Ollama running? Check with:

    curl http://localhost:11434
    # Should return: "Ollama is running"
  2. Start Ollama manually:

    ollama serve
  3. CORS issues? Set the origins environment variable:

    OLLAMA_ORIGINS="*" ollama serve
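
If Ollama was installed as the Linux systemd service, an environment variable set on an ad-hoc ollama serve (step 3 above) doesn't carry over to the background service. One way to persist it, assuming the default service name of ollama, is a systemd override:

sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_ORIGINS=*"
sudo systemctl daemon-reload
sudo systemctl restart ollama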

Model runs slowly

  • Check RAM usage: Models need 1.5–2× their download size in RAM (e.g., a 5 GB model file needs ~8 GB of RAM at runtime due to KV cache and overhead)
  • Use a smaller model: Try qwen3:4b or smollm2:1.7b
  • Close other apps: Free up RAM for the model
  • Use quantized models: Lower-bit quantizations (e.g., Q4) use much less RAM than F16 weights (see the example below for pulling a specific quantization tag)
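
Most models on ollama.com/library publish several quantization tags alongside the default. Tag names vary by model, so treat the one below as illustrative and check the model's Tags page before pulling:

# Example only; verify the exact tag on ollama.com/library/llama3.1
ollama pull llama3.1:8b-instruct-q4_K_M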

Model gives poor results

  • Use a larger model: qwen3.5 is significantly better than smollm2:1.7b
  • Increase context length: Some tasks need a larger context window than the default (see the request example after this list)
  • Try a different model family: Different models excel at different tasks
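
Context length is a per-request option (num_ctx) in Ollama's native API. Whether Vibe exposes this setting isn't covered here, but you can test a larger window by calling the API directly (a sketch; 8192 is an arbitrary example value):

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3.5",
  "messages": [{"role": "user", "content": "hello"}],
  "options": {"num_ctx": 8192}
}'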

Auto-Start Ollama

macOS

brew services start ollama

Linux

Ollama installs as a systemd service and starts automatically:

sudo systemctl enable ollama
sudo systemctl start ollama

Windows

Ollama starts automatically after installation. You can find it in the system tray.
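
To confirm Ollama came up, the same health check from the Troubleshooting section works on any platform; on Linux you can also ask systemd directly:

# Linux: check the systemd service
systemctl status ollama

# Any platform: the server should answer on its default port
curl http://localhost:11434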