Agent Architecture

How Vibe's AI agent processes tasks and makes decisions.

Execution Modes

Vibe supports two execution patterns:

Plan-Execute-Reflect (Default)

Best for complex, multi-step tasks.

┌──────────┐     ┌──────────┐     ┌───────────┐
│   PLAN   │────>│ EXECUTE  │────>│  REFLECT  │
└──────────┘     └──────────┘     └───────────┘
      ▲                                 │
      │                                 │
      └──────────── feedback ───────────┘
  1. Plan: Analyze task, identify steps
  2. Execute: Run tools, interact with browser
  3. Reflect: Verify result, retry if needed
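The three steps above can be sketched as a single control loop. This is an illustrative sketch, not Vibe's actual API; the function names and the retry budget are assumptions:

```typescript
// Minimal Plan-Execute-Reflect loop. All names here are illustrative.
type Reflection = { verdict: "COMPLETE" | "RETRY"; feedback?: string };

async function runTask(
  task: string,
  plan: (task: string, feedback?: string) => Promise<string[]>,
  execute: (steps: string[]) => Promise<string>,
  reflect: (task: string, result: string) => Promise<Reflection>,
  maxAttempts = 3,
): Promise<string> {
  let feedback: string | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const steps = await plan(task, feedback);    // 1. Plan: analyze task, identify steps
    const result = await execute(steps);         // 2. Execute: run tools in the browser
    const check = await reflect(task, result);   // 3. Reflect: verify the result
    if (check.verdict === "COMPLETE") return result;
    feedback = check.feedback;                   // feed reflection output back into planning
  }
  throw new Error("Task not completed within retry budget");
}
```

The key design point is that reflection feedback flows back into the next planning pass, rather than blindly re-running the same steps.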

Plan-Execute (Simple Mode)

Lower overhead for straightforward tasks.

┌──────────┐     ┌──────────┐
│   PLAN   │────>│ EXECUTE  │────> Done
└──────────┘     └──────────┘

LangGraph State Machine

The agent uses LangGraph to manage state:

┌─────────────┐
│    START    │
└──────┬──────┘
       │
┌──────▼──────┐
│  ASSISTANT  │ (LLM decides action)
└──────┬──────┘
       │
   ┌───┴────────────────┐
   │                    │
┌──▼──────────┐  ┌──────▼──────┐
│    TOOLS    │  │    DONE     │
└──────┬──────┘  └─────────────┘
       │
┌──────▼──────┐
│ REFLECTION  │ (verify success)
└──────┬──────┘
       │
       └────────────> back to ASSISTANT
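The branching decision in the graph above comes down to a routing function over the agent state. The sketch below shows that routing logic in plain TypeScript; it is not the real LangGraph API, and the state shape is an assumption:

```typescript
// Illustrative routing function for the state machine above.
// In LangGraph this role is played by a conditional edge; the
// AgentState shape here is assumed for the sketch.
type Node = "ASSISTANT" | "TOOLS" | "REFLECTION" | "DONE";

interface AgentState {
  messages: string[];
  pendingToolCall?: string; // set when the LLM requested a tool
  verified: boolean;        // set once reflection confirms success
}

function route(state: AgentState): Node {
  if (state.pendingToolCall) return "TOOLS";  // LLM decided to act
  if (!state.verified) return "REFLECTION";   // no tool call: verify before finishing
  return "DONE";
}
```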

Tool Invocation

When the LLM calls a tool:

  1. Parse tool call from LLM response
  2. Validate arguments against schema
  3. Execute tool in browser context
  4. Return result to LLM for next decision
// Example tool call from LLM
{
  "tool_calls": [{
    "name": "click_by_index",
    "args": { "index": 5 }
  }]
}

// Tool execution
const result = await clickByIndex({ index: 5 });
// Returns: "Clicked element [5] 'Add to Cart' button"
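Step 2, validating arguments against the tool's schema, might look like the following sketch. The schema shape and the `validateArgs` helper are assumptions for illustration, not Vibe's actual validation code:

```typescript
// Hypothetical schema validation for a tool call (step 2 above).
interface ToolSchema {
  name: string;
  required: Record<string, "number" | "string">;
}

// Assumed schema for the click_by_index example shown earlier.
const clickByIndexSchema: ToolSchema = {
  name: "click_by_index",
  required: { index: "number" },
};

// Returns a list of validation errors; empty means the call is safe to execute.
function validateArgs(schema: ToolSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [key, type] of Object.entries(schema.required)) {
    if (!(key in args)) errors.push(`missing argument: ${key}`);
    else if (typeof args[key] !== type) errors.push(`${key} must be a ${type}`);
  }
  return errors;
}
```

Rejecting a malformed call before it reaches the browser lets the error be returned to the LLM as feedback instead of crashing the tool.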

Reflection System

After task completion, reflection validates the result:

// Reflection prompt
"Did the agent successfully complete the task?
Task: Find the price of iPhone 15
Last action: Extracted text '$999.00'

Respond with:
- COMPLETE if task is done
- RETRY with feedback if more work needed"

If reflection returns RETRY, the agent receives feedback and continues.
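Parsing the reflection verdict out of the LLM's free-text reply can be sketched as below. The parser and its fallback behavior are assumptions; only the COMPLETE/RETRY format comes from the prompt above:

```typescript
// Illustrative parser for the COMPLETE / RETRY reflection format.
type ReflectionResult =
  | { status: "COMPLETE" }
  | { status: "RETRY"; feedback: string };

function parseReflection(response: string): ReflectionResult {
  const trimmed = response.trim();
  if (trimmed.toUpperCase().startsWith("COMPLETE")) return { status: "COMPLETE" };
  if (trimmed.toUpperCase().startsWith("RETRY")) {
    // Everything after the RETRY keyword is treated as feedback.
    return { status: "RETRY", feedback: trimmed.slice("RETRY".length).trim() };
  }
  // Assumed fallback: an unrecognized reply becomes a retry with the raw text as feedback.
  return { status: "RETRY", feedback: trimmed };
}
```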

Subagent Architecture

For parallel tasks, the main agent spawns subagents:

┌─────────────────────────────────────────────────┐
│                   MAIN AGENT                    │
│                                                 │
│  Task: "Compare prices on Amazon and Best Buy"  │
│                        │                        │
│          ┌─────────────┴─────────────┐          │
│          │                           │          │
│    ┌─────▼─────┐               ┌─────▼─────┐    │
│    │ SUBAGENT 1│               │ SUBAGENT 2│    │
│    │ (Amazon)  │               │ (Best Buy)│    │
│    └─────┬─────┘               └─────┬─────┘    │
│          │                           │          │
│          └─────────────┬─────────────┘          │
│                        │                        │
│                ┌───────▼───────┐                │
│                │ MERGE RESULTS │                │
│                └───────────────┘                │
└─────────────────────────────────────────────────┘

Subagent Properties

Resource   Sharing
LLM        Shared (same API)
Tools      Shared reference
State      Isolated (own context)
Tabs       Can create own tabs

When to Use Subagents

  • Parallel price comparisons
  • Multi-site data extraction
  • Background operations
  • Context isolation
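The fan-out/merge pattern from the diagram can be sketched with `Promise.all`. The `runSubagent` callback and the price-comparison merge step are illustrative assumptions:

```typescript
// Illustrative fan-out to parallel subagents followed by a merge step.
interface SubagentResult {
  site: string;
  price: number;
}

async function comparePrices(
  // Each subagent runs with its own isolated state and tabs (see table above).
  runSubagent: (site: string, task: string) => Promise<SubagentResult>,
  sites: string[],
): Promise<SubagentResult> {
  // Fan out: subagents execute in parallel.
  const results = await Promise.all(
    sites.map((site) => runSubagent(site, "Find the price of iPhone 15")),
  );
  // Merge: here, simply pick the cheapest result.
  return results.reduce((best, r) => (r.price < best.price ? r : best));
}
```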

Token Management

The agent tracks token usage to prevent context overflow:

  • Max context: Model-dependent (128K for GPT-4)
  • Page content: Truncated if too large
  • Conversation history: Summarized when approaching limit
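Page-content truncation from the list above can be sketched as a simple character budget. The 4-characters-per-token heuristic is an assumption for illustration; real tokenizers vary by model:

```typescript
// Rough token budgeting sketch. The chars-per-token ratio is an
// assumed heuristic, not an exact tokenizer.
const CHARS_PER_TOKEN = 4;

function truncateToBudget(pageText: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (pageText.length <= maxChars) return pageText;
  // Keep the head of the page and mark the cut so the LLM knows content is missing.
  return pageText.slice(0, maxChars) + "\n[...content truncated...]";
}
```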

Error Handling

Tool Error → Retry (up to 3 times) → Escalate to Reflection
                                                │
                                       ┌────────▼────────┐
                                       │ Adjust approach │
                                       └─────────────────┘

Common error patterns:

  • Element not found → Wait and retry
  • Page not loaded → Increase wait time
  • API timeout → Reduce request complexity
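The retry-then-escalate flow can be sketched as below. The helper name, the backoff delays, and the escalation callback are illustrative assumptions:

```typescript
// Illustrative retry wrapper: try the tool up to maxRetries times,
// waiting a bit longer each attempt, then hand off to reflection.
async function executeWithRetry<T>(
  tool: () => Promise<T>,
  escalate: (lastError: Error) => Promise<T>, // e.g. reflection adjusting the approach
  maxRetries = 3,
): Promise<T> {
  let lastError: Error = new Error("not attempted");
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await tool();
    } catch (err) {
      lastError = err as Error; // e.g. "element not found"
      // Wait and retry, increasing the wait each time (assumed backoff).
      await new Promise((resolve) => setTimeout(resolve, attempt * 100));
    }
  }
  return escalate(lastError);
}
```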