Agent Architecture
How Vibe's AI agent processes tasks and makes decisions.
Execution Modes
Vibe supports two execution patterns:
Plan-Execute-Reflect (Default)
Best for complex, multi-step tasks.
┌──────────┐     ┌──────────┐     ┌───────────┐
│   PLAN   │────>│ EXECUTE  │────>│  REFLECT  │
└──────────┘     └──────────┘     └───────────┘
     ▲                                  │
     │                                  │
     └──────────── feedback ────────────┘
- Plan: Analyze task, identify steps
- Execute: Run tools, interact with browser
- Reflect: Verify result, retry if needed
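The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Vibe's actual implementation: `runPlanExecuteReflect`, the `AgentSteps` interface, and the `maxRounds` cap are hypothetical names introduced here, and the three phases are passed in as plain synchronous functions so the control flow stays easy to follow.

```typescript
// Hypothetical types for illustration; Vibe's real interfaces may differ.
interface Reflection {
  complete: boolean; // true when the result satisfies the task
  feedback: string;  // guidance for the next planning round
}

interface AgentSteps {
  plan: (task: string, feedback: string) => string[];
  execute: (steps: string[]) => string;
  reflect: (task: string, result: string) => Reflection;
}

// Runs plan -> execute -> reflect, feeding reflection feedback back into
// planning until the task is complete or maxRounds is reached.
function runPlanExecuteReflect(
  task: string,
  agent: AgentSteps,
  maxRounds: number = 3
): { result: string; rounds: number; complete: boolean } {
  let feedback = "";
  let result = "";
  for (let round = 1; round <= maxRounds; round++) {
    const steps = agent.plan(task, feedback);
    result = agent.execute(steps);
    const verdict = agent.reflect(task, result);
    if (verdict.complete) {
      return { result: result, rounds: round, complete: true };
    }
    feedback = verdict.feedback; // the "feedback" edge in the diagram
  }
  return { result: result, rounds: maxRounds, complete: false };
}

// Stub agent: the first attempt fails reflection, the second succeeds.
const stubAgent: AgentSteps = {
  plan: (task, feedback) => (feedback === "" ? [task] : [task, feedback]),
  execute: (steps) => (steps.length > 1 ? "$999.00" : ""),
  reflect: (_task, result) =>
    result !== ""
      ? { complete: true, feedback: "" }
      : { complete: false, feedback: "No price extracted; read the product page" },
};

const loopOutcome = runPlanExecuteReflect("Find the price of iPhone 15", stubAgent);
```

With the stub agent, `loopOutcome.rounds` is 2: the first round yields an empty result, reflection asks for another pass, and the second round completes. Capping the number of rounds is a design choice of this sketch so that a task that never passes reflection cannot loop forever.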
Plan-Execute (Simple Mode)
Skips the reflection step, which lowers overhead for straightforward tasks.
┌──────────┐     ┌──────────┐
│   PLAN   │────>│ EXECUTE  │────> Done
└──────────┘     └──────────┘
LangGraph State Machine
The agent uses LangGraph to manage state:
             ┌─────────────┐
             │    START    │
             └──────┬──────┘
                    │
             ┌──────▼──────┐
             │  ASSISTANT  │  (LLM decides action)
             └──────┬──────┘
                    │
       ┌────────────┴────────────┐
       │                         │
┌──────▼──────┐           ┌──────▼──────┐
│    TOOLS    │           │    DONE     │
└──────┬──────┘           └─────────────┘
       │
┌──────▼──────┐
│ REFLECTION  │  (verify success)
└──────┬──────┘
       │
       └────────────> back to ASSISTANT
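The transitions in the diagram can be written as a small routing function. The sketch below deliberately avoids the LangGraph API and uses only plain TypeScript, so it shows the shape of the graph rather than how Vibe wires it up. The node names come from the diagram; `AgentState`, `nextNode`, and `tracePath` are hypothetical names introduced for illustration.

```typescript
// Node names taken from the diagram above.
type NodeName = "START" | "ASSISTANT" | "TOOLS" | "REFLECTION" | "DONE";

// Hypothetical minimal state: only what routing needs.
interface AgentState {
  pendingToolCalls: number; // tool calls requested by the last LLM response
}

// Returns the node that follows `current`, mirroring the edges in the diagram.
function nextNode(current: NodeName, state: AgentState): NodeName {
  switch (current) {
    case "START":
      return "ASSISTANT";
    case "ASSISTANT":
      // The LLM either requests tools or declares the task finished.
      return state.pendingToolCalls > 0 ? "TOOLS" : "DONE";
    case "TOOLS":
      return "REFLECTION";
    case "REFLECTION":
      return "ASSISTANT";
    case "DONE":
      return "DONE"; // terminal node
  }
}

// Walks the graph for a run in which the LLM requests tools `toolRounds` times.
function tracePath(toolRounds: number): NodeName[] {
  const visited: NodeName[] = ["START"];
  let remaining = toolRounds;
  let current: NodeName = "START";
  while (current !== "DONE") {
    const state: AgentState = { pendingToolCalls: remaining };
    if (current === "ASSISTANT" && remaining > 0) {
      remaining -= 1; // this round of tool calls is now being handled
    }
    current = nextNode(current, state);
    visited.push(current);
  }
  return visited;
}

const tracedPath = tracePath(1);
// START, ASSISTANT, TOOLS, REFLECTION, ASSISTANT, DONE
```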
Tool Invocation
When the LLM calls a tool:
- Parse tool call from LLM response
- Validate arguments against schema
- Execute tool in browser context
- Return result to LLM for next decision
// Example tool call from LLM
{
"tool_calls": [{
"name": "click_by_index",
"args": { "index": 5 }
}]
}
// Tool execution
const result = await clickByIndex({ index: 5 });
// Returns: "Clicked element [5] 'Add to Cart' button"
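The four steps (parse, validate, execute, return) can be sketched end to end with a small in-memory tool registry. Everything here is an assumption made for illustration: `ToolSpec`, `toolRegistry`, `parseToolCall`, `validateArgs`, and `invokeTool` are hypothetical names, the schema format is a simplified map of argument name to expected type, and the `click_by_index` handler is a stub that never touches a browser. Handlers are synchronous for brevity; real browser tools would be asynchronous.

```typescript
// Hypothetical shapes for illustration only.
interface ToolCall {
  name: string;
  args: { [key: string]: unknown };
}

interface ToolSpec {
  // Maps each required argument name to its expected `typeof` result.
  schema: { [arg: string]: "number" | "string" | "boolean" };
  run: (args: { [key: string]: unknown }) => string;
}

const toolRegistry: { [toolName: string]: ToolSpec } = {
  click_by_index: {
    schema: { index: "number" },
    run: (args) => "Clicked element [" + String(args["index"]) + "]",
  },
};

// Step 1: parse the first tool call out of a raw LLM response.
function parseToolCall(rawResponse: string): ToolCall {
  const parsed = JSON.parse(rawResponse);
  return parsed.tool_calls[0];
}

// Step 2: collect schema violations (an empty array means the args are valid).
function validateArgs(spec: ToolSpec, args: { [key: string]: unknown }): string[] {
  return Object.keys(spec.schema)
    .filter((key) => typeof args[key] !== spec.schema[key])
    .map((key) => "Argument '" + key + "' must be a " + spec.schema[key]);
}

// Steps 3 and 4: execute the tool and return a string for the LLM to read.
function invokeTool(rawResponse: string): string {
  const call = parseToolCall(rawResponse);
  const spec = toolRegistry[call.name];
  if (spec === undefined) {
    return "Error: unknown tool '" + call.name + "'";
  }
  const problems = validateArgs(spec, call.args);
  if (problems.length > 0) {
    return "Error: " + problems.join("; ");
  }
  return spec.run(call.args);
}

const okResult = invokeTool('{"tool_calls":[{"name":"click_by_index","args":{"index":5}}]}');
const badResult = invokeTool('{"tool_calls":[{"name":"click_by_index","args":{"index":"5"}}]}');
```

Returning validation failures as strings, rather than throwing, is a choice made in this sketch: it lets the LLM see what went wrong and correct its next call.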
Reflection System
Once the agent considers the task finished, the reflection step checks the result against the original task:
// Reflection prompt
"Did the agent successfully complete the task?
Task: Find the price of iPhone 15
Last action: Extracted text '$999.00'
Respond with:
- COMPLETE if task is done
- RETRY with feedback if more work needed"
If reflection returns RETRY, the agent receives feedback and continues.
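One way to consume the reflection reply is to parse it into a typed verdict. The sketch below assumes the model's reply begins with `COMPLETE` or `RETRY`, as the prompt requests, and that any text after `RETRY` is the feedback. `parseReflection` and `ReflectionVerdict` are hypothetical names, and treating an unrecognized reply as `RETRY` is a design choice of this sketch, not documented behavior.

```typescript
// Hypothetical result type for a parsed reflection reply.
type ReflectionVerdict =
  | { status: "COMPLETE" }
  | { status: "RETRY"; feedback: string };

// Assumes the reply starts with COMPLETE or RETRY, as the prompt requests.
// Anything else is treated as RETRY so an unclear reply never ends the task.
function parseReflection(reply: string): ReflectionVerdict {
  const text = reply.trim();
  const upper = text.toUpperCase();
  if (upper.indexOf("COMPLETE") === 0) {
    return { status: "COMPLETE" };
  }
  if (upper.indexOf("RETRY") === 0) {
    // Drop the keyword plus any separator such as ":" or "-".
    const feedback = text.slice("RETRY".length).replace(/^[\s:,-]+/, "");
    return { status: "RETRY", feedback: feedback };
  }
  return { status: "RETRY", feedback: "Unrecognized reflection reply: " + text };
}

const completeVerdict = parseReflection("COMPLETE");
const retryVerdict = parseReflection("RETRY: the price shown is for the iPhone 15 Pro");
```

For the second call, `retryVerdict.feedback` is `"the price shown is for the iPhone 15 Pro"`, which is what would be handed back to the agent for its next attempt.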
Subagent Architecture
For parallel tasks, the main agent spawns subagents:
┌─────────────────────────────────────────────────┐
│                   MAIN AGENT                    │
│                                                 │
│  Task: "Compare prices on Amazon and Best Buy"  │
│                                                 │
│          ┌─────────────┴─────────────┐          │
│          │                           │          │
│    ┌─────▼─────┐               ┌─────▼─────┐    │
│    │ SUBAGENT 1│               │ SUBAGENT 2│    │
│    │ (Amazon)  │               │ (Best Buy)│    │
│    └─────┬─────┘               └─────┬─────┘    │
│          │                           │          │
│          └─────────────┬─────────────┘          │
│                        │                        │
│                ┌───────▼───────┐                │
│                │ MERGE RESULTS │                │
│                └───────────────┘                │
└─────────────────────────────────────────────────┘
Subagent Properties
| Resource | Sharing |
|---|---|
| LLM | Shared (same API) |
| Tools | Shared reference |
| State | Isolated (own context) |
| Tabs | Can create own tabs |
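The fan-out and merge pattern, together with the sharing rules in the table, might look roughly like this. It is a sketch under assumptions: `SharedResources`, `runSubagent`, and `comparePrices` are hypothetical names, the LLM and tools are placeholder objects, and the lookup function returns made-up prices purely to exercise the code. Subagents run concurrently via `Promise.all`; each receives the shared resources by reference but keeps its own context array.

```typescript
// Hypothetical types; the stub "llm" and "tools" stand in for real clients.
interface SharedResources {
  llm: { model: string };
  tools: string[];
}

interface SubagentResult {
  site: string;
  price: number;
}

// Each subagent gets the shared resources by reference but its own context.
async function runSubagent(
  site: string,
  shared: SharedResources,
  lookupPrice: (site: string) => Promise<number>
): Promise<SubagentResult> {
  const context: string[] = []; // isolated state: never visible to siblings
  context.push("Using " + shared.llm.model + " to search " + site);
  const price = await lookupPrice(site);
  return { site: site, price: price };
}

// Main agent: fan out one subagent per site, then merge by ascending price.
async function comparePrices(
  sites: string[],
  shared: SharedResources,
  lookupPrice: (site: string) => Promise<number>
): Promise<SubagentResult[]> {
  const results = await Promise.all(
    sites.map((site) => runSubagent(site, shared, lookupPrice))
  );
  return results.sort((a, b) => a.price - b.price);
}

// Stub lookup with made-up prices, used only to exercise the sketch.
const stubPrices: { [site: string]: number } = { "Amazon": 799, "Best Buy": 829 };
const stubLookup = (site: string): Promise<number> => Promise.resolve(stubPrices[site]);

const comparison = comparePrices(
  ["Best Buy", "Amazon"],
  { llm: { model: "stub-model" }, tools: ["click_by_index"] },
  stubLookup
);
```

`comparison` resolves to the merged list with the cheaper stub entry first. Sorting is only one possible merge step; a real merge might instead summarize the findings through the LLM.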
When to Use Subagents
- Parallel price comparisons
- Multi-site data extraction
- Background operations
- Tasks that need an isolated context
Token Management
The agent tracks token usage to prevent context overflow:
- Max context: Model-dependent (for example, 128K tokens for GPT-4 Turbo)
- Page content: Truncated if too large
- Conversation history: Summarized when approaching limit
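The budgeting logic behind these three rules can be sketched as follows. The four-characters-per-token estimate is a rough heuristic for English text, not a real tokenizer, and `estimateTokens`, `truncateToTokens`, `needsSummary`, and the 80% threshold are all assumptions introduced here rather than Vibe's actual settings.

```typescript
// Rough heuristic: about 4 characters per token for English text.
// A real implementation would use the model's tokenizer instead.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Cuts page content down to at most maxTokens (by the estimate above).
function truncateToTokens(content: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (content.length <= maxChars) {
    return content;
  }
  return content.slice(0, maxChars);
}

// True when the conversation has used at least `threshold` of the context
// window, which is the point where older messages would be summarized.
function needsSummary(
  messages: string[],
  contextLimit: number,
  threshold: number = 0.8
): boolean {
  const used = messages.reduce((sum, message) => sum + estimateTokens(message), 0);
  return used >= contextLimit * threshold;
}

const trimmedPage = truncateToTokens("abcdefghijkl", 2); // "abcdefgh"
const shouldSummarize = needsSummary(["aaaa", "bbbbbbbb"], 3); // true: 3 of 3 tokens used
```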
Error Handling
Tool Error → Retry (up to 3 times) → Escalate to Reflection
                                               │
                                      ┌────────▼────────┐
                                      │ Adjust approach │
                                      └─────────────────┘
Common error patterns:
- Element not found → Wait and retry
- Page not loaded → Increase wait time
- API timeout → Reduce request complexity
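The retry-then-escalate flow can be sketched as a small wrapper around a tool call. This sketch reads "Retry (up to 3 times)" as one initial attempt followed by at most three retries, which is an interpretation rather than a documented detail. `runWithRetry` and `ToolOutcome` are hypothetical names, and the tool is synchronous for brevity; a real agent would wait between attempts, for instance to let the page finish loading.

```typescript
// Hypothetical outcome type: either the tool's result or an escalation.
type ToolOutcome =
  | { kind: "ok"; value: string; attempts: number }
  | { kind: "escalate"; lastError: string; attempts: number };

const MAX_RETRIES = 3; // from the flow above: retry up to 3 times

// Calls `tool` once, retries on failure up to MAX_RETRIES times, and then
// hands the last error to reflection instead of throwing.
function runWithRetry(tool: () => string): ToolOutcome {
  let lastError = "";
  for (let attempt = 1; attempt <= 1 + MAX_RETRIES; attempt++) {
    try {
      return { kind: "ok", value: tool(), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // A real agent would wait here before the next attempt.
    }
  }
  return { kind: "escalate", lastError: lastError, attempts: 1 + MAX_RETRIES };
}

// Stub tool that fails twice with "Element not found", then succeeds.
let clickCalls = 0;
const flakyClick = (): string => {
  clickCalls += 1;
  if (clickCalls < 3) {
    throw new Error("Element not found");
  }
  return "Clicked element [5]";
};

const recovered = runWithRetry(flakyClick); // succeeds on the third attempt
const escalated = runWithRetry(() => {
  throw new Error("Page not loaded"); // always fails, so it escalates
});
```

Returning an `escalate` outcome, rather than throwing, keeps the last error message available for the reflection step to decide how to adjust the approach.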