Give Your AI Agent a Browser: Three Approaches Compared

April 13, 2026 · Architecture · 7 min read

LLMs are powerful reasoners, but they're blind to the live web. They can't check today's stock price, fill out a form, or read a JavaScript-rendered dashboard. If your agent needs to interact with websites — not just read cached text — you need to give it a browser.

But "give it a browser" can mean very different things depending on your use case. In this post, we'll compare three approaches available on UniversalAPI, from lightest to heaviest:

  1. API-based search & scrape (Doc-Hound MCP Server)
  2. Managed headless browser (AWS AgentCore Browser)
  3. DIY Chromium on Lambda (Sparticuz/chromium)

Approach 1: API-Based Search & Scrape (Doc-Hound)

Best for: Quick research, text extraction, search results

The Doc-Hound MCP Server is a two-step research tool: search the web via SerpAPI, then scrape and extract the most relevant content using AI analysis.

1. search_web("best restaurants in Denver") → list of URLs
2. scrape_and_extract(urls, "extract restaurant names and ratings") → structured data

How it works: Under the hood, Doc-Hound calls SerpAPI for Google search results, then fetches the HTML of selected pages and uses AI to extract exactly what you asked for. No browser is involved — it's pure HTTP requests + AI parsing.
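The two-step flow can be sketched in a few lines of Python. The tool names (`search_web`, `scrape_and_extract`) come from this post; the wiring and the stub tools below are purely illustrative, not the actual MCP client code.

```python
# Hypothetical sketch of Doc-Hound's two-step flow. Only the tool names
# are from the post; this wiring is illustrative.
def research(query, instruction, search_web, scrape_and_extract):
    urls = search_web(query)          # step 1: SerpAPI-backed search -> URLs
    top = urls[:3]                    # keep only the most relevant hits
    return scrape_and_extract(top, instruction)  # step 2: AI extraction

# Stub tools standing in for the real MCP server:
fake_search = lambda q: ["https://a.example", "https://b.example"]
fake_extract = lambda urls, inst: {"pages": len(urls), "instruction": inst}

result = research(
    "best restaurants in Denver",
    "extract restaurant names and ratings",
    fake_search, fake_extract,
)
print(result)  # {'pages': 2, 'instruction': 'extract restaurant names and ratings'}
```

The point is the shape of the pipeline: search narrows the web to a handful of URLs, then extraction turns raw HTML into exactly the structure you asked for.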

Strengths:

  • ⚡ Fast — API calls return in seconds
  • 💰 Cheap — ~1 credit per search, ~1 credit per scrape
  • 🔌 Works as an MCP server — connect from Cline, Claude Desktop, or any MCP client
  • 📊 Great for structured data extraction from static pages

Limitations:

  • ❌ Can't interact with pages (no clicking, no form filling)
  • ❌ Can't handle JavaScript-rendered content (SPAs, dynamic dashboards)
  • ❌ Can't take screenshots
  • ❌ Can't navigate multi-step workflows

When to use it: You need to search the web and extract text from pages that render server-side. Research tasks, competitive analysis, documentation lookup, price checks on static e-commerce pages.

Approach 2: Managed Headless Browser (AgentCore Browser)

Best for: Interactive pages, JS-rendered content, form automation, screenshots

AgentCore Browser gives your agent a full headless Chromium browser running in AWS's managed infrastructure. Your agent gets Playwright-powered tools to navigate, click, type, screenshot, and extract — just like a human would.

Here's the complete agent code:

```python
from strands import Agent
from strands.models.bedrock import BedrockModel
from strands_tools.browser import AgentCoreBrowser

def create_agent():
    # Claude Sonnet 4 via Amazon Bedrock
    model = BedrockModel(
        model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
        region_name="us-east-1",
    )
    # Managed headless Chromium, hosted in AgentCore
    browser_tool = AgentCoreBrowser(region="us-east-1")

    agent = Agent(
        model=model,
        tools=[browser_tool.browser],
        system_prompt="You are a helpful assistant with web browsing capability.",
    )
    return agent
```

That's it. No browser binaries to install, no Playwright servers to manage, no Docker images to build. The browser runs in AWS's secure, serverless infrastructure with per-tenant isolation.

Strengths:

  • 🖥️ Full browser — JavaScript rendering, cookies, sessions
  • 🖱️ Interactive — click buttons, fill forms, navigate multi-step flows
  • 📸 Screenshots — capture visual state of pages
  • 🔒 Secure — session-isolated sandbox, auto-cleanup, no credential leakage
  • 🧹 Zero ops — no binaries, no layers, no Docker

Limitations:

  • 💰 More expensive than API scraping (~5 credits/minute vs ~1 credit/call)
  • ⏱️ Slower startup — browser session initialization takes a few seconds
  • 🔄 Per-turn sessions — fresh browser each agent invocation (no persistent cookies across turns)

Pricing: ~5 credits per minute of browser time (~$0.005/min), with a 2-credit minimum per session. This is based on AWS's AgentCore Browser pricing ($0.0895/vCPU-hour + $0.00945/GB-hour) with a 3× markup — still 10-30× cheaper than competitors like Browserbase ($0.05-0.15/min).

| Scenario | Duration | Credits |
| --- | --- | --- |
| Quick page check | 15s | 2 (minimum) |
| Medium scrape | 60s | 5 |
| Form submit + wait | 2 min | 11 |
| Multi-step research | 10 min | 37 |
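A quick sanity check on the "10-30× cheaper" claim, using only the figures quoted in this post (the Browserbase range is as quoted, not independently verified):

```python
# Per-minute costs quoted in this post.
universal_per_min = 0.005                        # ~5 credits/min at ~$0.001/credit
browserbase_low, browserbase_high = 0.05, 0.15   # quoted competitor range

print(round(browserbase_low / universal_per_min))   # 10
print(round(browserbase_high / universal_per_min))  # 30
```

So the low end of the competitor range is roughly 10× the AgentCore Browser rate, and the high end roughly 30×, matching the claim above.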

When to use it: You need to interact with live websites — fill forms, click through multi-page flows, scrape JavaScript-rendered content, or take screenshots. The agent decides what to click and where to navigate based on what it sees.

Try it now

The Browser Assistant is a pre-built agent you can try immediately — no code required. Just chat with it and ask it to browse any website.

Approach 3: DIY Chromium on Lambda (Sparticuz/chromium)

Best for: Full control, custom browser configurations, cost optimization at scale

@sparticuz/chromium is an open-source package that bundles a headless Chromium binary optimized for serverless platforms. You install it as a Lambda layer, wire up Puppeteer or Playwright yourself, and manage the entire lifecycle.

```javascript
const puppeteer = require("puppeteer-core");
const chromium = require("@sparticuz/chromium");

exports.handler = async (event) => {
  // Launch the bundled Chromium with serverless-tuned flags
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: "shell",
  });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  const title = await page.title();
  await browser.close();
  return { title };
};
```

Strengths:

  • 💰 Cheapest at scale — you only pay Lambda compute costs
  • 🔧 Full control — custom Chromium flags, fonts, extensions
  • 📦 Self-contained — no external service dependencies
  • 🏗️ Battle-tested — 1.6k GitHub stars, used in production by thousands

Limitations:

  • 🛠️ Significant setup — Lambda layers, memory tuning (1600MB+ recommended), cold starts
  • 📏 Size constraints — Chromium binary is ~50MB compressed, may hit Lambda limits
  • 🧑‍💻 You own the ops — browser crashes, memory leaks, version updates are your problem
  • 🤖 No AI integration — you write the navigation logic, not the LLM
  • ⏱️ Cold starts — decompressing Chromium adds seconds to first invocation

When to use it: You're building a dedicated scraping pipeline (not an AI agent), need custom browser configurations, or are running at scale where per-minute pricing doesn't make sense. This is infrastructure, not an agent tool.

Comparison Table

| | Doc-Hound (API) | AgentCore Browser | DIY Chromium |
| --- | --- | --- | --- |
| JavaScript rendering | ❌ | ✅ | ✅ |
| Click/type/interact | ❌ | ✅ | ✅ (manual) |
| Screenshots | ❌ | ✅ | ✅ (manual) |
| AI-driven navigation | N/A | ✅ (LLM decides) | ❌ (you code it) |
| Setup effort | None (MCP connect) | 3 lines of Python | Hours (layers, config) |
| Cost per page | ~1 credit | ~2-5 credits | ~$0.001 Lambda |
| Best for | Research & text | Interactive browsing | Custom pipelines |
| Ops burden | Zero | Zero | High |

Which Should You Use?

Start with Doc-Hound if you just need to search and read web pages. It's fast, cheap, and works as an MCP server you can connect to any AI tool.

Upgrade to AgentCore Browser when you need the agent to interact with websites — fill forms, click through flows, handle JavaScript-rendered content, or take screenshots. It's the sweet spot of capability vs. complexity.

Go DIY with Sparticuz/chromium only if you're building a dedicated scraping pipeline at scale, need custom browser configurations, or want to avoid per-minute pricing entirely. This is an infrastructure choice, not an agent tool.

The beauty of UniversalAPI is that you can combine all three. An agent can use Doc-Hound for quick searches, AgentCore Browser for deep interactions, and store results in Knowledge for later retrieval — all in the same conversation.

Get Started

  1. Try the Browser Assistant: universalapi.co/agents/snowtimber/browser-assistant
  2. Read the docs: docs.universalapi.co/agents/browser
  3. Connect Doc-Hound: universalapi.co/mcp-servers/snowtimber/doc-hound
  4. Sign up: universalapi.co (100 free credits to start)

Have questions or want to share what you've built? Join us on GitHub.

Universal API — The agentic entry point to the universe of APIs