Give Your AI Agent a Browser: Three Approaches Compared
April 13, 2026 · Architecture · 7 min read
LLMs are powerful reasoners, but they're blind to the live web. They can't check today's stock price, fill out a form, or read a JavaScript-rendered dashboard. If your agent needs to interact with websites — not just read cached text — you need to give it a browser.
But "give it a browser" can mean very different things depending on your use case. In this post, we'll compare three approaches available on UniversalAPI, from lightest to heaviest:
- API-based search & scrape (Doc-Hound MCP Server)
- Managed headless browser (AWS AgentCore Browser)
- DIY Chromium on Lambda (Sparticuz/chromium)
Approach 1: API-Based Search & Scrape (Doc-Hound)
Best for: Quick research, text extraction, search results
The Doc-Hound MCP Server is a two-step research tool: search the web via SerpAPI, then scrape and extract the most relevant content using AI analysis.
1. search_web("best restaurants in Denver") → list of URLs
2. scrape_and_extract(urls, "extract restaurant names and ratings") → structured data

How it works: Under the hood, Doc-Hound calls SerpAPI for Google search results, then fetches the HTML of selected pages and uses AI to extract exactly what you asked for. No browser is involved — it's pure HTTP requests + AI parsing.
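The two-step flow can be sketched in Python with stubbed tools. The tool names (`search_web`, `scrape_and_extract`) come from Doc-Hound; the bodies below are illustrative stand-ins, not the real implementation:

```python
# Sketch of Doc-Hound's two-step pattern. The tool names match the MCP
# server; the bodies are stubs for illustration only -- the real tools
# call SerpAPI and an LLM under the hood.

def search_web(query: str) -> list[str]:
    # Real tool: SerpAPI Google search -> a list of result URLs.
    return ["https://example.com/denver-eats", "https://example.com/top-10"]

def scrape_and_extract(urls: list[str], instruction: str) -> list[dict]:
    # Real tool: fetch each page's HTML, then AI-extract per the instruction.
    return [{"url": u, "extracted": instruction} for u in urls]

urls = search_web("best restaurants in Denver")
results = scrape_and_extract(urls, "extract restaurant names and ratings")
print(len(results))  # one extraction result per URL
```

The key design point: each step is a plain request/response call, which is why the whole flow finishes in seconds and costs roughly one credit per call.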
Strengths:
- ⚡ Fast — API calls return in seconds
- 💰 Cheap — ~1 credit per search, ~1 credit per scrape
- 🔌 Works as an MCP server — connect from Cline, Claude Desktop, or any MCP client
- 📊 Great for structured data extraction from static pages
Limitations:
- ❌ Can't interact with pages (no clicking, no form filling)
- ❌ Can't handle JavaScript-rendered content (SPAs, dynamic dashboards)
- ❌ Can't take screenshots
- ❌ Can't navigate multi-step workflows
When to use it: You need to search the web and extract text from pages that render server-side. Research tasks, competitive analysis, documentation lookup, price checks on static e-commerce pages.
Approach 2: Managed Headless Browser (AgentCore Browser)
Best for: Interactive pages, JS-rendered content, form automation, screenshots
AgentCore Browser gives your agent a full headless Chromium browser running in AWS's managed infrastructure. Your agent gets Playwright-powered tools to navigate, click, type, screenshot, and extract — just like a human would.
Here's the complete agent code:
```python
from strands import Agent
from strands.models.bedrock import BedrockModel
from strands_tools.browser import AgentCoreBrowser

def create_agent():
    model = BedrockModel(
        model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
        region_name="us-east-1",
    )
    browser_tool = AgentCoreBrowser(region="us-east-1")
    agent = Agent(
        model=model,
        tools=[browser_tool.browser],
        system_prompt="You are a helpful assistant with web browsing capability.",
    )
    return agent, []
```

That's it. No browser binaries to install, no Playwright servers to manage, no Docker images to build. The browser runs in AWS's secure, serverless infrastructure with per-tenant isolation.
Strengths:
- 🖥️ Full browser — JavaScript rendering, cookies, sessions
- 🖱️ Interactive — click buttons, fill forms, navigate multi-step flows
- 📸 Screenshots — capture visual state of pages
- 🔒 Secure — session-isolated sandbox, auto-cleanup, no credential leakage
- 🧹 Zero ops — no binaries, no layers, no Docker
Limitations:
- 💰 More expensive than API scraping (~5 credits/minute vs ~1 credit/call)
- ⏱️ Slower startup — browser session initialization takes a few seconds
- 🔄 Per-turn sessions — fresh browser each agent invocation (no persistent cookies across turns)
Pricing: ~5 credits per minute of browser time (~$0.005/min), with a 2-credit minimum per session. This is based on AWS's AgentCore Browser pricing ($0.0895/vCPU-hour + $0.00945/GB-hour) with a 3× markup — still 10-30× cheaper than competitors like Browserbase ($0.05-0.15/min).
| Scenario | Duration | Credits |
|---|---|---|
| Quick page check | 15s | 2 (minimum) |
| Medium scrape | 60s | 5 |
| Form submit + wait | 2 min | 11 |
| Multi-step research | 10 min | 37 |
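You can sanity-check the headline rate against the quoted AWS prices. This is a back-of-envelope sketch: the 1 vCPU / 2 GB session size and the $0.001-per-credit conversion are assumptions, and actual billing follows AWS's metered resource-seconds, which is why the table above isn't perfectly linear.

```python
# Back-of-envelope estimate of browser credits per minute from the AWS
# rates quoted above, with the 3x markup applied. The session size and
# credit value are assumptions for illustration.
AWS_VCPU_HOUR = 0.0895    # $/vCPU-hour (AgentCore Browser)
AWS_GB_HOUR = 0.00945     # $/GB-hour
MARKUP = 3.0
USD_PER_CREDIT = 0.001    # assumed: ~5 credits/min at ~$0.005/min

def credits_per_minute(vcpus: float = 1.0, gb: float = 2.0) -> float:
    usd_per_hour = (vcpus * AWS_VCPU_HOUR + gb * AWS_GB_HOUR) * MARKUP
    return usd_per_hour / 60 / USD_PER_CREDIT

print(round(credits_per_minute(), 1))  # ~5.4, near the ~5 credits/min headline
```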
When to use it: You need to interact with live websites — fill forms, click through multi-page flows, scrape JavaScript-rendered content, or take screenshots. The agent decides what to click and where to navigate based on what it sees.
Try it now
The Browser Assistant is a pre-built agent you can try immediately — no code required. Just chat with it and ask it to browse any website.
Approach 3: DIY Chromium on Lambda (Sparticuz/chromium)
Best for: Full control, custom browser configurations, cost optimization at scale
@sparticuz/chromium is an open-source package that bundles a headless Chromium binary optimized for serverless platforms. You install it as a Lambda layer, wire up Puppeteer or Playwright yourself, and manage the entire lifecycle.
```javascript
const puppeteer = require("puppeteer-core");
const chromium = require("@sparticuz/chromium");

exports.handler = async (event) => {
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: "shell",
  });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  const title = await page.title();
  await browser.close();
  return { title };
};
```

Strengths:
- 💰 Cheapest at scale — you only pay Lambda compute costs
- 🔧 Full control — custom Chromium flags, fonts, extensions
- 📦 Self-contained — no external service dependencies
- 🏗️ Battle-tested — 1.6k GitHub stars, used in production by thousands
Limitations:
- 🛠️ Significant setup — Lambda layers, memory tuning (1600MB+ recommended), cold starts
- 📏 Size constraints — Chromium binary is ~50MB compressed, may hit Lambda limits
- 🧑💻 You own the ops — browser crashes, memory leaks, version updates are your problem
- 🤖 No AI integration — you write the navigation logic, not the LLM
- ⏱️ Cold starts — decompressing Chromium adds seconds to first invocation
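To make the setup burden concrete, a minimal AWS SAM resource for the handler above might look like the following. The layer ARN and account ID are placeholders, and the right layer version depends on your Chromium/Node pairing:

```yaml
Resources:
  ScraperFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      MemorySize: 1600   # Chromium needs generous memory (1600MB+ recommended)
      Timeout: 60        # allow for cold-start decompression plus page load
      Layers:
        - arn:aws:lambda:us-east-1:123456789012:layer:chromium:1  # placeholder ARN
```

Every value here is yours to tune and maintain, which is exactly the trade-off: maximum control in exchange for owning the ops.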
When to use it: You're building a dedicated scraping pipeline (not an AI agent), need custom browser configurations, or are running at scale where per-minute pricing doesn't make sense. This is infrastructure, not an agent tool.
Comparison Table
| | Doc-Hound (API) | AgentCore Browser | DIY Chromium |
|---|---|---|---|
| JavaScript rendering | ❌ | ✅ | ✅ |
| Click/type/interact | ❌ | ✅ | ✅ (manual) |
| Screenshots | ❌ | ✅ | ✅ (manual) |
| AI-driven navigation | N/A | ✅ (LLM decides) | ❌ (you code it) |
| Setup effort | None (MCP connect) | A few lines of Python | Hours (layers, config) |
| Cost per page | ~1 credit | ~2-5 credits | ~$0.001 Lambda |
| Best for | Research & text | Interactive browsing | Custom pipelines |
| Ops burden | Zero | Zero | High |
Which Should You Use?
Start with Doc-Hound if you just need to search and read web pages. It's fast, cheap, and works as an MCP server you can connect to any AI tool.
Upgrade to AgentCore Browser when you need the agent to interact with websites — fill forms, click through flows, handle JavaScript-rendered content, or take screenshots. It's the sweet spot of capability vs. complexity.
Go DIY with Sparticuz/chromium only if you're building a dedicated scraping pipeline at scale, need custom browser configurations, or want to avoid per-minute pricing entirely. This is an infrastructure choice, not an agent tool.
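If it helps, the guidance above boils down to a small decision function. This is a toy summary of this post's recommendations, not an official API:

```python
# Toy chooser condensing this post's recommendations. Inputs describe
# your workload; the return value names the suggested approach.
def choose_approach(needs_interaction: bool, needs_js: bool,
                    dedicated_pipeline_at_scale: bool) -> str:
    if dedicated_pipeline_at_scale:
        return "DIY Chromium (Sparticuz)"   # own the ops, cheapest at scale
    if needs_interaction or needs_js:
        return "AgentCore Browser"          # full managed browser
    return "Doc-Hound (API)"                # search + scrape static pages

print(choose_approach(False, False, False))  # Doc-Hound (API)
```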
The beauty of UniversalAPI is that you can combine all three. An agent can use Doc-Hound for quick searches, AgentCore Browser for deep interactions, and store results in Knowledge for later retrieval — all in the same conversation.
Get Started
- Try the Browser Assistant — universalapi.co/agents/snowtimber/browser-assistant
- Read the docs — docs.universalapi.co/agents/browser
- Connect Doc-Hound — universalapi.co/mcp-servers/snowtimber/doc-hound
- Sign up — universalapi.co (100 free credits to start)
Have questions or want to share what you've built? Join us on GitHub.