🦙 LlamaIndex

Using Hyperbrowser's Web Reader Integration


Installation and Setup

To get started with LlamaIndex and Hyperbrowser, you can install the necessary packages using pip:

pip install llama-index-core llama-index-readers-web hyperbrowser

Then configure your credentials by setting the following environment variable:

HYPERBROWSER_API_KEY=<your-api-key>

You can get an API Key easily from the Hyperbrowser dashboard. Once you have your API Key, add it to your .env file as HYPERBROWSER_API_KEY, or pass it via the api_key argument in the HyperbrowserWebReader constructor.
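
For example, here is a minimal sketch of reading the key from a .env file instead of hard-coding it. This assumes the python-dotenv package, which is not installed by the command above; plain os.environ is enough if the variable is already exported:

import os

from dotenv import load_dotenv
from llama_index.readers.web import HyperbrowserWebReader

# Populate the environment from a local .env file (assumes python-dotenv is installed)
load_dotenv()

# Pass the key explicitly via the api_key argument
reader = HyperbrowserWebReader(api_key=os.environ["HYPERBROWSER_API_KEY"])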

Usage

Once you have your API Key and have installed the packages, you can load webpages into LlamaIndex using HyperbrowserWebReader.

from llama_index.readers.web import HyperbrowserWebReader

reader = HyperbrowserWebReader(api_key="your_api_key_here")

To load data, you can specify the operation to be performed by the loader. The default operation is scrape. For scrape, you can provide a single URL or a list of URLs to be scraped. For crawl, you can provide only a single URL; the crawl operation will crawl the provided page and its subpages and return a document for each page. HyperbrowserWebReader supports loading and lazy loading data in both sync and async modes; a sketch of the lazy and async variants follows the example below.

documents = reader.load_data(
    urls=["https://example.com"],
    operation="scrape",
)
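
Here is a minimal sketch of the lazy and async variants, assuming the standard LlamaIndex reader method names lazy_load_data and aload_data:

# Lazy loading: yields one Document at a time instead of building the full list up front
for document in reader.lazy_load_data(urls=["https://example.com"], operation="scrape"):
    print(document.text[:100])

# Async loading: awaitable counterpart of load_data
import asyncio

async def main():
    return await reader.aload_data(urls=["https://example.com"], operation="scrape")

documents = asyncio.run(main())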

Optional params for the loader can also be provided in the params argument. For more information on the supported params, see the scraping guide.

Note that params are snake_case in Python code, so, for example, it is max_pages instead of maxPages.

# Scrape
documents = reader.load_data(
    urls=["https://example.com"],
    operation="scrape",
    params={"scrape_options": {"include_tags": ["h1", "h2", "p"]}},
)

# Crawl
documents = reader.load_data(
    urls=["https://example.com"],
    operation="crawl",
    params={
        "max_pages": 10,
        "scrape_options": {
            "formats": ["markdown"],
        },
        "session_options": {
            "use_stealth": True,
        }
    }
)
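
The returned documents are ordinary LlamaIndex Document objects, so they can be fed straight into an index. Here is a minimal sketch of indexing and querying the scraped pages, assuming a default embedding model and LLM are configured (e.g. via OPENAI_API_KEY):

from llama_index.core import VectorStoreIndex

# Build an in-memory vector index over the scraped documents
index = VectorStoreIndex.from_documents(documents)

# Answer questions from the indexed page content
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of these pages?")
print(response)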