
Extract

Extract data from sites using AI



1

Install Hyperbrowser

Node:

npm install @hyperbrowser/sdk dotenv zod

or

yarn add @hyperbrowser/sdk dotenv zod

Python:

pip install hyperbrowser python-dotenv pydantic

or

uv add hyperbrowser python-dotenv pydantic
2

Set up your Environment

To use Hyperbrowser with your code, you will need an API key. You can get one easily from the dashboard. Once you have your API key, add it to your .env file as HYPERBROWSER_API_KEY.
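For reference, a minimal .env file contains a single line; the value shown here is a placeholder for your actual key:

HYPERBROWSER_API_KEY=your_api_key_here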

3

Extract Data

Next, you can extract data from any site by setting up the Hyperbrowser client and providing the site URLs and the schema you want the data in.

Node:

import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";
import { z } from "zod";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const schema = z.object({
    productName: z.string(),
    productOverview: z.string(),
    keyFeatures: z.array(z.string()),
    pricing: z.array(
      z.object({
        plan: z.string(),
        price: z.string(),
        features: z.array(z.string()),
      })
    ),
  });

  // Handles both starting and waiting for extract job response
  const result = await client.extract.startAndWait({
    urls: ["https://hyperbrowser.ai"],
    prompt:
      "Extract the product name, an overview of the product, its key features, and a list of its pricing plans from the page.",
    schema: schema,
  });

  console.log("result", JSON.stringify(result, null, 2));
};

main();
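Because the schema is a Zod schema, you can also derive a static TypeScript type from it for typed access to the result. A minimal sketch, continuing from the example above; it assumes the completed job exposes the extracted object on result.data:

// Derive a static type from the Zod schema defined above.
type ProductInfo = z.infer<typeof schema>;

// Assumption: on success, the extract job's `data` field matches the schema.
const data = result.data as ProductInfo;

console.log(data.productName);
for (const plan of data.pricing) {
  console.log(`${plan.plan}: ${plan.price}`);
}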
Python:

import os
from typing import List
from dotenv import load_dotenv
from hyperbrowser import Hyperbrowser
from hyperbrowser.models import StartExtractJobParams
from pydantic import BaseModel


# Load environment variables from .env file
load_dotenv()

# Initialize Hyperbrowser client
client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


class PricingSchema(BaseModel):
    plan: str
    price: str
    features: List[str]


class ExtractSchema(BaseModel):
    product_name: str
    product_overview: str
    key_features: List[str]
    pricing: List[PricingSchema]


def main():
    extract_result = client.extract.start_and_wait(
        params=StartExtractJobParams(
            urls=["https://hyperbrowser.ai"],
            prompt="Extract the product name, an overview of the product, its key features, and a list of its pricing plans from the page.",
            schema=ExtractSchema,
        )
    )
    print("Extract result:\n", extract_result.model_dump_json(indent=2))


if __name__ == "__main__":
    main()
4

View Extract in Dashboard

You can view all your extracts in the dashboard and see all their related information.

To view more details, check out the Extract page.
