Crawl

Start Crawl Job

Starts a crawl job for a given URL.

Method: client.crawl.start(params: StartCrawlJobParams): StartCrawlJobResponse

Endpoint: POST /api/crawl

Parameters:

  • StartCrawlJobParams:

    • url: string - URL to start the crawl from

    • max_pages?: number - Max number of pages to crawl

    • follow_links?: boolean - Follow links on the page

    • ignore_sitemap?: boolean - Ignore sitemap when finding links to crawl

    • exclude_patterns?: string[] - Patterns for paths to exclude from crawl

    • include_patterns?: string[] - Patterns for paths to include in the crawl

    • session_options?: CreateSessionParams - Configuration for the browser session used during the crawl

    • scrape_options?: ScrapeOptions - Scrape configuration applied to each crawled page

Response: StartCrawlJobResponse

Example:

response = client.crawl.start(StartCrawlJobParams(url="https://example.com"))
print(response.job_id)  # StartCrawlJobResponse only contains the job id
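
For a fully self-contained run, the sketch below also initializes the client and passes a few of the optional parameters. The import paths and the HYPERBROWSER_API_KEY environment variable are assumptions about typical SDK setup, not part of this reference.

# Minimal end-to-end sketch; import paths are assumed and may differ by SDK version.
import os

from hyperbrowser import Hyperbrowser                 # assumed client entry point
from hyperbrowser.models import StartCrawlJobParams   # assumed models module

# Assumes the API key is exported as HYPERBROWSER_API_KEY.
client = Hyperbrowser(api_key=os.environ["HYPERBROWSER_API_KEY"])

response = client.crawl.start(
    StartCrawlJobParams(
        url="https://example.com",
        max_pages=10,                    # stop after 10 pages
        include_patterns=["/docs/*"],    # only crawl documentation paths
    )
)
print(response.job_id)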

Get Crawl Job

Retrieves details of a specific crawl job.

Method: client.crawl.get(id: str): CrawlJobResponse

Endpoint: GET /api/crawl/{id}

Parameters:

  • id: string - Crawl job ID

Response: CrawlJobResponse

Example:

response = client.crawl.get(
  "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e"
)
print(response.status)
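
Because client.crawl.get returns the current CrawlJobStatus, it can be used to poll a job started with client.crawl.start until it reaches a terminal state. A minimal sketch, assuming client and StartCrawlJobParams from the setup above; the 2-second interval is an arbitrary choice, not an SDK requirement.

import time

# Start a job, then poll until it is no longer pending or running.
job = client.crawl.start(StartCrawlJobParams(url="https://example.com"))

while True:
    response = client.crawl.get(job.job_id)
    if response.status in ("completed", "failed"):  # terminal CrawlJobStatus values
        break
    time.sleep(2)

print(response.status, response.total_crawled_pages)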

Start Crawl Job and Wait

Starts a crawl job and waits for it to complete.

Method: client.crawl.start_and_wait(params: StartCrawlJobParams): CrawlJobResponse

Parameters:

  • StartCrawlJobParams:

    • url: string - URL to start the crawl from

    • max_pages?: number - Max number of pages to crawl

    • follow_links?: boolean - Follow links on the page

    • ignore_sitemap?: boolean - Ignore sitemap when finding links to crawl

    • exclude_patterns?: string[] - Patterns for paths to exclude from crawl

    • include_patterns?: string[] - Patterns for paths to include in the crawl

    • session_options?: CreateSessionParams - Configuration for the browser session used during the crawl

    • scrape_options?: ScrapeOptions - Scrape configuration applied to each crawled page

Response: CrawlJobResponse

Example:

response = client.crawl.start_and_wait(StartCrawlJobParams(url="https://example.com"))
print(response.status)
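
The returned CrawlJobResponse carries the crawled pages in its data field, so the result can be consumed directly. A small sketch using only the fields documented in the Types section below; client and StartCrawlJobParams are assumed from the setup above.

# Crawl a site and print a short preview of each successfully crawled page.
response = client.crawl.start_and_wait(
    StartCrawlJobParams(url="https://example.com", max_pages=5)
)

for page in response.data:
    if page.status == "failed":
        print(f"{page.url}: {page.error}")
        continue
    # markdown, html, and links are optional and depend on the requested scrape_options
    preview = (page.markdown or "")[:80]
    print(f"{page.url}: {preview!r}")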

Types

CrawlPageStatus

CrawlPageStatus = Literal["completed", "failed"]

CrawlJobStatus

CrawlJobStatus = Literal["pending", "running", "completed", "failed"]

StartCrawlJobResponse

class StartCrawlJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")

CrawledPage

class CrawledPage(BaseModel):
    metadata: Optional[dict[str, Union[str, list[str]]]] = None
    html: Optional[str] = None
    markdown: Optional[str] = None
    links: Optional[List[str]] = None
    url: str
    status: CrawlPageStatus
    error: Optional[str] = None

CrawlJobResponse

class CrawlJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")
    status: CrawlJobStatus
    error: Optional[str] = None
    data: List[CrawledPage] = Field(alias="data")
    total_crawled_pages: int = Field(alias="totalCrawledPages")
    total_page_batches: int = Field(alias="totalPageBatches")
    current_page_batch: int = Field(alias="currentPageBatch")
    batch_size: int = Field(alias="batchSize")
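
These are Pydantic models whose fields carry camelCase aliases, so a raw API payload validates directly into the snake_case attributes shown above. A minimal sketch with a made-up payload; model_validate assumes Pydantic v2 (use parse_obj on v1).

# Illustrative payload only; values are made up to show the alias mapping.
raw = {
    "jobId": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "status": "completed",
    "data": [{"url": "https://example.com", "status": "completed", "markdown": "# Example"}],
    "totalCrawledPages": 1,
    "totalPageBatches": 1,
    "currentPageBatch": 1,
    "batchSize": 20,
}

job = CrawlJobResponse.model_validate(raw)
print(job.job_id, job.total_crawled_pages, job.data[0].markdown)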
