CAPTCHA Solving

Using Hyperbrowser's CAPTCHA Solving

Hyperbrowser's CAPTCHA solving feature is only available on paid plans.

In this guide, we will see how to use Hyperbrowser and its integrated CAPTCHA solver to scrape Today's Top Deals from Amazon without being blocked.

Setup

First, let's create a new Node.js project.

mkdir amazon-deals-scraper && cd amazon-deals-scraper
npm init -y

Installation

Next, let's install the necessary dependencies to run our script.

npm install @hyperbrowser/sdk puppeteer-core dotenv

Set Up Your Environment

To use Hyperbrowser with your code, you will need an API Key. You can get one easily from the dashboard. Once you have your API Key, add it to your .env file as HYPERBROWSER_API_KEY.
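
As a rough sketch, the .env file would hold a single line such as HYPERBROWSER_API_KEY=your-api-key (a placeholder value), and the script can fail early if the variable is missing. The requireEnv helper below is hypothetical and not part of the Hyperbrowser SDK or dotenv.

```javascript
// Hypothetical helper (not part of the Hyperbrowser SDK or dotenv):
// read a required environment variable and fail early if it is missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  // Trim stray whitespace that can sneak in when editing .env by hand.
  return value.trim();
}

// Intended usage, after dotenv's config() has loaded the .env file:
// const apiKey = requireEnv("HYPERBROWSER_API_KEY");
```

Failing at startup gives a clearer message than an authentication error surfacing later, when the session is created.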

Code

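Next, create a new file index.js and add the following code. The script uses ES module import syntax, so add "type": "module" to package.json (or name the file index.mjs); otherwise, older Node.js versions reject the import statements.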

import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";
import { connect } from "puppeteer-core";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const main = async () => {
  console.log("Starting session");
  const session = await client.sessions.create({
    solveCaptchas: true,
    adblock: true,
    annoyances: true,
    trackers: true,
  });
  console.log("Session created:", session.id);

  try {
    const browser = await connect({
      browserWSEndpoint: session.wsEndpoint,
      defaultViewport: null,
    });

    const [page] = await browser.pages();

    await page.goto("https://amazon.com/deals", {
      waitUntil: "load",
      timeout: 20_000,
    });

    const pageTitle = await page.title();
    console.log("Navigated to Page:", pageTitle);

    await sleep(10_000);

    const products = await page.evaluate(() => {
      const items = document.querySelectorAll(".dcl-carousel-element");
      return Array.from(items)
        .map((item) => {
          const nameElement = item.querySelector(".dcl-product-label");
          const dealPriceElement = item.querySelector(
            ".dcl-product-price-new .a-offscreen"
          );
          const originalPriceElement = item.querySelector(
            ".dcl-product-price-old .a-offscreen"
          );
          const percentOffElement = item.querySelector(
            ".dcl-badge .a-size-mini"
          );

          return {
            name: nameElement ? nameElement.textContent?.trim() : null,
            dealPrice: dealPriceElement
              ? dealPriceElement.textContent?.trim()
              : null,
            originalPrice: originalPriceElement
              ? originalPriceElement.textContent?.trim()
              : null,
            percentOff: percentOffElement
              ? percentOffElement.textContent?.trim()
              : null,
          };
        })
        .filter((product) => product.name && product.dealPrice);
    });

    console.log("Found products:", JSON.stringify(products, null, 2));
  } catch (error) {
    console.error(`Encountered an error: ${error}`);
  } finally {
    await client.sessions.stop(session.id);
    console.log("Session stopped:", session.id);
  }
};

main().catch((error) => {
  console.error(`Encountered an error: ${error}`);
});

If you are trying to solve simple image-based CAPTCHAs (the kind where the text shown in an image is typed into a text box for verification), you also have to add the imageCaptchaParams field. It takes an array of objects, each with an image selector and an input selector. Together, these specify which element holds the CAPTCHA image and which input box the solution should be filled into. The selectors follow the standard CSS selector syntax accepted by document.querySelector, as documented on MDN.
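
The shape described above might look roughly like the following sketch. The key names imageSelector and inputSelector are assumptions inferred from the description, the selectors are hypothetical, and buildSessionOptions is an illustrative helper rather than part of the SDK. Check the Hyperbrowser API reference for the exact field names.

```javascript
// Sketch: build session options that include image-CAPTCHA parameters.
// ASSUMPTIONS: the key names `imageSelector` and `inputSelector` are inferred
// from the prose description; the selectors below are made up for illustration.
function buildSessionOptions(imageCaptchas = []) {
  const options = { solveCaptchas: true };
  if (imageCaptchas.length > 0) {
    // One entry per image CAPTCHA: where the image is, and where the answer goes.
    options.imageCaptchaParams = imageCaptchas.map(
      ({ imageSelector, inputSelector }) => ({ imageSelector, inputSelector })
    );
  }
  return options;
}

const sessionOptions = buildSessionOptions([
  { imageSelector: "#captcha-image", inputSelector: "#captcha-input" },
]);

// Presumably passed where the session is created, for example:
// const session = await client.sessions.create(sessionOptions);
```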

Run the Scraper

To run the Amazon deals scraper:

In your terminal, navigate to the project directory
Run the script with Node.js:

node index.js

The script will:

Create a new Hyperbrowser session with captcha solving, ad blocking, and anti-tracking enabled
Connect Puppeteer to the browser running in the session
Navigate to the Amazon deals page, solving any CAPTCHAs that are encountered
Wait 10 seconds for the page to load its content
Scrape the deal data using Puppeteer's page.evaluate method
Print the scraped products to the console
Stop the Hyperbrowser session

You should see the scraped products printed in the console, like:

[
  {
    "name": "Apple AirPods Pro",
    "dealPrice": "$197.00",
    "originalPrice": "$249.99",
    "percentOff": "21% off"
  },
  {
    "name": "Echo Dot (4th Gen)",
    "dealPrice": "$27.99",
    "originalPrice": "$49.99",
    "percentOff": "44% off"
  }
]
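
The prices come back as display strings, so any comparison or sorting needs them converted to numbers first. The sketch below shows one way to do that. parsePrice and discountPercent are hypothetical helpers, and the parsing assumes US-style formatting like the sample output above.

```javascript
// Hypothetical helpers for post-processing the scraped strings.
// ASSUMPTION: prices use US-style formatting such as "$1,299.00".
function parsePrice(text) {
  if (typeof text !== "string") return null;
  // Drop the currency symbol and thousands separators, keep digits and the dot.
  const cleaned = text.replace(/[^0-9.]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isNaN(value) ? null : value;
}

// Whole-number percentage saved, or null if either price cannot be parsed.
function discountPercent(dealPrice, originalPrice) {
  const deal = parsePrice(dealPrice);
  const original = parsePrice(originalPrice);
  if (deal === null || original === null || original === 0) return null;
  return Math.round((1 - deal / original) * 100);
}

console.log(discountPercent("$197.00", "$249.99")); // 21
console.log(discountPercent("$27.99", "$49.99")); // 44
```

Recomputing the discount this way gives a quick sanity check against the scraped percentOff value.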

How it Works

Let's break down the key parts:

We create a new Hyperbrowser session with solveCaptchas, adblock, annoyances, and trackers set to true. This enables the CAPTCHA solver along with blocking of ads, annoyances, and trackers.
We connect Puppeteer to the browser running in the Hyperbrowser session through the session's WebSocket endpoint.
We navigate to the Amazon deals page and wait for any CAPTCHAs to be solved automatically by Hyperbrowser.
We pause execution for 10 seconds with sleep to allow all content to be loaded.
We use page.evaluate to run JavaScript on the page to scrape the deal data.
In the evaluator function, we select the deal elements, extract the relevant data, and return an array of product objects.
We print the scraped data and stop the Hyperbrowser session.
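
The fixed 10-second pause is simple, but it can be too long on a fast load and too short if a CAPTCHA takes a while to solve. One alternative, sketched below, is to wait for the first deal element with Puppeteer's page.waitForSelector. The waitForDeals helper and the 30-second default are illustrative assumptions, not part of the original script.

```javascript
// Sketch: wait for the first deal card instead of sleeping a fixed 10 seconds.
// `page` is expected to be a Puppeteer Page; the selector is the same one the
// script uses inside page.evaluate. The 30-second default is an assumption.
async function waitForDeals(page, timeoutMs = 30000) {
  try {
    await page.waitForSelector(".dcl-carousel-element", { timeout: timeoutMs });
    return true;
  } catch (error) {
    // waitForSelector rejects (for example with a TimeoutError) when no
    // matching element appears before the timeout elapses.
    console.warn(`No deal elements found within ${timeoutMs} ms: ${error.message}`);
    return false;
  }
}

// Intended usage in place of `await sleep(10_000)`:
// const ready = await waitForDeals(page);
```

A generous timeout leaves room for the CAPTCHA solver to finish before the deal elements appear.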

Without solveCaptchas enabled, navigating to the deals page could land us on a CAPTCHA challenge screen instead of the deals.

The CAPTCHA solver runs automatically in the background, so we don't need to handle CAPTCHAs explicitly in our script. If a CAPTCHA appears, Hyperbrowser will solve it and continue loading the page, which in this case means continuing on to the deals page.

Last updated 5 months ago