Web Scraping with Node.js and Playwright to easily extract information from Amazon
3 min read · May 3, 2024
Let's see how to do some web scraping on Amazon for a simple project.
Prerequisites
1. Create a directory; you can call it "scraping":
mkdir scraping
cd scraping
2. Initialize your project with npm:
npm init -y
3. Install the dependencies:
npm install playwright
4. (Optional) You might need to install a few more system dependencies for Playwright's browsers. I use Ubuntu, and these were the steps:
sudo npx playwright install-deps
sudo apt-get install libevent-2.1-7
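Before moving on, it can help to confirm that your Node.js version is recent enough. The script in the next step uses top-level await in an .mjs file, which Node.js supports in ES modules from version 14.8, and Playwright itself expects a fairly recent Node.js. The following is only a minimal sketch: the helper name `meetsMinimumNode` and the default minimum of 18 are assumptions for illustration, so check Playwright's documentation for the version it currently requires.

```javascript
// Sketch: check that a Node.js version string meets a minimum major version.
// `meetsMinimumNode` and the default of 18 are assumptions for illustration,
// not values taken from the article or guaranteed by Playwright.
function meetsMinimumNode(version, minMajor = 18) {
  // Accept both "v20.11.1" (the format of process.version) and "20.11.1".
  const major = Number.parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
  // Unparseable input gives NaN, which is treated as "not recent enough".
  return Number.isInteger(major) && major >= minMajor;
}

if (meetsMinimumNode(process.version)) {
  console.log(`Node ${process.version} looks recent enough.`);
} else {
  console.log(`Node ${process.version} may be too old; consider upgrading.`);
}
```

You could save this as, say, check-node.mjs (a hypothetical file name) and run it with node check-node.mjs before creating the scraper.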
5. Create the web scraper. We will use the following Node script, which you can adapt. Create a file named index.mjs with the following content:
import { chromium } from 'playwright';

// Launch a headless Chromium instance and open a new page
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();

// Open the Amazon search results page for "thomas pynchon"
await page.goto(
  'https://www.amazon.com.mx/s?k=thomas+pynchon&__mk_es_MX=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=23H57HYI5SCAL&sprefix=thomas+pynchon%2Caps%2C143&ref=nb_sb_noss_1'
);

// The callback runs in the browser context and maps each result card to a product entry
const products = await page.$$eval(
  '.s-card-container',
  (results) => (
    results.map((el) => {
      const title = el.querySelector('h2')?.innerText;
      // Cards without a title yield null
      if (!title) {
        return null;
      }
const image =…