
TL;DR: JavaScript is one of the best languages for web scraping in 2026. Use Cheerio for static HTML (fast, lightweight), Playwright for JavaScript-rendered pages (recommended over Puppeteer), and rotating residential proxies to avoid IP bans at scale. Key stack:
Node.js + Playwright + got + cheerio + rotating-proxy endpoint.
Why JavaScript for Web Scraping?
JavaScript (Node.js) has unique advantages for web scraping:
- Same language as the web — most websites run JavaScript; scraping with it means you understand the target's code
- Async-first architecture — Node.js handles thousands of concurrent requests efficiently without threads
- Native browser automation — Playwright and Puppeteer are built primarily for Node.js
- Network interception — capturing XHR/fetch API responses is cleaner with the same event model the browser uses
- npm ecosystem — hundreds of scraping-related packages (got, axios, cheerio, playwright, puppeteer, p-limit)
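The async-first point is easy to demonstrate without any real network traffic. In this sketch, timers stand in for request latency: three concurrent "requests" complete in roughly the time of one, because the event loop interleaves them rather than running them back to back.

```javascript
// Timers simulate network latency; no real HTTP requests are made.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fakeRequest(id) {
  await delay(100); // stand-in for ~100 ms of network latency
  return `response-${id}`;
}

const start = Date.now();
// All three "requests" are in flight at once
const results = await Promise.all([1, 2, 3].map(fakeRequest));
const elapsed = Date.now() - start;

console.log(results);  // [ 'response-1', 'response-2', 'response-3' ]
console.log(`Batch took ~${elapsed} ms (about 100 ms, not 300 ms)`);
```

`Promise.all` preserves input order, so results line up with the URLs you pass in, which matters when you zip scraped data back to its source URL.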
Prerequisites and Setup
What You Need
- Node.js (v18+ recommended — LTS as of 2026)
- npm or pnpm (package manager)
- A code editor (VS Code recommended)
```bash
# Verify Node.js installation
node --version   # Should print v18+ or v20+
npm --version

# Create the project directory
mkdir amazon-scraper && cd amazon-scraper
npm init -y
```
Installing Core Dependencies
```bash
# For static HTML scraping
npm install got cheerio

# For dynamic/JavaScript-rendered pages
npm install playwright

# Install browser binaries for Playwright
npx playwright install chromium

# For concurrent request management
npm install p-limit

# For proxy support and CSV export (used in later examples)
npm install https-proxy-agent csv-writer
```
Method 1: Static HTML Scraping with Cheerio
Best for: Pages where content is fully rendered in the initial HTML response.
Cheerio implements jQuery's API on the server, giving you familiar $('.selector').text() syntax without a browser.
Basic Scraping Example
```javascript
import got from 'got';
import * as cheerio from 'cheerio';

async function scrapeProductPage(url) {
  try {
    const { body } = await got(url, {
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
      },
    });
    const $ = cheerio.load(body);
    const product = {
      title: $('h1.product-title').text().trim(),
      price: $('.price-current').first().text().trim(),
      rating: $('.rating-value').text().trim(),
      reviews: parseInt($('.review-count').text().replace(/\D/g, ''), 10),
      description: $('.product-description').text().trim(),
    };
    return product;
  } catch (error) {
    console.error(`Failed to scrape ${url}:`, error.message);
    return null;
  }
}

// Run it
const data = await scrapeProductPage('https://example.com/product/123');
console.log(data);
```
Scraping Multiple Pages in Parallel (with Rate Limiting)
```javascript
import got from 'got';
import * as cheerio from 'cheerio';
import pLimit from 'p-limit';

// Limit to 5 concurrent requests maximum
const limit = pLimit(5);

const urls = [
  'https://example.com/product/1',
  'https://example.com/product/2',
  'https://example.com/product/3',
  // ... hundreds more
];

async function scrapeWithDelay(url) {
  // Random delay of 1-3 seconds before each request
  await new Promise(resolve => setTimeout(resolve, Math.random() * 2000 + 1000));
  return scrapeProductPage(url); // defined in the previous example
}

// Scrape all URLs with rate limiting
const results = await Promise.all(
  urls.map(url => limit(() => scrapeWithDelay(url)))
);

const validResults = results.filter(Boolean);
console.log(`Scraped ${validResults.length} products`);
```
Method 2: Dynamic Content Scraping with Playwright
Best for: React, Vue, Angular, Next.js pages, infinite scroll, JavaScript-triggered content.
Basic Playwright Setup
```javascript
import { chromium } from 'playwright';

async function scrapeDynamicPage(url) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    viewport: { width: 1366, height: 768 },
  });
  const page = await context.newPage();
  try {
    await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
    // Wait for a specific element to confirm the page is ready
    await page.waitForSelector('.product-title', { timeout: 10000 });
    // Extract data inside the page context
    const data = await page.evaluate(() => ({
      title: document.querySelector('.product-title')?.textContent?.trim(),
      price: document.querySelector('.price')?.textContent?.trim(),
      rating: document.querySelector('.rating')?.textContent?.trim(),
    }));
    return data;
  } finally {
    await browser.close();
  }
}
```
Playwright with Proxy Integration
```javascript
import { chromium } from 'playwright';

async function scrapeWithProxy(url, proxyConfig) {
  const browser = await chromium.launch({
    headless: true,
    proxy: {
      server: proxyConfig.server, // e.g. 'http://gate.limeproxies.com:5432'
      username: proxyConfig.username,
      password: proxyConfig.password,
    },
  });
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // ... extraction logic
    return await page.title();
  } finally {
    await browser.close();
  }
}

// Use a rotating residential proxy endpoint
const proxyConfig = {
  server: 'http://gate.limeproxies.com:5432',
  username: 'your-username',
  password: 'your-password',
};
const result = await scrapeWithProxy('https://amazon.com/dp/B08N5WRWNW', proxyConfig);
```
Anti-Detection: Stealth Mode
Playwright can be detected as a headless browser. Use these techniques to evade detection:
```javascript
import { chromium } from 'playwright';

const browser = await chromium.launch({
  headless: true,
  args: [
    '--disable-blink-features=AutomationControlled', // Remove the automation flag
    '--disable-features=site-per-process',
  ],
});

const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
  viewport: { width: 1366, height: 768 },
  locale: 'en-US',
  timezoneId: 'America/New_York',
  permissions: ['geolocation'],
});

const page = await context.newPage();

// Override automation-detection properties before any page script runs
await page.addInitScript(() => {
  // Remove the webdriver flag
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  // Fake plugins (real browsers expose these)
  Object.defineProperty(navigator, 'plugins', {
    get: () => [{ name: 'PDF Plugin' }, { name: 'Chrome PDF Viewer' }],
  });
});
```
Method 3: API Interception (Best for SPAs)
Many modern websites load data via API calls (XHR/fetch). Capturing these is often faster and more reliable than parsing HTML.
```javascript
import { chromium } from 'playwright';

async function captureApiResponse(pageUrl, apiPattern) {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  const capturedData = [];

  // Listen for network responses matching our pattern
  page.on('response', async (response) => {
    if (response.url().includes(apiPattern) && response.status() === 200) {
      try {
        const json = await response.json();
        capturedData.push(json);
      } catch (e) {
        // Not JSON — skip
      }
    }
  });

  await page.goto(pageUrl, { waitUntil: 'networkidle' });

  // For infinite scroll pages — scroll to trigger more API calls
  for (let i = 0; i < 5; i++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000);
  }

  await browser.close();
  return capturedData;
}

// Example: capture product API responses (the URL pattern is illustrative)
const products = await captureApiResponse(
  'https://www.amazon.com/s?k=laptops',
  '/api/s?'
);
```
Rotating Proxies in Node.js
For large-scale scraping, rotating proxies are essential to avoid IP bans.
Option 1: Rotating Residential Proxy Endpoint (Recommended)
The simplest approach — use a single endpoint that automatically rotates IPs:
```javascript
import got from 'got';
import { HttpsProxyAgent } from 'https-proxy-agent';

// LimeProxies rotating residential endpoint
const proxyUrl = 'http://username:password@gate.limeproxies.com:5432';

const response = await got('https://target-site.com/products', {
  // Some proxy setups require disabling TLS verification; avoid unless necessary
  https: { rejectUnauthorized: false },
  agent: {
    https: new HttpsProxyAgent(proxyUrl),
  },
  headers: {
    'User-Agent': getRandomUserAgent(), // any helper that returns a random UA string
  },
});
```
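The `getRandomUserAgent()` helper above is not part of any library; a minimal sketch looks like this. The UA strings in the list are illustrative — in production, rotate through current, real browser user agents.

```javascript
// Illustrative pool of user-agent strings; keep these current in production
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
];

// Pick one at random per request
function getRandomUserAgent() {
  return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}
```

Rotating the user agent alongside the proxy IP makes each request's fingerprint less uniform, which is the point of pairing the two.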
Option 2: Manual Proxy Pool with Rotation
```javascript
import got from 'got';
import { HttpsProxyAgent } from 'https-proxy-agent';

const proxyPool = [
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080',
  // ... more proxies
];

let proxyIndex = 0;

// Round-robin through the pool
function getNextProxy() {
  const proxy = proxyPool[proxyIndex % proxyPool.length];
  proxyIndex++;
  return proxy;
}

async function scrapeWithRotation(url) {
  const proxy = getNextProxy();
  return got(url, {
    agent: { https: new HttpsProxyAgent(proxy) },
    timeout: { request: 30000 },
    retry: {
      limit: 3,
      statusCodes: [503, 429],
    },
  });
}
```
See LimeProxies rotating residential proxies and SOCKS5 proxies for proxy plans suited to large-scale Node.js scraping.
Handling Pagination
URL-Pattern Pagination
```javascript
import pLimit from 'p-limit';

async function scrapeAllPages(baseUrl, totalPages) {
  const limit = pLimit(3); // 3 concurrent requests
  const pageUrls = Array.from({ length: totalPages }, (_, i) =>
    `${baseUrl}?page=${i + 1}`
  );
  const results = await Promise.all(
    pageUrls.map(url => limit(() => scrapeProductPage(url))) // from the Cheerio example
  );
  return results.flat().filter(Boolean);
}
```
Infinite Scroll Pagination
```javascript
import { chromium } from 'playwright';

async function scrapeInfiniteScroll(url) {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url);

  const allItems = new Set();
  let previousHeight = 0;

  while (true) {
    // Extract the currently visible items
    const items = await page.$$eval('.product-card', elements =>
      elements.map(el => ({
        title: el.querySelector('.title')?.textContent?.trim(),
        price: el.querySelector('.price')?.textContent?.trim(),
      }))
    );
    // Serialize so the Set deduplicates by value, not by object identity
    items.forEach(item => allItems.add(JSON.stringify(item)));

    // Scroll to the bottom
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000);

    const newHeight = await page.evaluate(() => document.body.scrollHeight);
    // Stop if the page didn't grow (end of content)
    if (newHeight === previousHeight) break;
    previousHeight = newHeight;
  }

  await browser.close();
  return [...allItems].map(s => JSON.parse(s));
}
```
Exporting Scraped Data
Save to CSV
```javascript
import { createObjectCsvWriter } from 'csv-writer';

const csvWriter = createObjectCsvWriter({
  path: 'products.csv',
  header: [
    { id: 'title', title: 'Title' },
    { id: 'price', title: 'Price' },
    { id: 'rating', title: 'Rating' },
    { id: 'url', title: 'URL' },
  ],
});

await csvWriter.writeRecords(products);
console.log('CSV written: products.csv');
```
Save to JSON
```javascript
import { writeFileSync } from 'fs';

writeFileSync('products.json', JSON.stringify(products, null, 2));
console.log(`Saved ${products.length} products to products.json`);
```
JavaScript vs Python for Web Scraping
| Feature | JavaScript (Node.js) | Python |
|---|---|---|
| Async concurrency | Excellent (native event loop) | Good (asyncio, but more complex) |
| Browser automation | Best-in-class (Playwright native) | Excellent (Playwright Python port) |
| Data processing | Good (lodash, streams) | Excellent (pandas, NumPy) |
| ML integration | Limited | Extensive (scikit-learn, TensorFlow) |
| Learning curve | Moderate (async syntax) | Low (beginner friendly) |
| Production ops | Good (Node.js ecosystem) | Excellent (mature tooling) |
| Speed (static scraping) | Very fast (got + cheerio) | Fast (httpx + BeautifulSoup) |

Verdict: For pure scraping with browser automation, JavaScript/Node.js and Python are essentially equivalent; choose based on your team's existing expertise. JavaScript has a slight edge for React-heavy SPAs, since your scraper and the target app share the same language and runtime model.
Complete Production Scraper Example
```javascript
import * as cheerio from 'cheerio';
import got from 'got';
import pLimit from 'p-limit';
import { createObjectCsvWriter } from 'csv-writer';
import { HttpsProxyAgent } from 'https-proxy-agent';

const CONFIG = {
  concurrency: 5,
  delayMs: { min: 1500, max: 4000 },
  proxy: 'http://user:pass@gate.limeproxies.com:5432',
  outputFile: 'products.csv',
};

function randomDelay() {
  const ms = Math.random() * (CONFIG.delayMs.max - CONFIG.delayMs.min) + CONFIG.delayMs.min;
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function scrapePage(url) {
  await randomDelay();
  try {
    const { body } = await got(url, {
      agent: { https: new HttpsProxyAgent(CONFIG.proxy) },
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...',
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
      },
      timeout: { request: 30000 },
    });
    const $ = cheerio.load(body);
    return {
      url,
      title: $('h1').first().text().trim(),
      price: $('.price').first().text().trim(),
      rating: $('.rating').first().text().trim(),
      scrapedAt: new Date().toISOString(),
    };
  } catch (error) {
    console.error(`Error scraping ${url}:`, error.message);
    return null;
  }
}

async function main(urls) {
  const limit = pLimit(CONFIG.concurrency);
  console.log(`Scraping ${urls.length} URLs with concurrency ${CONFIG.concurrency}...`);
  const results = await Promise.all(
    urls.map(url => limit(() => scrapePage(url)))
  );
  const validResults = results.filter(Boolean);
  if (validResults.length === 0) {
    console.error('No pages scraped successfully; nothing to write.');
    return;
  }
  const csvWriter = createObjectCsvWriter({
    path: CONFIG.outputFile,
    header: Object.keys(validResults[0]).map(id => ({ id, title: id })),
  });
  await csvWriter.writeRecords(validResults);
  console.log(`Done. Saved ${validResults.length}/${urls.length} records to ${CONFIG.outputFile}`);
}

// Run
const targetUrls = ['https://example.com/page/1', /* ... */];
main(targetUrls);
```
Resources for JavaScript Web Scraping
- Playwright docs — Official browser automation documentation
- Cheerio docs — Server-side jQuery for HTML parsing
- got npm package — Modern HTTP client for Node.js
- proxy-for-web-scraping — Proxy selection guide for scrapers
- buy-rotating-proxies — Rotating residential proxy plans
Last updated: March 2026