
TL;DR: Python requests is the standard HTTP library for Python — pip install requests. For scraping: always set a User-Agent header, use Session objects for multiple requests to the same site, set timeouts, and use rotating proxies to avoid IP bans. For SOCKS5 support: pip install requests[socks].
Why Python requests Is the Standard HTTP Library
The requests library has been downloaded over 500 million times on PyPI and is used by virtually every Python developer who works with HTTP. It was created by Kenneth Reitz with the design philosophy of "HTTP for Humans" — abstracting away the complexity of Python's built-in urllib.
requests vs urllib (Python built-in):
| Feature | requests | urllib (built-in) |
|---|---|---|
| Session support | ✅ Simple | ❌ Complex setup |
| JSON parsing | ✅ response.json() | ❌ Manual |
| Header management | ✅ Dict-based | ❌ Verbose |
| Proxy support | ✅ Simple dict | ❌ Custom opener required |
| Authentication | ✅ Built-in | ❌ Manual |
| Retry logic | ✅ Via HTTPAdapter | ❌ Manual |
| Code readability | ✅ Excellent | ❌ Verbose |
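One concrete illustration of that gap: the automatic URL encoding requests performs for params= has to be done by hand with the standard library. A stdlib-only sketch (no network needed):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# requests does this for you when you pass params=: encode a dict as a query string
params = {'q': 'python proxy scraping', 'sort': 'updated', 'per_page': 50}
query = urlencode(params)  # spaces become '+', values are stringified
url = f'https://api.github.com/search/repositories?{query}'
print(url)

# And decode it back: parse_qs maps each key to a list of values
parsed = parse_qs(urlsplit(url).query)
print(parsed['q'])  # ['python proxy scraping']
```

With requests, all of this collapses into a single params= argument, which is exactly the "HTTP for Humans" point.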
Installation
# Standard installation
pip install requests
# With SOCKS5 proxy support (quote the extras so zsh doesn't expand the brackets)
pip install "requests[socks]"
# For virtual environment (recommended)
python -m venv scraper-env
source scraper-env/bin/activate # Windows: scraper-env\Scripts\activate
pip install requests
# Verify installation
python -c "import requests; print(requests.__version__)"
# Output: 2.31.0 (or current version)
Basic HTTP Requests
GET Request
The GET method retrieves data from a server:
import requests
# Basic GET request
response = requests.get('https://api.github.com/repos/psf/requests')
# Check status code
print(response.status_code) # 200 = success
print(response.headers) # Response headers dict
print(response.encoding) # Detected encoding
print(response.text) # Response body as string
print(response.content) # Response body as bytes
print(response.json()) # Parse JSON body to dict/list
print(response.url) # Final URL (after redirects)
print(response.elapsed) # Request duration (timedelta)
GET with Query Parameters
# Query parameters via params dict (URL-encoded automatically)
params = {
    'q': 'python proxy scraping',
    'sort': 'updated',
    'per_page': 50,
}
response = requests.get('https://api.github.com/search/repositories', params=params)
# Resulting URL: https://api.github.com/search/repositories?q=python+proxy+scraping&sort=updated&per_page=50
print(response.url)
data = response.json()
print(f"Found {data['total_count']} repositories")
POST Request
The POST method submits data to a server (forms, API payloads):
# POST form data (application/x-www-form-urlencoded)
response = requests.post('https://httpbin.org/post', data={
    'username': 'testuser',
    'password': 'testpass',
})
# POST JSON body (application/json)
response = requests.post('https://api.example.com/products', json={
    'name': 'Proxy Service',
    'price': 9.99,
    'category': 'networking',
})
# POST file upload (multipart/form-data)
with open('data.csv', 'rb') as f:
    response = requests.post('https://api.example.com/upload', files={
        'file': ('data.csv', f, 'text/csv'),
    })
print(response.status_code)
print(response.json())
Other HTTP Methods
# PUT — update a resource
response = requests.put('https://api.example.com/product/123', json={'price': 14.99})
# PATCH — partial update
response = requests.patch('https://api.example.com/product/123', json={'price': 14.99})
# DELETE — remove a resource
response = requests.delete('https://api.example.com/product/123')
# HEAD — get headers only (no body)
response = requests.head('https://example.com')
print(response.headers)
# OPTIONS — supported methods
response = requests.options('https://api.example.com/')
print(response.headers.get('Allow'))
Request Headers
Setting proper headers is critical for web scraping — servers use them to detect bots.
import requests
# Define realistic browser headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
}
response = requests.get('https://amazon.com', headers=headers)
# User agent rotation for scraping
import random
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0',
]

def random_headers():
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
    }
Proxy Configuration
Basic Proxy Setup
import requests
# HTTP proxy (no authentication)
proxies = {
    'http': 'http://proxy-server.example.com:8080',
    'https': 'http://proxy-server.example.com:8080',
}
response = requests.get('https://ip.me', proxies=proxies)
print(response.text) # Should show proxy server's IP
# Authenticated proxy
proxies_auth = {
    'http': 'http://username:password@proxy-server.example.com:8080',
    'https': 'http://username:password@proxy-server.example.com:8080',
}
response = requests.get('https://ip.me', proxies=proxies_auth)
SOCKS5 Proxy (requires pip install requests[socks])
import requests
# SOCKS5 proxy
proxies = {
    'http': 'socks5://username:password@proxy-server.example.com:1080',
    'https': 'socks5://username:password@proxy-server.example.com:1080',
}
# socks5h:// routes DNS resolution through the proxy (prevents DNS leaks)
proxies_socks5h = {
    'http': 'socks5h://username:password@proxy-server.example.com:1080',
    'https': 'socks5h://username:password@proxy-server.example.com:1080',
}
response = requests.get('https://ip.me', proxies=proxies_socks5h)
Rotating Proxy Pool
import requests
import random
import itertools
import time
# Option 1: Rotating endpoint (single address, auto-rotates IP)
ROTATING_PROXY = {
    'http': 'http://username:password@gate.limeproxies.com:5432',
    'https': 'http://username:password@gate.limeproxies.com:5432',
}
# Option 2: Manual proxy pool rotation
PROXY_POOL = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
]
proxy_cycle = itertools.cycle(PROXY_POOL)

def get_next_proxy():
    proxy_url = next(proxy_cycle)
    return {'http': proxy_url, 'https': proxy_url}
def scrape_with_rotating_proxy(url):
    for attempt in range(3):
        try:
            proxies = get_next_proxy()
            response = requests.get(
                url,
                proxies=proxies,
                headers=random_headers(),
                timeout=(5, 30),
            )
            response.raise_for_status()
            return response
        except (requests.exceptions.ProxyError,
                requests.exceptions.Timeout,
                requests.exceptions.ConnectionError) as e:
            print(f"Proxy failed (attempt {attempt+1}/3): {e}")
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s
    return None
See LimeProxies rotating residential proxies — the gateway endpoint automatically rotates through thousands of residential IPs per request.
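The time.sleep(2 ** attempt) backoff above puts every worker on the same retry schedule. A common refinement is "full jitter": sleep a random amount up to the exponential ceiling so concurrent scrapers don't hammer the target in lockstep. A sketch (the helper name is ours):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# The ceiling doubles each attempt (1s, 2s, 4s, ...) but is capped at 30s
for attempt in range(5):
    print(f"attempt {attempt}: sleep up to {min(30.0, 2.0 ** attempt):.0f}s")
```

time.sleep(backoff_delay(attempt)) drops into the retry loop unchanged.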
Session Objects
Sessions reuse TCP connections and persist cookies, dramatically improving performance for multiple requests:
import requests
# Create a session
session = requests.Session()
# Set default headers for all requests
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
    'Accept-Language': 'en-US,en;q=0.9',
})
# Set default proxy for all requests
session.proxies.update({
    'http': 'http://username:password@gate.limeproxies.com:5432',
    'https': 'http://username:password@gate.limeproxies.com:5432',
})
# Login to maintain authenticated session
login_data = {'username': 'user', 'password': 'pass'}
session.post('https://example.com/login', data=login_data)
# Session now carries authentication cookies automatically
response = session.get('https://example.com/dashboard')
page2 = session.get('https://example.com/products')
# Clean up (close connections)
session.close()
# Or use as context manager (auto-closes)
with requests.Session() as s:
    s.headers.update({'User-Agent': 'Mozilla/5.0...'})
    response = s.get('https://example.com')
Retry Logic and Error Handling
Production scrapers must handle network failures gracefully:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time
def create_robust_session(proxy=None):
    session = requests.Session()
    # Retry configuration
    retry_strategy = Retry(
        total=5,                                      # Total retry attempts
        backoff_factor=1,                             # Wait roughly: 1s, 2s, 4s, 8s, 16s
        status_forcelist=[429, 500, 502, 503, 504],   # Retry on these codes
        allowed_methods=['HEAD', 'GET', 'OPTIONS', 'POST'],  # POST is not idempotent — include it only if replays are safe
        raise_on_status=False,
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    if proxy:
        session.proxies.update({'http': proxy, 'https': proxy})
    return session
# Complete error handling example
def safe_request(session, url, **kwargs):
    try:
        response = session.get(url, timeout=(10, 30), **kwargs)
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx
        return response
    except requests.exceptions.Timeout:
        print(f"Timeout fetching {url}")
    except requests.exceptions.TooManyRedirects:
        print(f"Too many redirects: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error {e.response.status_code}: {url}")
        if e.response.status_code == 429:
            retry_after = int(e.response.headers.get('Retry-After', 60))
            print(f"Rate limited — waiting {retry_after}s")
            time.sleep(retry_after)
    except requests.exceptions.ProxyError:  # must precede ConnectionError (its parent class)
        print(f"Proxy connection failed for {url}")
    except requests.exceptions.ConnectionError:
        print(f"Network connection error: {url}")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
    return None
# Use it
session = create_robust_session(proxy='http://user:pass@gate.limeproxies.com:5432')
response = safe_request(session, 'https://example.com/products')
if response:
print(response.text[:500])
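One caveat in safe_request: int(e.response.headers.get('Retry-After', 60)) assumes the header holds delay-seconds, but RFC 9110 also permits an HTTP-date. A defensive parser covering both forms (the helper name is ours):

```python
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def parse_retry_after(value, default=60):
    """Parse a Retry-After header: either delay-seconds ('120') or an HTTP-date."""
    if value is None:
        return default
    try:
        return max(0, int(value))  # delay-seconds form
    except ValueError:
        pass
    try:
        target = parsedate_to_datetime(value)  # HTTP-date form
        return max(0, (target - datetime.now(timezone.utc)).total_seconds())
    except (TypeError, ValueError):
        return default

print(parse_retry_after('120'))  # 120
```

Dates in the past (or garbage values) degrade gracefully to 0 or the default instead of crashing the retry loop.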
Authentication Methods
# HTTP Basic Authentication
from requests.auth import HTTPBasicAuth
response = requests.get('https://api.example.com/data',
                        auth=HTTPBasicAuth('username', 'password'))
# Shorthand:
response = requests.get('https://api.example.com/data', auth=('username', 'password'))
# HTTP Digest Authentication
from requests.auth import HTTPDigestAuth
response = requests.get('https://api.example.com/', auth=HTTPDigestAuth('user', 'pass'))
# Bearer Token (OAuth 2.0 / JWT)
headers = {'Authorization': f'Bearer {access_token}'}
response = requests.get('https://api.example.com/me', headers=headers)
# API Key in header
response = requests.get('https://api.example.com/data',
                        headers={'X-API-Key': 'your-api-key-here'})
# API Key as query parameter
response = requests.get('https://api.example.com/data',
                        params={'api_key': 'your-api-key-here'})
Cookies and Cookie Jars
import requests
# Send cookies with a request
cookies = {'session_id': 'abc123', 'user_pref': 'dark_mode'}
response = requests.get('https://example.com', cookies=cookies)
# Read cookies from response
print(response.cookies['session_id'])
print(dict(response.cookies)) # All cookies as dict
# Cookie jar for cross-request persistence
jar = requests.cookies.RequestsCookieJar()
jar.set('session', 'xyz789', domain='example.com', path='/')
response = requests.get('https://example.com', cookies=jar)
# Sessions automatically persist cookies
session = requests.Session()
session.get('https://example.com/login')
# Login response sets cookies — session carries them to next request
response = session.get('https://example.com/dashboard')
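Under the hood, cookie persistence starts with parsing the Set-Cookie response header. The stdlib parser shows roughly what gets stored (a simplified sketch — requests actually manages cookies through http.cookiejar):

```python
from http.cookies import SimpleCookie

# Parse a raw Set-Cookie header value into name, value, and attributes
cookie = SimpleCookie()
cookie.load('session_id=abc123; Path=/; Max-Age=3600; HttpOnly')
morsel = cookie['session_id']
print(morsel.value)       # abc123
print(morsel['path'])     # /
print(morsel['max-age'])  # 3600
```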
Downloading Files
import requests

def download_file(url, filepath, proxies=None):
    """Stream download a file with progress tracking."""
    with requests.get(url, stream=True, proxies=proxies, timeout=(10, 60)) as response:
        response.raise_for_status()
        total_size = int(response.headers.get('Content-Length', 0))
        downloaded = 0
        with open(filepath, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
                downloaded += len(chunk)
                if total_size:
                    pct = (downloaded / total_size) * 100
                    print(f"\rDownloading: {pct:.1f}%", end='')
    print(f"\nSaved to {filepath} ({downloaded:,} bytes)")
# Usage
download_file(
    'https://example.com/large-dataset.csv',
    'data/dataset.csv',
    proxies={'https': 'http://user:pass@gate.limeproxies.com:5432'},
)
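If a large download dies partway through, a Range request can resume it instead of restarting from zero. resume_headers below is an illustrative helper, not part of requests: pass its result as headers= and open the file in 'ab' (append) mode when the server answers 206 Partial Content.

```python
import os

def resume_headers(filepath):
    """Build a Range header to resume a partial download.

    Returns {} when nothing usable is on disk, so the request starts at byte 0.
    A 206 response means the server honored the range; a plain 200 means it
    ignored it and is resending the whole file, so truncate before writing.
    """
    try:
        existing = os.path.getsize(filepath)
    except OSError:
        return {}
    return {'Range': f'bytes={existing}-'} if existing > 0 else {}
```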
Complete Web Scraping Example with Proxy Rotation
import requests
import random
import time
import json
from bs4 import BeautifulSoup # pip install beautifulsoup4
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Configuration
ROTATING_PROXY = 'http://username:password@gate.limeproxies.com:5432'
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
]
def create_session():
    session = requests.Session()
    session.proxies.update({'http': ROTATING_PROXY, 'https': ROTATING_PROXY})
    session.mount('https://', HTTPAdapter(max_retries=Retry(total=3, backoff_factor=2)))
    return session

def text_or_none(element):
    """Python has no ?. operator — guard against missing elements explicitly."""
    return element.get_text(strip=True) if element else None

def scrape_products(category_url):
    session = create_session()
    products = []
    try:
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        response = session.get(category_url, headers=headers, timeout=(10, 30))
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        for card in soup.select('.product-card'):
            link = card.select_one('a')
            products.append({
                'title': text_or_none(card.select_one('.product-title')),
                'price': text_or_none(card.select_one('.price')),
                'rating': text_or_none(card.select_one('.rating')),
                'url': link.get('href') if link else None,
            })
        # Rate limiting — random delay before the next page
        time.sleep(random.uniform(1.5, 4.0))
    except Exception as e:
        print(f"Error scraping {category_url}: {e}")
    finally:
        session.close()
    return products

# Run
results = scrape_products('https://example.com/products')
print(f"Scraped {len(results)} products")

# Save results
with open('products.json', 'w') as f:
    json.dump(results, f, indent=2)
Response Status Code Reference
| Code | Meaning | Scraper Action |
|---|---|---|
| 200 | OK — success | Parse and return content |
| 301/302 | Redirect | Follow automatically (default) or allow_redirects=False |
| 400 | Bad Request | Fix request parameters |
| 401 | Unauthorized | Add/refresh authentication |
| 403 | Forbidden | Change User-Agent, proxy, or add headers |
| 404 | Not Found | Skip URL — content removed |
| 429 | Too Many Requests | Back off, respect Retry-After header |
| 503 | Service Unavailable | Retry with delay; may be CAPTCHA |
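The table above translates directly into a dispatch function a scraper loop can branch on (the action names and buckets are ours, not a standard):

```python
def scraper_action(status_code):
    """Map an HTTP status code to a coarse scraper action, mirroring the table above."""
    if status_code == 200:
        return 'parse'
    if status_code in (301, 302):
        return 'follow_redirect'
    if status_code in (400, 401):
        return 'fix_request'       # bad parameters or stale credentials
    if status_code == 403:
        return 'rotate_identity'   # new User-Agent, proxy, or headers
    if status_code == 404:
        return 'skip'
    if status_code == 429:
        return 'back_off'          # honor Retry-After
    if status_code >= 500:
        return 'retry_later'
    return 'inspect'               # anything unexpected deserves a look

print(scraper_action(429))  # back_off
```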
Python requests vs Modern Alternatives (2026)
| Library | Speed | Async | Proxy Support | Best For |
|---|---|---|---|---|
| requests | Good | ❌ (blocking) | ✅ Full | Simple scripts, APIs |
| httpx | Excellent | ✅ (async/sync) | ✅ Full | Modern replacement for requests |
| aiohttp | Excellent | ✅ (async) | ✅ Full | High-concurrency scraping |
| urllib3 | Fast | ❌ | ✅ Full | Low-level control |
For high-volume concurrent scraping (100+ simultaneous requests), consider httpx or aiohttp instead of requests — they support async natively.
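Before switching libraries, note that plain threads already parallelize blocking requests calls well into the dozens of workers. A sketch of the pattern with an injectable fetch callable, so it runs without network access (in practice fetch would wrap session.get):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(urls, fetch, max_workers=10):
    """Run a blocking fetch callable concurrently; capture per-URL failures."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep the error instead of crashing the batch
                results[url] = exc
    return results

# Demo with a stand-in fetch — no network involved
demo = fetch_all(['u1', 'u2', 'u3'], fetch=lambda u: f'body-of-{u}', max_workers=3)
print(demo['u2'])  # body-of-u2
```

Threads sidestep the blocking limitation without a rewrite; async pays off once concurrency climbs into the hundreds.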
Resources
- Python requests docs — official documentation
- proxy-for-web-scraping — choosing the right proxy for Python scraping
- buy-rotating-proxies — rotating residential proxy plans for Python scrapers
- buy-socks-proxies — SOCKS5 proxy plans (use with requests[socks])
Last updated: March 2026
About the author
Rachael Chapman
A Complete Gamer and a Tech Geek. Brings out all her thoughts and Love in Writing Techie Blogs.