Python Requests Library: Complete Guide (Including Proxy Setup) 2026


TL;DR: Python requests is the standard HTTP library for Python — pip install requests. For scraping: always set a User-Agent header, use Session objects for multiple requests to the same site, set timeouts, and use rotating proxies to avoid IP bans. For SOCKS5: pip install requests[socks].


Why Python requests Is the Standard HTTP Library

The requests library is downloaded hundreds of millions of times per month on PyPI and is used by virtually every Python developer who works with HTTP. It was created by Kenneth Reitz with the design philosophy of "HTTP for Humans" — abstracting away the complexity of Python's built-in urllib.

requests vs urllib (Python built-in):

| Feature | requests | urllib (built-in) |
|---|---|---|
| Session support | ✅ Simple | ❌ Complex setup |
| JSON parsing | ✅ response.json() | ❌ Manual |
| Header management | ✅ Dict-based | ❌ Verbose |
| Proxy support | ✅ Simple dict | ❌ Custom opener required |
| Authentication | ✅ Built-in | ❌ Manual |
| Retry logic | ✅ Via HTTPAdapter | ❌ Manual |
| Code readability | ✅ Excellent | ❌ Verbose |
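The readability gap shows up before a request is even sent. As a rough illustration (api.example.com is a placeholder), here is the same GET prepared with both libraries — requests accepts plain dicts and URL-encodes for you, while urllib needs the query string assembled by hand:

```python
import urllib.request
from urllib.parse import urlencode

import requests

# requests: params and headers as plain dicts, encoded automatically
prepared = requests.Request(
    'GET',
    'https://api.example.com/items',
    params={'page': 2, 'sort': 'name'},
    headers={'Accept': 'application/json'},
).prepare()
print(prepared.url)  # https://api.example.com/items?page=2&sort=name

# urllib: build the query string and Request object manually
url = 'https://api.example.com/items?' + urlencode({'page': 2, 'sort': 'name'})
req = urllib.request.Request(url, headers={'Accept': 'application/json'})
print(req.full_url)  # same URL, more plumbing
```

Neither snippet touches the network — `prepare()` just builds the request object, which makes the difference in ergonomics easy to see in isolation.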


Installation

# Standard installation
pip install requests

# With SOCKS5 proxy support
pip install requests[socks]

# Note: the legacy requests[security] extra is a no-op on modern requests (2.26+),
# so requests[socks] is the only extra most projects need

# For virtual environment (recommended)
python -m venv scraper-env
source scraper-env/bin/activate  # Windows: scraper-env\Scripts\activate
pip install requests

# Verify installation
python -c "import requests; print(requests.__version__)"
# Output: 2.31.0 (or current version)

Basic HTTP Requests

GET Request

The GET method retrieves data from a server:

import requests

# Basic GET request
response = requests.get('https://api.github.com/repos/psf/requests')

# Check status code
print(response.status_code)   # 200 = success
print(response.headers)       # Response headers dict
print(response.encoding)      # Detected encoding
print(response.text)          # Response body as string
print(response.content)       # Response body as bytes
print(response.json())        # Parse JSON body to dict/list
print(response.url)           # Final URL (after redirects)
print(response.elapsed)       # Request duration (timedelta)

GET with Query Parameters

# Query parameters via params dict (URL-encoded automatically)
params = {
    'q': 'python proxy scraping',
    'sort': 'updated',
    'per_page': 50,
}

response = requests.get('https://api.github.com/search/repositories', params=params)

# Resulting URL: https://api.github.com/search/repositories?q=python+proxy+scraping&sort=updated&per_page=50
print(response.url)

data = response.json()
print(f"Found {data['total_count']} repositories")

POST Request

The POST method submits data to a server (forms, API payloads):

# POST form data (application/x-www-form-urlencoded)
response = requests.post('https://httpbin.org/post', data={
    'username': 'testuser',
    'password': 'testpass',
})

# POST JSON body (application/json)
response = requests.post('https://api.example.com/products', json={
    'name': 'Proxy Service',
    'price': 9.99,
    'category': 'networking',
})

# POST file upload (multipart/form-data)
with open('data.csv', 'rb') as f:
    response = requests.post('https://api.example.com/upload', files={
        'file': ('data.csv', f, 'text/csv'),
    })

print(response.status_code)
print(response.json())

Other HTTP Methods

# PUT — update a resource
response = requests.put('https://api.example.com/product/123', json={'price': 14.99})

# PATCH — partial update
response = requests.patch('https://api.example.com/product/123', json={'price': 14.99})

# DELETE — remove a resource
response = requests.delete('https://api.example.com/product/123')

# HEAD — get headers only (no body)
response = requests.head('https://example.com')
print(response.headers)

# OPTIONS — supported methods
response = requests.options('https://api.example.com/')
print(response.headers.get('Allow'))

Request Headers

Setting proper headers is critical for web scraping — servers use them to detect bots.

import requests

# Define realistic browser headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
}

response = requests.get('https://amazon.com', headers=headers)

# User agent rotation for scraping
import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0',
]

def random_headers():
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml',
        'Accept-Language': 'en-US,en;q=0.9',
    }

Proxy Configuration

Basic Proxy Setup

import requests

# HTTP proxy (no authentication)
proxies = {
    'http': 'http://proxy-server.example.com:8080',
    'https': 'http://proxy-server.example.com:8080',
}

response = requests.get('https://ip.me', proxies=proxies)
print(response.text)  # Should show proxy server's IP

# Authenticated proxy
proxies_auth = {
    'http': 'http://username:password@proxy-server.example.com:8080',
    'https': 'http://username:password@proxy-server.example.com:8080',
}

response = requests.get('https://ip.me', proxies=proxies_auth)

SOCKS5 Proxy (requires pip install requests[socks])

import requests

# SOCKS5 proxy
proxies = {
    'http': 'socks5://username:password@proxy-server.example.com:1080',
    'https': 'socks5://username:password@proxy-server.example.com:1080',
}

# socks5h:// routes DNS resolution through the proxy (prevents DNS leaks)
proxies_socks5h = {
    'http': 'socks5h://username:password@proxy-server.example.com:1080',
    'https': 'socks5h://username:password@proxy-server.example.com:1080',
}

response = requests.get('https://ip.me', proxies=proxies_socks5h)

Rotating Proxy Pool

import requests
import random
import itertools
import time

# Option 1: Rotating endpoint (single address, auto-rotates IP)
ROTATING_PROXY = {
    'http': 'http://username:password@gate.limeproxies.com:5432',
    'https': 'http://username:password@gate.limeproxies.com:5432',
}

# Option 2: Manual proxy pool rotation
PROXY_POOL = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
]

proxy_cycle = itertools.cycle(PROXY_POOL)

def get_next_proxy():
    proxy_url = next(proxy_cycle)
    return {'http': proxy_url, 'https': proxy_url}

def scrape_with_rotating_proxy(url):
    for attempt in range(3):
        try:
            proxies = get_next_proxy()
            response = requests.get(
                url,
                proxies=proxies,
                headers=random_headers(),
                timeout=(5, 30),
            )
            response.raise_for_status()
            return response

        except (requests.exceptions.ProxyError,
                requests.exceptions.Timeout,
                requests.exceptions.ConnectionError) as e:
            print(f"Proxy failed (attempt {attempt+1}/3): {e}")
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s

    return None

See LimeProxies rotating residential proxies — the gateway endpoint automatically rotates through thousands of residential IPs per request.


Session Objects

Sessions reuse TCP connections and persist cookies, dramatically improving performance for multiple requests:

import requests

# Create a session
session = requests.Session()

# Set default headers for all requests
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...',
    'Accept-Language': 'en-US,en;q=0.9',
})

# Set default proxy for all requests
session.proxies.update({
    'http': 'http://username:password@gate.limeproxies.com:5432',
    'https': 'http://username:password@gate.limeproxies.com:5432',
})

# Login to maintain authenticated session
login_data = {'username': 'user', 'password': 'pass'}
session.post('https://example.com/login', data=login_data)

# Session now carries authentication cookies automatically
response = session.get('https://example.com/dashboard')
page2 = session.get('https://example.com/products')

# Clean up (close connections)
session.close()

# Or use as context manager (auto-closes)
with requests.Session() as s:
    s.headers.update({'User-Agent': 'Mozilla/5.0...'})
    response = s.get('https://example.com')

Retry Logic and Error Handling

Production scrapers must handle network failures gracefully:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time

def create_robust_session(proxy=None):
    session = requests.Session()

    # Retry configuration
    retry_strategy = Retry(
        total=5,                          # Total retry attempts
        backoff_factor=1,                 # Wait: 1s, 2s, 4s, 8s, 16s
        status_forcelist=[429, 500, 502, 503, 504],  # Retry on these codes
        allowed_methods=['HEAD', 'GET', 'OPTIONS', 'POST'],  # Caution: retrying POST can duplicate non-idempotent writes
        raise_on_status=False,
    )

    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount('http://', adapter)
    session.mount('https://', adapter)

    if proxy:
        session.proxies.update({'http': proxy, 'https': proxy})

    return session

# Complete error handling example
def safe_request(session, url, **kwargs):
    try:
        response = session.get(url, timeout=(10, 30), **kwargs)
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx
        return response

    except requests.exceptions.Timeout:
        print(f"Timeout fetching {url}")
    except requests.exceptions.TooManyRedirects:
        print(f"Too many redirects: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error {e.response.status_code}: {url}")
        if e.response.status_code == 429:
            retry_after = int(e.response.headers.get('Retry-After', 60))
            print(f"Rate limited — waiting {retry_after}s")
            time.sleep(retry_after)
    except requests.exceptions.ProxyError:
        print(f"Proxy connection failed for {url}")
    except requests.exceptions.ConnectionError:
        print(f"Network connection error: {url}")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")

    return None

# Use it
session = create_robust_session(proxy='http://user:pass@gate.limeproxies.com:5432')
response = safe_request(session, 'https://example.com/products')
if response:
    print(response.text[:500])

Authentication Methods

# HTTP Basic Authentication
from requests.auth import HTTPBasicAuth

response = requests.get('https://api.example.com/data',
                         auth=HTTPBasicAuth('username', 'password'))
# Shorthand:
response = requests.get('https://api.example.com/data', auth=('username', 'password'))

# HTTP Digest Authentication
from requests.auth import HTTPDigestAuth
response = requests.get('https://api.example.com/', auth=HTTPDigestAuth('user', 'pass'))

# Bearer Token (OAuth 2.0 / JWT)
headers = {'Authorization': f'Bearer {access_token}'}
response = requests.get('https://api.example.com/me', headers=headers)

# API Key in header
response = requests.get('https://api.example.com/data',
                         headers={'X-API-Key': 'your-api-key-here'})

# API Key as query parameter
response = requests.get('https://api.example.com/data',
                         params={'api_key': 'your-api-key-here'})

Cookies and Cookie Jars

import requests

# Send cookies with a request
cookies = {'session_id': 'abc123', 'user_pref': 'dark_mode'}
response = requests.get('https://example.com', cookies=cookies)

# Read cookies from response
print(response.cookies['session_id'])
print(dict(response.cookies))  # All cookies as dict

# Cookie jar for cross-request persistence
jar = requests.cookies.RequestsCookieJar()
jar.set('session', 'xyz789', domain='example.com', path='/')
response = requests.get('https://example.com', cookies=jar)

# Sessions automatically persist cookies
session = requests.Session()
session.get('https://example.com/login')
# Login response sets cookies — session carries them to next request
response = session.get('https://example.com/dashboard')

Downloading Files

import requests
import os

def download_file(url, filepath, proxies=None):
    """Stream download a file with progress tracking."""
    with requests.get(url, stream=True, proxies=proxies, timeout=(10, 60)) as response:
        response.raise_for_status()

        total_size = int(response.headers.get('Content-Length', 0))
        downloaded = 0

        with open(filepath, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
                downloaded += len(chunk)
                if total_size:
                    pct = (downloaded / total_size) * 100
                    print(f"\rDownloading: {pct:.1f}%", end='')

    print(f"\nSaved to {filepath} ({downloaded:,} bytes)")

# Usage
download_file(
    'https://example.com/large-dataset.csv',
    'data/dataset.csv',
    proxies={'https': 'http://user:pass@gate.limeproxies.com:5432'}
)

Complete Web Scraping Example with Proxy Rotation

import requests
import random
import time
import json
from bs4 import BeautifulSoup  # pip install beautifulsoup4
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Configuration
ROTATING_PROXY = 'http://username:password@gate.limeproxies.com:5432'
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
]

def create_session():
    session = requests.Session()
    session.proxies.update({'http': ROTATING_PROXY, 'https': ROTATING_PROXY})
    adapter = HTTPAdapter(max_retries=Retry(total=3, backoff_factor=2))
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

def scrape_products(category_url):
    session = create_session()
    products = []

    try:
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        response = session.get(category_url, headers=headers, timeout=(10, 30))
        response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')
        product_cards = soup.select('.product-card')

        # select_one() returns None when a selector misses, so guard each lookup
        # (Python has no ?. optional chaining)
        def text_of(card, css):
            el = card.select_one(css)
            return el.get_text(strip=True) if el else None

        for card in product_cards:
            link = card.select_one('a')
            products.append({
                'title': text_of(card, '.product-title'),
                'price': text_of(card, '.price'),
                'rating': text_of(card, '.rating'),
                'url': link.get('href') if link else None,
            })

        # Rate limiting — random delay between pages
        time.sleep(random.uniform(1.5, 4.0))

    except Exception as e:
        print(f"Error scraping {category_url}: {e}")
    finally:
        session.close()

    return products

# Run
results = scrape_products('https://example.com/products')
print(f"Scraped {len(results)} products")

# Save results
with open('products.json', 'w') as f:
    json.dump(results, f, indent=2)

Response Status Code Reference

| Code | Meaning | Scraper Action |
|---|---|---|
| 200 | OK — success | Parse and return content |
| 301/302 | Redirect | Follow automatically (default) or allow_redirects=False |
| 400 | Bad Request | Fix request parameters |
| 401 | Unauthorized | Add/refresh authentication |
| 403 | Forbidden | Change User-Agent, proxy, or add headers |
| 404 | Not Found | Skip URL — content removed |
| 429 | Too Many Requests | Back off, respect Retry-After header |
| 503 | Service Unavailable | Retry with delay; may be CAPTCHA |
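The table can be folded into a small dispatcher so a scraping loop decides what to do with each response in one place. This is a sketch — the action names are made up for illustration, and you would map them to real handlers in your own code:

```python
def scraper_action(status: int) -> str:
    """Map an HTTP status code to a scraper action (names are illustrative)."""
    if status == 200:
        return 'parse'            # success: parse and return content
    if status in (301, 302):
        return 'follow-redirect'  # requests follows these by default
    if status == 400:
        return 'fix-request'      # bad parameters on our side
    if status == 401:
        return 'refresh-auth'     # add or refresh credentials
    if status == 403:
        return 'rotate-identity'  # change User-Agent, proxy, or headers
    if status == 404:
        return 'skip'             # content removed
    if status == 429:
        return 'back-off'         # respect the Retry-After header
    if status == 503:
        return 'retry-later'      # may be a CAPTCHA wall
    return 'log-and-skip'         # anything unexpected

print(scraper_action(429))  # back-off
```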


Python requests vs Modern Alternatives (2026)

| Library | Speed | Async | Proxy Support | Best For |
|---|---|---|---|---|
| requests | Good | ❌ (blocking) | ✅ Full | Simple scripts, APIs |
| httpx | Excellent | ✅ (async/sync) | ✅ Full | Modern replacement for requests |
| aiohttp | Excellent | ✅ (async) | ✅ Full | High-concurrency scraping |
| urllib3 | Fast | ❌ | ✅ Full | Low-level control |

For high-volume concurrent scraping (100+ simultaneous requests), consider httpx or aiohttp instead of requests — they support async natively.
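If switching libraries isn't an option, a thread pool still gets requests a long way, since the bottleneck in scraping is I/O wait, not CPU. The sketch below fakes the network call with time.sleep so the timing effect is visible offline — in real code, fetch() would call session.get(url) instead:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    """Stand-in for session.get(url) — simulates ~100ms of network latency."""
    time.sleep(0.1)
    return f'fetched {url}'

urls = [f'https://example.com/page/{i}' for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # pool.map runs the blocking calls concurrently across threads
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(f'{len(results)} pages in {elapsed:.2f}s')  # ~0.1s, vs ~1s sequentially
```

Threads work because requests releases the GIL while waiting on sockets; past a few hundred concurrent requests, though, the per-thread overhead makes httpx or aiohttp the better tool.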



Last updated: March 2026
