Scrapling: An Undetectable, Powerful, and Adaptive Python Web Scraping Library

Summary

Scrapling is a high-performance Python library designed for effortless web scraping. It stands out with its adaptive capabilities, automatically adjusting to website changes, and advanced stealth features to bypass anti-bot systems. This makes it a robust solution for modern web data extraction needs.

Updated on October 11, 2025

Introduction

Scrapling is an advanced, high-performance Python library designed to make web scraping effortless. It stands out by offering undetectable, powerful, and flexible capabilities, making it a robust solution for modern web data extraction challenges. Unlike traditional scraping tools, Scrapling is an adaptive library that learns from website changes, automatically relocating elements and keeping your scrapers running even after structural updates. Built by web scrapers for web scrapers, it provides a comprehensive suite of tools for both beginners and experienced developers.

Why Use Scrapling?

Scrapling offers a unique combination of features that address common web scraping pain points:

Adaptive Scraping & AI Integration

  • Smart Element Tracking: Automatically relocates elements after website changes using intelligent similarity algorithms, reducing maintenance.
  • Smart Flexible Selection: Supports CSS selectors, XPath, filter-based search, text search, and regex, providing versatile data extraction.
  • Find Similar Elements: Easily locate elements similar to those already found.
  • MCP Server for AI: Features a built-in MCP (Model Context Protocol) server for AI-assisted web scraping, optimizing data extraction and minimizing token usage with AI tools like Claude or Cursor.
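To make the "smart element tracking" idea concrete, here is a minimal conceptual sketch of similarity-based relocation. This is not Scrapling's actual algorithm; all names (`fingerprint`, `relocate`) are hypothetical, and stdlib `difflib` stands in for its similarity scoring. The idea: fingerprint the element you originally matched, then after a redesign pick the candidate whose fingerprint is most similar.

```python
# Conceptual sketch only -- NOT Scrapling's implementation.
# Fingerprint an element's identifying features, then relocate it by
# picking the most similar candidate after the page layout changes.
from difflib import SequenceMatcher

def fingerprint(tag: str, attrs: dict, text: str) -> str:
    """Flatten an element's tag, attributes, and text into one comparable string."""
    attr_part = " ".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return f"{tag} {attr_part} {text.strip()}"

def relocate(original: str, candidates: list[str]) -> str:
    """Return the candidate fingerprint most similar to the original one."""
    return max(candidates, key=lambda c: SequenceMatcher(None, original, c).ratio())

# The price element used to be <span class="price">, but a redesign renamed it.
old = fingerprint("span", {"class": "price"}, "$19.99")
candidates = [
    fingerprint("span", {"class": "product-price"}, "$19.99"),
    fingerprint("div", {"class": "nav-item"}, "Home"),
]
best = relocate(old, candidates)
print(best)  # the renamed price element wins on similarity
```

A selector that hard-codes `.price` breaks on the redesign, while the similarity match survives it; that resilience is what the feature above automates.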

Advanced Website Fetching with Session Support

  • HTTP Requests: Perform fast and stealthy HTTP requests with Fetcher, impersonating browser TLS fingerprints and headers, with HTTP/3 support.
  • Dynamic Loading: Handle dynamic websites with full browser automation using DynamicFetcher, supporting Playwright's Chromium, real Chrome, and custom stealth modes.
  • Anti-bot Bypass: StealthyFetcher provides advanced stealth capabilities, including modified Firefox and fingerprint spoofing, to bypass Cloudflare's Turnstile and Interstitial challenges.
  • Session Management: Maintain state and cookies across requests with FetcherSession, StealthySession, and DynamicSession.
  • Async Support: Full asynchronous support across all fetchers and session classes for high-concurrency scraping.
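The async support above enables the standard high-concurrency pattern sketched below. The `fetch_page` stub is a hypothetical placeholder for an async Scrapling fetcher call (it is not a Scrapling API); the gather-plus-semaphore scheduling is the part that carries over.

```python
# Sketch of the high-concurrency pattern async fetchers enable.
# `fetch_page` is a stand-in stub, not a Scrapling call.
import asyncio

async def fetch_page(url: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>page for {url}</html>"

async def scrape_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def bounded(url: str) -> str:
        async with sem:
            return await fetch_page(url)

    return await asyncio.gather(*(bounded(u) for u in urls))

pages = asyncio.run(scrape_all([f"https://example.com/{i}" for i in range(10)]))
print(len(pages))  # 10
```

The semaphore keeps politeness limits intact even when you queue hundreds of URLs, which is usually what you want against production sites.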

High-Performance & Battle-tested Architecture

  • Lightning Fast: Optimized for superior performance, often outperforming many other Python scraping libraries.
  • Memory Efficient: Utilizes optimized data structures and lazy loading to ensure a minimal memory footprint.
  • Fast JSON Serialization: Offers significantly faster JSON serialization compared to the standard library.
  • Battle-tested: With 92% test coverage and full type hints, Scrapling has been rigorously tested and used daily by hundreds of web scrapers.

Developer-Friendly Experience

  • Interactive Web Scraping Shell: An optional built-in IPython shell with Scrapling integration, shortcuts, and tools to accelerate script development.
  • CLI Usage: Scrape URLs directly from the terminal without writing any Python code.
  • Rich Navigation API: Advanced DOM traversal methods for parent, sibling, and child navigation.
  • Enhanced Text Processing: Built-in regex, cleaning methods, and optimized string operations.
  • Auto Selector Generation: Generate robust CSS/XPath selectors for any element.
  • Familiar API: An API similar to Scrapy/BeautifulSoup, using the same pseudo-elements found in Scrapy/Parsel.
  • Complete Type Coverage: Full type hints for excellent IDE support and code completion.
  • Ready Docker Image: A Docker image containing all browsers is automatically built and pushed with each release.
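As an illustration of the auto selector generation idea, here is a conceptual sketch (not Scrapling's implementation; `css_path` and its input shape are invented for this example). It walks an element's ancestor chain root-first, truncating at the nearest `id` since ids are unique, and otherwise emitting `tag.class` segments.

```python
# Conceptual sketch only -- not Scrapling's selector generator.
# Build a CSS path from an ancestor chain, preferring the nearest id.
def css_path(ancestors: list[dict]) -> str:
    """ancestors is ordered root-first; each dict has 'tag' and optional 'id'/'class'."""
    parts = []
    for node in ancestors:
        if node.get("id"):
            # An id is unique in a document, so everything above it is redundant.
            parts = [f'#{node["id"]}']
            continue
        seg = node["tag"]
        if node.get("class"):
            seg += "." + ".".join(node["class"].split())
        parts.append(seg)
    return " > ".join(parts)

chain = [
    {"tag": "html"},
    {"tag": "body"},
    {"tag": "div", "id": "content"},
    {"tag": "ul", "class": "quotes"},
    {"tag": "li", "class": "quote featured"},
]
print(css_path(chain))  # #content > ul.quotes > li.quote.featured
```

Anchoring on the nearest id keeps generated selectors short and robust, since everything above the id can change without breaking the match.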

Installation

Scrapling requires Python 3.10 or higher.

To install the core parser engine:

pip install scrapling

For fetchers and command-line tools, install optional dependencies:

pip install "scrapling[fetchers]"
scrapling install # Downloads browser dependencies

Other optional features:

  • AI (MCP server): pip install "scrapling[ai]"
  • Shell features: pip install "scrapling[shell]"
  • All features: pip install "scrapling[all]"

Remember to run scrapling install after installing any extras if you haven't already.

Alternatively, use the Docker image with all extras and browsers:

docker pull pyd4vinci/scrapling

Examples

Here are some examples demonstrating Scrapling's capabilities:

Basic Usage with Fetchers and Sessions

from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher
from scrapling.fetchers import FetcherSession, StealthySession, DynamicSession

# HTTP requests with session support
with FetcherSession(impersonate='chrome') as session: # Use latest version of Chrome's TLS fingerprint
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text')
    print(f"Quotes from FetcherSession: {quotes}")

# Advanced stealth mode (the session keeps the browser open until the context exits)
with StealthySession(headless=True, solve_cloudflare=True) as session:
    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
    data = page.css('#padded_content a')
    print(f"Data from StealthySession: {data}")
    
# Full browser automation (the session keeps the browser open until the context exits)
with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:
    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
    data = page.xpath('//span[@class="text"]/text()') # XPath selector if you prefer it
    print(f"Data from DynamicSession: {data}")

Advanced Parsing & Navigation

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# Get quotes with multiple selection methods
quotes_css = page.css('.quote') # CSS selector
quotes_xpath = page.xpath('//div[@class="quote"]') # XPath
quotes_find_all = page.find_all('div', class_='quote') # BeautifulSoup-style

print(f"First quote text (CSS): {quotes_css.css_first('.text::text')}")

# Advanced navigation
first_quote = page.css_first('.quote')
author = first_quote.next_sibling.css('.author::text')
print(f"Author of first quote: {author}")

# Element relationships and similarity
similar_elements = first_quote.find_similar()
print(f"Found {len(similar_elements)} similar elements to the first quote.")

CLI Usage

Scrapling also provides a powerful command-line interface:

# Launch interactive Web Scraping shell
scrapling shell

# Extract content to a file
scrapling extract get 'https://example.com' content.md --css-selector '#fromSkipToProducts' --impersonate 'chrome'