Roach PHP
The complete web scraping toolkit for PHP
Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python.
Roach allows us to define spiders that crawl and scrape web documents. But wait, there's more. Roach isn't just a simple crawler; it also includes an entire pipeline to clean, persist, and otherwise process the extracted data. It's your all-in-one resource for web scraping in PHP.
Framework Agnostic
Roach doesn't depend on a specific framework. Instead, you can use the core package on its own or install one of the framework-specific adapters. Currently, there is a first-party adapter for using Roach in your Laravel projects, with more to come.
Built With Extensibility in Mind
Roach is built from the ground up with extensibility in mind. In fact, most of Roach's built-in behavior works exactly the same way as any custom extension or middleware you write yourself.
Want to store the scraped information in your persistence layer of choice? Roach has you covered: just write an appropriate item processor.
Want to add custom HTTP headers to every outgoing request based on some condition? Sure thing, that sounds like a job for a downloader middleware.
Post a message to the company Slack after a run has finished to gloat about how well your spider works? I... guess you could write an extension for that and listen for the corresponding event.