Roach PHP – The complete web scraping toolkit for PHP



Roach is a complete web scraping toolkit for PHP. It is heavily inspired by the popular Scrapy package for Python.

Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler; it also includes an entire pipeline to clean, persist and otherwise process the extracted data. It’s your all-in-one resource for web scraping in PHP.

Framework Agnostic

Roach doesn’t depend on a specific framework. Instead, you can use the core package on its own or install one of the framework-specific adapters. Currently, there’s a first-party adapter for using Roach in your Laravel projects, with more coming.

Built With Extensibility in Mind

Roach is built from the ground up with extensibility in mind. In fact, most of Roach’s built-in behavior works exactly the same way as any custom extension or middleware you write yourself.

Want to store the scraped information in your persistence layer of choice? Roach has you covered: just write an appropriate item processor.
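To make the item-processor idea concrete, here is a minimal, framework-free sketch of an item pipeline. It assumes items are plain associative arrays and that each processor receives an item and returns it (possibly modified). The `ItemProcessor` interface, the `TrimStrings` and `InMemoryStore` classes, and the `runPipeline()` function are hypothetical illustrations, not Roach's actual API; Roach's own processor interface is described in its Item Pipeline documentation.

```php
<?php
// Framework-free sketch of an item pipeline.
// ItemProcessor, TrimStrings, InMemoryStore and runPipeline() are
// hypothetical names for illustration; they are NOT Roach's actual API.

interface ItemProcessor
{
    // Receives a scraped item and returns the (possibly modified) item.
    public function processItem(array $item): array;
}

// Cleaning step: trims whitespace from every string field.
final class TrimStrings implements ItemProcessor
{
    public function processItem(array $item): array
    {
        // array_map with a single array preserves the string keys.
        return array_map(
            fn ($value) => is_string($value) ? trim($value) : $value,
            $item
        );
    }
}

// Persistence step: collects items in memory.
// A real processor might write to a database or a file instead.
final class InMemoryStore implements ItemProcessor
{
    /** @var array<int, array> */
    public array $saved = [];

    public function processItem(array $item): array
    {
        $this->saved[] = $item;
        return $item;
    }
}

// Passes an item through each processor in order.
function runPipeline(array $item, ItemProcessor ...$processors): array
{
    foreach ($processors as $processor) {
        $item = $processor->processItem($item);
    }
    return $item;
}

$store = new InMemoryStore();
$result = runPipeline(['title' => '  Roach PHP  '], new TrimStrings(), $store);
```

Because every step shares the same small interface, cleaning and persisting are just two processors in a chain, and swapping the in-memory store for a database writer would not touch the cleaning step.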

Want to add custom HTTP headers to every outgoing request based on some condition? Sure thing; that sounds like a job for a downloader middleware.
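As a rough illustration of the downloader-middleware idea, the sketch below shows a middleware that adds a header only when a condition holds (here, when the request targets a particular host). The `Request` class, the `RequestMiddleware` interface, the `AddApiTokenHeader` class, and the host and token values are all hypothetical and framework-free; they are not Roach's actual classes, which are covered in its Downloader Middleware documentation.

```php
<?php
// Framework-free sketch of a request (downloader) middleware.
// Request, RequestMiddleware and AddApiTokenHeader are hypothetical
// names for illustration; they are NOT Roach's actual API.

final class Request
{
    /** @param array<string, string> $headers */
    public function __construct(public string $url, public array $headers = []) {}

    // Returns a copy of the request with one extra header set.
    public function withHeader(string $name, string $value): self
    {
        $headers = $this->headers;
        $headers[$name] = $value;
        return new self($this->url, $headers);
    }
}

interface RequestMiddleware
{
    // Receives an outgoing request and returns the request to send.
    public function handleRequest(Request $request): Request;
}

// Adds an Authorization header, but only for requests to one host.
final class AddApiTokenHeader implements RequestMiddleware
{
    public function __construct(private string $host, private string $token) {}

    public function handleRequest(Request $request): Request
    {
        if (parse_url($request->url, PHP_URL_HOST) !== $this->host) {
            return $request; // Leave requests to other hosts untouched.
        }
        return $request->withHeader('Authorization', 'Bearer ' . $this->token);
    }
}

$middleware = new AddApiTokenHeader('api.example.com', 'secret-token');
$apiRequest = $middleware->handleRequest(new Request('https://api.example.com/items'));
$otherRequest = $middleware->handleRequest(new Request('https://example.com/'));
```

Returning a new request instead of mutating the original keeps each middleware easy to reason about when several of them are chained.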

Want to post a message to the company Slack after a run has finished, to gloat about how well your spider works? I... guess you could write an extension for that and listen for the corresponding event.
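The extension idea boils down to subscribing a listener to an event that fires when a run ends. The framework-free sketch below assumes a tiny hand-rolled event dispatcher; the `EventDispatcher`, `RunFinished`, and `GloatExtension` classes are hypothetical and not Roach's actual API (Roach's own event and extension classes are described in its Extensions documentation). Instead of calling Slack, the listener simply records the message it would send.

```php
<?php
// Framework-free sketch of an extension listening for a "run finished" event.
// EventDispatcher, RunFinished and GloatExtension are hypothetical names for
// illustration; they are NOT Roach's actual API.

final class RunFinished
{
    public function __construct(public int $itemsScraped) {}
}

final class EventDispatcher
{
    /** @var array<string, array<int, callable>> */
    private array $listeners = [];

    // Registers a listener for a given event class.
    public function listen(string $eventClass, callable $listener): void
    {
        $this->listeners[$eventClass][] = $listener;
    }

    // Calls every listener registered for the event's class.
    public function dispatch(object $event): void
    {
        foreach ($this->listeners[get_class($event)] ?? [] as $listener) {
            $listener($event);
        }
    }
}

// The "extension": subscribes to RunFinished and builds a gloating message.
final class GloatExtension
{
    /** @var string[] */
    public array $messages = [];

    public function subscribe(EventDispatcher $dispatcher): void
    {
        $dispatcher->listen(RunFinished::class, [$this, 'onRunFinished']);
    }

    public function onRunFinished(RunFinished $event): void
    {
        // A real extension might send this to a Slack webhook instead.
        $this->messages[] = "Run finished: {$event->itemsScraped} items scraped!";
    }
}

$dispatcher = new EventDispatcher();
$extension = new GloatExtension();
$extension->subscribe($dispatcher);
$dispatcher->dispatch(new RunFinished(42));
```

Since the extension only knows about the event and not about the spider, the same listener would work unchanged for any spider that dispatches `RunFinished`.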
