Web Scraping with PHP



Web scraping is a collection of practices and techniques for simulating the behavior of an ordinary web site user in order to use the site itself as a web service. This can include both retrieving data the site makes available and submitting new data to it. This presentation will define web scraping and showcase recommended practices, common issues, and their solutions.


This presentation will review the basics of the HTTP protocol and show how to apply that knowledge using several well-known PHP HTTP client libraries. It will also detail several extensions available for analyzing retrieved data, including PHP’s various XML extensions as well as its tidy and PCRE extensions. Lastly, it will cover best practices, including real-time versus batch processing, implementing anti-throttling measures, and complying with the robots.txt standard.
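As a rough illustration of the data-analysis step, the sketch below uses PHP's DOM extension together with DOMXPath (one of the XML-related extensions) to pull links out of an HTML page. It assumes the page has already been retrieved with an HTTP client. The HTML string, the `results` id, and the `$links` array are hypothetical stand-ins chosen for this example, not part of the talk material.

```php
<?php
// Minimal sketch: extract link text and targets from an already-retrieved
// HTML page using the DOM extension and XPath.
// The markup below is a hypothetical stand-in for a fetched response body.
$html = <<<HTML
<html><body>
  <ul id="results">
    <li><a href="/items/1">First item</a></li>
    <li><a href="/items/2">Second item</a></li>
  </ul>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world markup is often malformed; collect libxml parser warnings
// internally instead of printing them, then discard them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Select every anchor inside the (hypothetical) results list.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//ul[@id="results"]/li/a') as $anchor) {
    $links[] = [
        'text' => trim($anchor->textContent),
        'href' => $anchor->getAttribute('href'),
    ];
}

foreach ($links as $link) {
    echo $link['text'], ' => ', $link['href'], PHP_EOL;
}
```

The same XPath approach also works on documents that have first been cleaned up with the tidy extension. For simple patterns that do not depend on document structure, PCRE functions such as `preg_match_all` are a lighter-weight alternative.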

Speaking experience