Web Scraping with PHP
Web scraping is a collection of practices and techniques for simulating the behavior of a normal web site user in order to use the web site itself as a web service. This can include both retrieving data made available by the site and introducing new data into the site. This presentation will define web scraping and showcase recommended practices, common issues, and their solutions.
It will review the basics of HTTP and show how to apply that knowledge using several well-known PHP HTTP client libraries. It will also detail several extensions available for analyzing retrieved data, including PHP’s various XML extensions as well as its tidy and PCRE extensions. Lastly, it will cover best practices, including considerations of real-time versus batch processing, implementation of anti-throttling measures, and compliance with the robots.txt standard.
Blue Parabola, LLC
Matthew Turland lives in Duson, LA and is currently employed as a Senior Consultant for Blue Parabola. He contributes to open source projects such as Zend Framework and currently holds ZCE and ZFCE certifications. He can frequently be found in the #phpc channel and several others on the Freenode IRC network under the nick Elazar. In his spare time, he serves as an author and technical editor for php|architect magazine and shares his development experiences on his blog at http://ishouldbecoding.com.