<pedrocorreia.net ⁄>
 

<Web scraping with PHP and XPath ⁄ >




2009-02-18


When I was writing about how I use web scraping, I still hadn't tried using XPath (shame on me). The sssscripting blog responded to my article with a very good, detailed post about all sorts of scraping techniques (with Ruby examples), and after reading this post on Kore Nordmann's blog I finally decided to try making something with XPath.

It turned out that using XPath is extremely easy, really. Once you master it, you can write most queries in seconds. Yes, you need to know how XML works and how to write correct XPath queries (a brief explanation of XPath syntax is available at W3Schools), but hey, these topics are covered in the first year of university.
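To give a rough idea of how little code this takes, here is a minimal sketch using PHP's built-in `DOMDocument` and `DOMXPath` classes. It is not the example from the full article: the HTML snippet, the `post` class name and the `$links` variable are made up for illustration, and a real scraper would download the page first.

```php
<?php
// Hypothetical HTML standing in for a downloaded page; a real scraper
// would fetch it first, e.g. with file_get_contents($url).
$html = <<<HTML
<html><body>
  <div class="post"><h2><a href="/first">First post</a></h2></div>
  <div class="post"><h2><a href="/second">Second post</a></h2></div>
  <div class="sidebar"><a href="/about">About</a></div>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select every <a> inside an <h2> inside a <div class="post">.
// Note that @class="post" matches the exact attribute value only.
$links = [];
foreach ($xpath->query('//div[@class="post"]/h2/a') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}

print_r($links);
```

The query skips the sidebar link because its parent `div` has a different class, so only the two post links end up in `$links`.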

Also, there are good tools like XPath Checker for Firefox, which let you debug and test your queries without writing any code. It may sound silly, but XPath queries look a lot like CSS selectors, only with much more power and flexibility. Without further ado, let's look at an example (idea from Kore's article):



This is just a short excerpt from the article. To read the full article, follow the link below:

http://dev.juokaz.com/php/web-scraping-with-php-and-xpath








