AJAX - The Complete Reference


PART III


Chapter 10: Web Services and Beyond 507


Finally, we print out the resulting nodes to our own special results page without ads
and other items:

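/* print out the tags found */
print "<ul>";
foreach ($nodes as $node)
{
    $resultURL = $node->getAttribute('href');
    if ($resultURL != '')
    {
        /* escape the URL, since scraped markup is untrusted input */
        $safeURL = htmlspecialchars($resultURL, ENT_QUOTES);
        echo "<li><a href='$safeURL'>$safeURL</a></li>";
    }
}
print "</ul>";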

?>

We can see the working result here:

Note that we don’t give a URL for you to try because, frankly, the demo is likely to fail
in the very near future, especially if Google changes its markup structure or bans us for
querying too much.
Scraping is fragile, and scraping to grab content or mash up data that is not meant to be
used without its surrounding context is certainly bad practice. However, the technology itself is
fundamental to the Web. We need to be able to automate Web access, for how else would
Web testing tools work? We present the idea only to let you know that scraping might be a
necessary evil to accomplish your goals in some situations.
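To make the fragility concrete, here is a minimal, self-contained sketch of the link-extraction step written defensively. It is not the chapter's original script: the function name extractLinks, the XPath query, and the sample markup are all assumptions made for illustration. The point is that every step (fetching, parsing, querying) can come back empty, so the code treats "no results" as a normal outcome rather than an error.

```php
<?php
/* Hypothetical helper (not from the original listing):
   returns the href values of all links found in $html. */
function extractLinks($html)
{
    $links = array();
    if (trim($html) == '')
        return $links;          /* nothing was fetched, so nothing to parse */

    $dom = new DOMDocument();
    /* real-world markup is rarely valid, so collect parse warnings quietly */
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded)
        return $links;

    $xpath = new DOMXPath($dom);
    /* the query is the fragile part: it encodes assumptions about the markup */
    $nodes = $xpath->query('//a[@href]');
    foreach ($nodes as $node)
    {
        $resultURL = $node->getAttribute('href');
        if ($resultURL != '')
            $links[] = $resultURL;
    }
    return $links;
}

/* sample markup standing in for a fetched page (an assumption) */
$sample = "<html><body><a href='http://example.com/'>One</a>"
        . "<a name='x'>Not a link</a></body></html>";

print "<ul>";
foreach (extractLinks($sample) as $url)
{
    $safeURL = htmlspecialchars($url, ENT_QUOTES);
    echo "<li><a href='$safeURL'>$safeURL</a></li>";
}
print "</ul>";
```

If the target site changes its markup, a more specific query than the one assumed here would simply match nothing, and the function would return an empty array instead of failing outright.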
If after reading this you are concerned about scraping against your own site, the primary
defense for form-based input would be a CAPTCHA (http://en.wikipedia.org/wiki/Captcha)
system, as shown here, where the user types the word shown into a text box for access:
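The server side of such a check might be sketched as follows. This is only an illustration under stated assumptions: the function names issueChallenge and checkChallenge are hypothetical, the $store array stands in for $_SESSION, and drawing the word as a distorted image is left out entirely.

```php
<?php
/* Hypothetical helpers; $store stands in for $_SESSION in a real page. */
function issueChallenge(&$store, $length = 6)
{
    /* leave out look-alike characters such as 0/O and 1/I */
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $word = '';
    for ($i = 0; $i < $length; $i++)
        $word .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    $store['captcha'] = $word;
    /* render this into an image; never send it as plain page text */
    return $word;
}

function checkChallenge(&$store, $typed)
{
    if (!isset($store['captcha']))
        return false;           /* no challenge was issued */
    $expected = $store['captcha'];
    unset($store['captcha']);   /* each challenge allows one attempt only */
    /* ignore case and surrounding whitespace in what the user typed */
    return hash_equals($expected, strtoupper(trim((string) $typed)));
}

/* usage: the first check succeeds, the replayed one does not */
$session = array();
$word = issueChallenge($session);
var_dump(checkChallenge($session, strtolower($word)));
var_dump(checkChallenge($session, $word));
```

Discarding the stored word after a single attempt is a design choice in this sketch: it keeps a script from replaying one solved challenge across many form submissions.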