AJAX - The Complete Reference

PART III

Chapter 10: Web Services and Beyond

Finally, we print out the resulting nodes to our own special results page without ads and other items:

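/* print out the tags found */
print "<ul>";
foreach ($nodes as $node) {
    $resultURL = $node->getAttribute('href');
    if ($resultURL != '') {
        /* escape the URL before echoing it back into our own markup */
        $resultURL = htmlspecialchars($resultURL, ENT_QUOTES);
        echo "<li><a href='$resultURL'>$resultURL</a></li>";
    }
}
print "</ul>";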

?>
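For readers who want to try the extraction step without hitting a live search engine, the following is a minimal, self-contained sketch of the same idea. It assumes the nodes come from PHP's DOM extension queried with XPath; the sample markup, the `result` class name, and the helper name `extractResultLinks` are invented for illustration and are not part of the original listing.

```php
<?php
// Hypothetical sample markup standing in for a fetched results page.
$html = <<<HTML
<html><body>
  <div class="ad"><a href="http://ads.example.com/">Ad</a></div>
  <div id="results">
    <a class="result" href="http://example.com/one">One</a>
    <a class="result" href="http://example.com/two?a=1&amp;b=2">Two</a>
    <a class="result">No link</a>
  </div>
</body></html>
HTML;

// Hypothetical helper: returns the non-empty href values of all nodes
// matched by the given XPath query.
function extractResultLinks(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query($xpathQuery) as $node) {
        $href = $node->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// The query is tied to the page's markup; if the class name changes,
// the scraper silently returns nothing.
$links = extractResultLinks($html, "//a[@class='result']");

print "<ul>";
foreach ($links as $url) {
    // Escape before writing the URL back into our own page.
    $safe = htmlspecialchars($url, ENT_QUOTES);
    echo "<li><a href='$safe'>$safe</a></li>";
}
print "</ul>";
```

The query string is the fragile part: it encodes an assumption about someone else's markup, so any redesign of the target page breaks the scraper without raising an error.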

We can see the working result here:

Note we don’t give a URL for you to try because, frankly, the demo is likely to fail sometime in the very near future, especially if Google changes its markup structure or bans us for querying too much. Scraping is fragile, and scraping to grab content or mash up data that is not meant to be used without its surrounding context is certainly bad practice. However, the technology itself is fundamental to the Web. We need to be able to automate Web access, for how else would Web testing tools work? We present the idea only to let you know that scraping might be a necessary evil to accomplish your goals in some situations. If after reading this you are concerned about scraping against your own site, the primary defense for form-based input would be a CAPTCHA (http://en.wikipedia.org/wiki/Captcha) system, as shown here, where the user types the word shown into some text box for access:
