Learning Python Network Programming

(Sean Pound) #1

APIs in Action


If we navigated to this element by using the ElementTree functions, which we have
used before, then we'd end up with something like the following:





>>> root.find('body').findall('div')[1].find('p').text





Debian 8.0 was.
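
To experiment with this kind of chained lookup without downloading the page, a small stand-in document can be used. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree; the SAMPLE markup is hypothetical and only mimics the general shape of the real page, which is larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the parsed page (an assumption, not the real markup).
SAMPLE = """<html>
  <head><title>Sample</title></head>
  <body>
    <div id="header">Header</div>
    <div id="content"><p>Debian 8.0 was <a href="#">released</a>.</p></div>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# Take the second <div> under <body>, then the first <p> inside it.
first_p = root.find('body').findall('div')[1].find('p')
print(repr(first_p.text))
```

Note that .text holds only the text that appears before an element's first child, so here the lookup stops at the embedded <a> tag. That behavior would explain a cut-off result like the one shown above.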


But this isn't the best approach, as it depends quite heavily on the HTML structure.
A change, such as a <div> tag being inserted before the one that we needed, would
break it. Also, in more complex documents, this can lead to horrendous chains of
method calls, which are hard to maintain. Our use of the <title> tag in the previous
section to get the codename is an example of a good technique, because there is
always only one <head> and one <title> tag in a document. A better approach to
finding our <div> would be to make use of the id="content" attribute it contains.
It's a common web page design pattern to break a page into a few top-level <div>s
for the major page sections like the header, the footer and the content, and to give the
<div>s id attributes which identify them as such.

Hence, if we could search for <div>s with an id attribute of "content", then
we'd have a clean way of selecting the right <div>. There is only one <div> in the
document that is a match, and it's unlikely that another <div> like that will be added
to the document. This approach doesn't depend on where the <div> sits in the document, and so
it won't be affected by changes that are made to the surrounding structure. We'll still need
to rely on the fact that the <p> tag in the <div> is the first <p> tag that appears, but
given that there is no other way to identify it, this is the best we can do.

So, how do we run such a search for our content <div>?

Searching with XPath

In order to avoid exhaustive iteration and the checking of every element, we need
to use XPath, which is more powerful than what we've used so far. It is a query
language that was developed specifically for XML, and it's supported by lxml.
Plus, the standard library implementation provides limited support for it.

We're going to take a quick look at XPath, and in the process we will find the answer
to the question posed earlier.

To get started, use the Python shell from the last section, and do the following:

>>> root.xpath('body')
[<Element body at 0x39e0908>]
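
As a rough illustration of searching by attribute, the limited XPath support in the standard library is already enough to select an element by its id. The sketch below assumes a small hypothetical document (SAMPLE) rather than the real page, and uses xml.etree.ElementTree instead of lxml so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (an assumption); note the extra <div> placed before the content.
SAMPLE = """<html>
  <head><title>Sample</title></head>
  <body>
    <div id="header">Header</div>
    <div id="banner">Inserted later</div>
    <div id="content"><p>First paragraph.</p><p>Second.</p></div>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)

# './/div' searches all descendants; the predicate keeps only id="content".
matches = root.findall(".//div[@id='content']")
content = matches[0]

# We still rely on the target being the first <p> inside the content <div>.
first_p = content.find('p')
print(len(matches), first_p.text)
```

Because the search matches on the attribute rather than on position, the extra <div id="banner"> placed before the content section doesn't affect the result. With lxml, a comparable query could be written as root.xpath('//div[@id="content"]').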