Learning Python Network Programming

(Sean Pound) #1

APIs in Action


Well, yes it would. The ideal setup is to have a main account on which you register
the app and which you can also use as a regular Twitter account, and to have the app
process tweets for a second, dedicated world clock account.


OAuth makes this possible, but a few extra steps are needed to get it to work. We
would need the world clock account to authorize our app to act on its behalf. You'll
notice that the OAuth credentials mentioned earlier consist of two main elements:
consumer and access. The consumer element identifies our application, and the access
element proves that the account the access credentials came from has authorized our
app to act on its behalf. In our app we shortcut the full account authorization
process by having the app act on behalf of the account through which it was
registered, that is, our app account. When we do this, Twitter lets us acquire the
access credentials directly from the dev.twitter.com interface. To use a different
user account, we would need to insert a step where the user is taken to Twitter in a
web browser, logs in, and then explicitly authorizes our application.
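To make the split between the two elements concrete, here is a minimal standard-library sketch of how an OAuth 1.0a client using the HMAC-SHA1 method (as described in RFC 5849) combines both secrets into the key that signs every request. It is simplified: a real client also adds `oauth_nonce`, `oauth_timestamp`, and the other `oauth_*` parameters, and assumes the base URL is already normalized; in practice a library such as requests-oauthlib does all of this for you. The function names and credential values are hypothetical, chosen for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    # RFC 5849 percent-encoding: everything except unreserved
    # characters (letters, digits, '-', '.', '_', '~') is escaped.
    return quote(value, safe="")


def signing_key(consumer_secret, access_secret):
    # Consumer secret: identifies the application.
    # Access (token) secret: proves the account authorized the app.
    # Both are needed to produce a valid signature.
    return (pct(consumer_secret) + "&" + pct(access_secret)).encode("ascii")


def hmac_sha1_signature(method, base_url, params, consumer_secret, access_secret):
    # Normalize the parameters: encode, sort, join as name=value pairs.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # Signature base string: METHOD&encoded-URL&encoded-parameters
    base_string = "&".join([method.upper(), pct(base_url), pct(param_string)])
    digest = hmac.new(signing_key(consumer_secret, access_secret),
                      base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

Because the access secret is part of the signing key, the same app produces different signatures when acting for different accounts, which is how Twitter can tell which account has authorized it.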


This process is demonstrated in the requests-oauthlib documentation, which can be
found at https://requests-oauthlib.readthedocs.org/en/latest/oauth1_workflow.html.
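As a minimal sketch of that browser step, the function below follows the PIN-based ("out-of-band") variant of the workflow from the requests-oauthlib documentation. The function name `authorize_account` is hypothetical, the Twitter endpoint URLs are the OAuth 1.0a ones Twitter documented at the time (check them against the current API docs), and requests-oauthlib must be installed.

```python
# Twitter's OAuth 1.0a endpoints (assumed; verify against current docs).
REQUEST_TOKEN_URL = "https://api.twitter.com/oauth/request_token"
AUTHORIZE_URL = "https://api.twitter.com/oauth/authorize"
ACCESS_TOKEN_URL = "https://api.twitter.com/oauth/access_token"


def authorize_account(consumer_key, consumer_secret):
    """Hypothetical helper: run the PIN-based flow and return the access
    credentials of whichever account logs in (e.g. the world clock account)."""
    # Imported here so this module loads even without requests-oauthlib.
    from requests_oauthlib import OAuth1Session

    # Step 1: the app identifies itself with its consumer credentials only
    # and asks Twitter for a temporary request token.
    oauth = OAuth1Session(consumer_key, client_secret=consumer_secret,
                          callback_uri="oob")
    tokens = oauth.fetch_request_token(REQUEST_TOKEN_URL)

    # Step 2: the user opens this URL in a web browser, logs in and
    # explicitly authorizes the app; Twitter then displays a PIN.
    print("Authorize the app here:", oauth.authorization_url(AUTHORIZE_URL))
    verifier = input("Enter the PIN: ").strip()

    # Step 3: exchange the authorized request token for access credentials.
    oauth = OAuth1Session(consumer_key, client_secret=consumer_secret,
                          resource_owner_key=tokens["oauth_token"],
                          resource_owner_secret=tokens["oauth_token_secret"],
                          verifier=verifier)
    access = oauth.fetch_access_token(ACCESS_TOKEN_URL)
    return access["oauth_token"], access["oauth_token_secret"]
```

The returned pair plays the same role as the access credentials copied from dev.twitter.com, so the app can store it and use it for the dedicated account.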

HTML and screen scraping


Although more and more services offer their data through APIs, when a service
doesn't, the only way to get the data programmatically is to download its web pages
and parse the HTML source code. This technique is called screen scraping.
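As a minimal sketch of the two steps, downloading a page and then parsing its HTML, here is a link extractor built only on the standard library's `urllib.request` and `html.parser` modules. This is a swapped-in choice for illustration; the chapter may use a different parser, and `extract_links` and `scrape_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href and text of every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # list of (href, text) tuples
        self._href = None   # href of the <a> we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    # Step 2: parse the HTML source.
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_links(url):
    # Step 1: download the page, decoding with the charset the server sends.
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html)
```

The parsing logic is tied to the page's markup (here, `<a>` elements), which is exactly the coupling that makes scrapers fragile when a site changes its HTML.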


Though it sounds simple enough in principle, screen scraping should be approached
as a last resort. Unlike XML, where the syntax is strictly enforced and data structures
are usually reasonably stable and sometimes even documented, web page source code is
messy. It is fluid: the markup can change unexpectedly, in a way that completely
breaks your script and forces you to rework the parsing logic from scratch.


Still, it is sometimes the only way to get essential data, so we're going to take a brief
look at developing an approach to scraping. We will also discuss ways to reduce the
impact when the HTML does change.
