Hacking Gmail


Listing 13-2 (continued)

<input type=checkbox name=t
value="101480d8ef5dc74a">
<img src="/gmail/images/star_on_2.gif"
width=15 height=15 border=0 alt=Starred>
</td>
<td >
Ben Hammersley</td>
<td >
<a href="?th=101480d8ef5dc74a&v=c">
<font size=1><font color=#006633>
Heads
</font></font>
Here's a nice message.
</a></td>
<td nowrap>Jan 6

If you look at this code alongside what you already know about the way Gmail
works, it's easy to deduce the structure of the page. Each line of the Inbox is
structured like this:
<tr bgcolor=#E8EEF7>
<td><input type=checkbox name=t value="THREAD ID">
A LINK TO A STAR IMAGE IF THE MESSAGE IS STARRED
</td>
<td >THE AUTHOR NAME</td>
<td ><a href="A RELATIVE LINK TO THE PAGE DISPLAYING THE MAIL">
<font size=1><font color=#006633>THE LABEL</font></font>
THE SUBJECT LINE
</a></td>
<td nowrap>THE DATE.

So, to read your Inbox, you fetch this page, walk through the code until you
reach the correct table, collect every instance of the preceding structure, and
parse out the details. That is what you will do now.
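
The walk-and-collect approach just described can be sketched in a few lines. What follows is only an illustration in Python, using the standard library's html.parser module; it is not the book's Perl listing. The names InboxParser, parse_inbox, and SAMPLE are hypothetical, and the sample row is the one from Listing 13-2, wrapped in a table that is assumed here so the fragment stands alone.

```python
from html.parser import HTMLParser


class InboxParser(HTMLParser):
    """Collect one dict per Inbox row (hypothetical helper, not from the book)."""

    def __init__(self):
        super().__init__()
        self.messages = []    # one dict per message row found so far
        self.row = None       # the row currently being filled in, if any
        self.cell = 0         # number of <td> tags seen since the checkbox
        self.font_depth = 0   # > 0 while inside the label's <font> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The thread checkbox marks the start of a message row.
        if tag == "input" and attrs.get("name") == "t":
            self.row = {"thread_id": attrs.get("value"), "starred": False,
                        "author": "", "link": None, "label": "",
                        "subject": "", "date": ""}
            self.messages.append(self.row)
            self.cell = 0
        elif tag == "tr":
            self.row = None   # a new table row ends the previous message
        elif self.row is None:
            return
        elif tag == "td":
            self.cell += 1    # 1 = author, 2 = link/label/subject, 3 = date
        elif tag == "img" and attrs.get("alt") == "Starred":
            self.row["starred"] = True
        elif tag == "a" and self.cell == 2:
            self.row["link"] = attrs.get("href")
        elif tag == "font":
            self.font_depth += 1

    def handle_endtag(self, tag):
        if tag == "font" and self.font_depth > 0:
            self.font_depth -= 1
        elif tag in ("tr", "table"):
            self.row = None   # stop collecting text once the row or table closes

    def handle_data(self, data):
        text = data.strip()
        if not text or self.row is None:
            return
        if self.cell == 1:
            key = "author"
        elif self.cell == 2:
            key = "label" if self.font_depth else "subject"
        elif self.cell == 3:
            key = "date"
        else:
            return
        self.row[key] = (self.row[key] + " " + text).strip()


def parse_inbox(html):
    """Return a list of message dicts parsed from the Inbox HTML."""
    parser = InboxParser()
    parser.feed(html)
    parser.close()
    return parser.messages


# The row from Listing 13-2; the enclosing <table> is an assumption.
SAMPLE = """
<table>
<tr bgcolor=#E8EEF7>
<td><input type=checkbox name=t value="101480d8ef5dc74a">
<img src="/gmail/images/star_on_2.gif" width=15 height=15 border=0 alt=Starred>
</td>
<td >Ben Hammersley</td>
<td ><a href="?th=101480d8ef5dc74a&v=c">
<font size=1><font color=#006633>Heads</font></font>
Here's a nice message.
</a></td>
<td nowrap>Jan 6
</table>
"""

if __name__ == "__main__":
    for message in parse_inbox(SAMPLE):
        print(message)
```

The sketch keys on the checkbox named t to find the start of each row and then counts table cells, so it depends on the column order shown above. If Gmail changes that layout, the cell numbers would need adjusting.
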

Parsing the Inbox


Listing 13-3 shows some Perl code that uses HTML::TokeParser to walk through
the HTML-only Inbox page that you saved earlier and print out details of the
messages therein. Note that it loads the page as a text file from the disk, and just