Chapter 11
The World Wide Web
Chapters 9 and 10 explained the Hypertext Transfer Protocol (HTTP) as a general mechanism by which clients can
request documents and servers can respond by providing them.
Something, however, went unexplained. Why does the name of the protocol start with the word hypertext?
The answer is that HTTP was not designed simply as a new way to transfer files. It is not simply a fancy caching
replacement for older file transfer protocols such as FTP (see Chapter 17). While it is certainly capable of delivering
stand-alone documents such as books, images, and video, the purpose of HTTP is a much more ambitious one:
to allow servers all over the world to publish documents that, through mutual cross-references, become a single
interlinked fabric of information.
HTTP was built to deliver the World Wide Web.
Hypermedia and URLs
Books have referenced other books for thousands of years. But a human has to enact each reference by fetching the
other book and turning pages until the referenced text is found. The dream that the World Wide Web (WWW, or simply
“the Web”) has fulfilled is to delegate to the machine the responsibility of resolving the reference.
The moment that inert text like “the discussion of cookies in Chapter 9” becomes underlined and clickable on
a computer screen so that a click takes you to the text that it is referencing, it becomes a hyperlink. Full documents
whose text can contain embedded hyperlinks are called hypertext documents. When images, sound, and video are
added to the mix, the user is experiencing hypermedia.
In each case, the prefix hyper- indicates that the medium itself understands the ways that documents mutually
reference each other and can enact those links for a user. The phrase “see page 103” in a printed book does not, itself,
have the power to carry you to the destination that it describes. The browser displaying a hyperlink, by contrast, does
have this power.
To power hypermedia, the uniform resource locator (URL) was invented. It offers a uniform scheme by which not
only modern hypertext documents but also older FTP files and Telnet servers can be referenced. You have seen
many such examples in the address bar of your web browser.
Some sample URLs
https://www.python.org/
http://en.wikipedia.org/wiki/Python_(programming_language)
http://localhost:8000/headers
ftp://ssd.jpl.nasa.gov/pub/eph/planets/README.txt
telnet://rainmaker.wunderground.com
The initial label like https or http is the scheme, which names the protocol by which a document can be
retrieved. Following the colon and two slashes (://) come the hostname and an optional port number. Finally, a path
selects one particular document out of all the documents that might be available on a service.
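To make these three pieces concrete, here is a minimal sketch that splits the sample URLs above into scheme, hostname, port, and path using urlsplit() from the Standard Library's urllib.parse module. The helper name describe() is a hypothetical one chosen for this illustration; it is not part of any library.

```python
from urllib.parse import urlsplit

# The sample URLs listed above.
SAMPLE_URLS = [
    'https://www.python.org/',
    'http://en.wikipedia.org/wiki/Python_(programming_language)',
    'http://localhost:8000/headers',
    'ftp://ssd.jpl.nasa.gov/pub/eph/planets/README.txt',
    'telnet://rainmaker.wunderground.com',
]

def describe(url):
    """Return (scheme, hostname, port, path) for a URL.

    `describe` is an illustrative helper, not a library function.
    The port is None when the URL does not name one explicitly.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port, parts.path

if __name__ == '__main__':
    for url in SAMPLE_URLS:
        scheme, hostname, port, path = describe(url)
        print('scheme={!r} hostname={!r} port={!r} path={!r}'.format(
            scheme, hostname, port, path))
```

Note that only the localhost URL carries an explicit port. For the others, the port attribute is None, and a client would be expected to fall back on the default port for the scheme.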