Chapter 3
TCP
The Transmission Control Protocol (often written TCP/IP because it runs atop IP, but referred to as TCP throughout the rest of this book) is the
workhorse of the Internet. First defined in 1974, it builds upon the packet transmission technology of the Internet
Protocol (IP, described in Chapter 1) to let applications communicate using continuous streams of data. Unless a
connection dies or freezes because of a network problem, TCP guarantees that the data stream will arrive intact,
without any information lost, duplicated, or out of order.
Protocols that carry documents and files nearly always ride atop TCP. This includes the delivery of web pages to
your browser, file transmission, and all of the major mechanisms for transmitting e-mail. TCP is also the foundation
of choice for protocols that carry on long conversations between people or computers, such as SSH terminal sessions
and many popular chat protocols.
When the Internet was younger, it was sometimes tempting to try to squeeze a little more performance out of
a network by building an application atop UDP (see Chapter 2) and carefully choosing the size and timing of each
individual datagram yourself. But modern TCP implementations tend to be sophisticated, having benefited from
more than 30 years of improvement, innovation, and research. It is rare that anyone but an expert in protocol design
can improve upon the performance of a modern TCP stack. These days, even performance-critical applications like
message queues (Chapter 8) usually choose TCP as their medium.
How TCP Works
As you learned in Chapters 1 and 2, networks are fickle creatures. They sometimes drop the packets you try to transmit
across them, occasionally create extra copies of a packet, and often deliver packets out of order. With a bare
datagram facility like UDP, your own application code has to worry about whether each datagram arrives and have
a plan for recovering if it does not. But with TCP, the packets themselves are hidden beneath the protocol, and your
application can simply stream data toward its destination, confident that lost information will be retransmitted until
it finally arrives successfully.
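To make the stream idea concrete, here is a minimal sketch that pushes several separate writes through a TCP connection over the loopback interface and collects whatever arrives. The function name stream_demo and the 4,096-byte read size are choices made for this illustration, not anything TCP requires. The point it tries to show is that the receiver sees one continuous run of bytes: the content and order are preserved, but the boundaries between the sender's individual writes are not.

```python
import socket
import threading

def stream_demo(chunks):
    """Send chunks over a loopback TCP connection.

    Returns the bytes received and the number of recv() calls that
    returned data. (Hypothetical helper written for this illustration.)
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    address = listener.getsockname()

    def send_all():
        with socket.create_connection(address) as sender:
            for chunk in chunks:
                sender.sendall(chunk)  # each call is a separate write
        # Leaving the "with" block closes the socket, which the
        # receiver sees as end-of-stream.

    thread = threading.Thread(target=send_all)
    thread.start()

    conn, _ = listener.accept()
    pieces = []
    with conn:
        while True:
            data = conn.recv(4096)     # read size is an arbitrary choice
            if not data:               # empty result: the sender closed
                break
            pieces.append(data)
    thread.join()
    listener.close()
    return b"".join(pieces), len(pieces)

if __name__ == "__main__":
    chunks = [b"first,", b"second,", b"third"]
    data, reads = stream_demo(chunks)
    print(data == b"".join(chunks))    # content and order are preserved
    print(reads)                       # need not equal len(chunks)
```

The number of reads that it takes to drain the connection depends on timing and on the operating system, which is why the sketch reports it rather than promising a particular value.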
The classic definition of TCP is RFC 793 from 1981, though many subsequent RFCs have detailed extensions
and improvements.
How does TCP provide a reliable connection? Here are its basic tenets:
• Every TCP packet is given a sequence number so that the system on the receiving end can put
the packets back together in the right order and can also notice missing packets in the sequence
and ask that they be retransmitted.
• Instead of using sequential integers (1, 2, 3...) to sequence packets, TCP uses a counter that
counts the number of bytes transmitted. A 1,024-byte packet with a sequence number of 7,200,
for example, would be followed by a packet with a sequence number of 8,224. This means that
a busy network stack does not have to remember how it broke up a data stream into packets.
If asked for a retransmission, it can break up the stream into new packets some other way
(which might let it fit more data into a packet if more bytes are now waiting for transmission),
and the receiver can still put the packets back together.
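The byte-counting scheme just described can be sketched in a few lines of Python. This is a simplified model, not how a real TCP stack is written: the helper names segment and reassemble are invented for this illustration, the sketch ignores the fact that real sequence numbers wrap around after 32 bits, and it assumes that every byte eventually arrives. It reuses the numbers from the example above, and then shows a retransmission that cuts the same stream at different points.

```python
def segment(stream, first_seq, sizes):
    """Split stream into (sequence_number, payload) pairs.

    Each sequence number is first_seq plus the byte offset at which
    the payload starts. (Hypothetical helper for illustration only.)
    """
    segments = []
    offset = 0
    for size in sizes:
        payload = stream[offset:offset + size]
        if not payload:                # ran out of data to send
            break
        segments.append((first_seq + offset, payload))
        offset += len(payload)
    return segments

def reassemble(segments, first_seq):
    """Rebuild the stream from segments that arrive in any order.

    Overlapping or duplicated bytes are simply written again at the
    same position; a gap means that something is still missing.
    """
    buffer = bytearray()
    for seq, payload in sorted(segments):
        start = seq - first_seq
        if start > len(buffer):
            raise ValueError("bytes missing before sequence number %d" % seq)
        buffer[start:start + len(payload)] = payload
    return bytes(buffer)

if __name__ == "__main__":
    stream = bytes(range(256)) * 8     # 2,048 bytes of sample data

    # Two 1,024-byte packets, the first carrying sequence number 7,200.
    first_try = segment(stream, 7200, [1024, 1024])
    print([seq for seq, _ in first_try])   # [7200, 8224]

    # A retransmission is free to cut the stream differently.
    retry = segment(stream, 7200, [512, 1536])
    print([seq for seq, _ in retry])       # [7200, 7712]

    # Mixing packets from both attempts still yields the same stream.
    print(reassemble([retry[1], first_try[0]], 7200) == stream)  # True
```

Because each sequence number is a position in the byte stream rather than a packet count, the receiver in this sketch never needs to know how the sender originally divided the data.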