Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 12: Networks


[Code flow diagram: ip_output -> Netfilter hook NF_IP_POST_ROUTING -> ip_finish_output.
 Fragmentation of the packet necessary? If so: ip_fragment. Otherwise: ip_finish_output2.
 Not enough space for hardware header? If so: skb_realloc_headroom. Then: dst->neighbour->output.]

Figure 12-19: Code flow diagram for ip_output.

First of all, the netfilter hook NF_IP_POST_ROUTING is called, followed by ip_finish_output. I
first examine the situation in which the packet fits into the MTU of the transmission medium
and need not be fragmented. In this case, ip_finish_output2 is directly invoked. The function
checks whether the socket buffer still has enough space for the hardware header to be generated. If
necessary, skb_realloc_headroom adds extra space. To complete the transition to the network access
layer, the dst->neighbour->output function set by the routing layer is invoked, normally using
dev_queue_xmit.^21
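
The headroom check described above can be illustrated with a minimal user-space sketch. It is not kernel code: struct buf and the buf_* helpers are hypothetical stand-ins for the socket buffer and for skb_headroom/skb_realloc_headroom, and plain malloc replaces the kernel allocator. It only shows the idea of checking the free space in front of the packet data and copying into a larger buffer when the hardware header does not fit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a socket buffer (not struct sk_buff). */
struct buf {
    unsigned char *head;  /* start of the allocated memory */
    unsigned char *data;  /* start of the packet data      */
    size_t len;           /* length of the packet data     */
};

/* Free bytes between the start of the allocation and the packet data. */
static size_t buf_headroom(const struct buf *b)
{
    return (size_t)(b->data - b->head);
}

/* Return a copy of b with exactly `need` bytes of headroom, or NULL. */
static struct buf *buf_realloc_headroom(const struct buf *b, size_t need)
{
    struct buf *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    n->head = malloc(need + b->len);
    if (!n->head) {
        free(n);
        return NULL;
    }
    n->data = n->head + need;
    n->len = b->len;
    memcpy(n->data, b->data, b->len);
    return n;
}

/* Prepend a hardware header, enlarging the headroom first if necessary.
 * On success the (possibly new) buffer is returned; on failure NULL is
 * returned and the original buffer is left untouched. */
static struct buf *buf_push_hw_header(struct buf *b, const void *hdr,
                                      size_t hdr_len)
{
    assert(b != NULL && hdr != NULL);

    if (buf_headroom(b) < hdr_len) {
        struct buf *n = buf_realloc_headroom(b, hdr_len);

        if (!n)
            return NULL;
        free(b->head);
        free(b);
        b = n;
    }
    b->data -= hdr_len;   /* move the data pointer into the headroom */
    b->len += hdr_len;
    memcpy(b->data, hdr, hdr_len);
    return b;
}
```

In this sketch the caller always continues with the returned pointer, because the original buffer is released once the enlarged copy exists.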

Packet Fragmenting


IP packets are fragmented into smaller units by ip_fragment, as shown in Figure 12-20.

Original packet:  [IP | TCP | Payload]
Fragments:        [IP | TCP | 1]  [IP | 2]  [IP | 3]  [IP | 4]

Figure 12-20: Fragmenting of an IP packet.

IP fragmenting is very straightforward if we ignore the subtleties documented in RFC 791. A data
fragment, whose size is compatible with the corresponding MTU, is extracted from the packet in each cycle
of a loop. A new socket buffer is created to hold the extracted data fragment; the old IP header can be
reused for it with a few modifications. A common fragment ID is assigned to all fragments to support
reassembly in the destination system. The sequence of the fragments is established on the basis of the
fragment offset, which is also set appropriately. The more fragments bit must also be set; only in the last
packet of the series must this bit be set to 0. Each fragment is sent using ip_output after ip_send_check
has generated a checksum.^22
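
The loop described above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel's ip_fragment: struct frag_info, fragment_payload, struct frag_log, and log_fragment are hypothetical names, no socket buffers are allocated, and no checksum is computed. The sketch only derives, for each fragment, the common ID, the offset in 8-byte units as required by RFC 791, and the more fragments bit, and hands the result to an output callback passed as a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF 0x2000  /* "more fragments" flag in the flags/offset field */

/* Hypothetical description of one fragment (not a kernel structure). */
struct frag_info {
    uint16_t id;        /* fragment ID shared by all fragments           */
    uint16_t frag_off;  /* MF flag plus offset in 8-byte units (host order) */
    size_t   start;     /* byte offset of the fragment data in the payload */
    size_t   len;       /* number of payload bytes carried               */
};

/* Output callback; returns 0 on success, nonzero on error. */
typedef int (*frag_output_fn)(const struct frag_info *frag, void *ctx);

/* Split `payload_len` bytes (everything after the IP header) into fragments
 * that fit into `mtu` together with an IP header of `ihl` bytes.
 * Returns the number of fragments emitted, or -1 on error. */
static int fragment_payload(size_t payload_len, size_t mtu, size_t ihl,
                            uint16_t id, frag_output_fn output, void *ctx)
{
    size_t room, left = payload_len, offset = 0;
    int count = 0;

    assert(output != NULL);
    if (mtu < ihl + 8 || payload_len > 65535 - ihl)
        return -1;
    room = mtu - ihl;                  /* payload bytes per fragment */

    while (left > 0) {
        struct frag_info f;
        size_t len = left > room ? room : left;

        /* Every fragment except the last must carry a multiple of 8 bytes,
         * because the offset field counts 8-byte units. */
        if (len < left)
            len &= ~(size_t)7;

        f.id = id;
        f.start = offset;
        f.len = len;
        f.frag_off = (uint16_t)(offset >> 3);
        if (left > len)                /* more data follows: set MF */
            f.frag_off |= IP_MF;

        if (output(&f, ctx) != 0)
            return -1;

        offset += len;
        left -= len;
        count++;
    }
    return count;
}

/* Example callback (hypothetical): record the fragments instead of sending. */
struct frag_log {
    struct frag_info frags[64];
    int n;
};

static int log_fragment(const struct frag_info *frag, void *ctx)
{
    struct frag_log *log = ctx;

    if (log->n >= 64)
        return -1;
    log->frags[log->n++] = *frag;
    return 0;
}
```

With a 4000-byte payload, an MTU of 1500, and a 20-byte header, the sketch emits three fragments of 1480, 1480, and 1040 bytes; only the last one has the more fragments bit cleared. Passing the output routine as a parameter mirrors the function pointer mentioned in footnote 22, so a different send function can be substituted without touching the loop.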

(^21) The kernel also uses a hard header cache. This holds frequently needed hardware headers that are copied to the start of a packet. If
the cache contains a required entry, it is output using a cache function that is slightly faster than dst->neighbour->output.
(^22) ip_output is invoked via a function pointer passed to ip_fragment as a parameter. This means, of course, that other send
functions can be selected. The bridging subsystem is the only user of this possibility, and is not discussed in more detail.
