From: Olaf Kirch <olaf.kirch@oracle.com>
Subject: Avoid multi-page allocations in IP fragmentation

This patch is based on a patch by Zach Brown. The idea is to avoid
higher-order (multi-page) allocations in the fragment handling code,
because these tend to fail on heavily loaded machines where memory has
become fragmented. The original posting can be found here:

http://marc.info/?l=linux-netdev&m=114425947024500&w=2

This modified patch addresses a problem encountered on ppc: the
original patch caused skb_shared_info to become unaligned, which causes
crashes on ppc (see bug #6140918). The first iteration introduced a new
problem (described in bug #6845794).
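
As a rough illustration of the clamping described above, here is a
userspace sketch. It is not the kernel code: the constants and the names
max_order0_payload() and clamp_datalen() are hypothetical stand-ins, and
the page size, cache line size and skb_shared_info size are assumed
values that differ between architectures.

```c
#include <assert.h>

/* Assumed values for illustration only; the real ones are per-architecture. */
#define PAGE_SIZE_ASSUMED	4096u
#define CACHE_BYTES_ASSUMED	64u
#define SHINFO_SIZE_ASSUMED	160u	/* stand-in for sizeof(struct skb_shared_info) */

/*
 * Largest cache-line-aligned payload that still fits in a single page
 * once `overhead` bytes of headers and the shared info are accounted for.
 * This models the idea behind SKB_MAX_ORDER(overhead, 0).
 */
static unsigned int max_order0_payload(unsigned int overhead)
{
	/* The mask below only works for a power-of-two cache line size. */
	assert((CACHE_BYTES_ASSUMED & (CACHE_BYTES_ASSUMED - 1u)) == 0);

	if (overhead + SHINFO_SIZE_ASSUMED >= PAGE_SIZE_ASSUMED)
		return 0;
	return (PAGE_SIZE_ASSUMED - overhead - SHINFO_SIZE_ASSUMED) &
	       ~(CACHE_BYTES_ASSUMED - 1u);
}

/*
 * With scatter/gather available, cap the linear data at what fits in one
 * page; without it, the length is left unchanged.
 */
static unsigned int clamp_datalen(unsigned int datalen, unsigned int overhead,
				  int has_sg)
{
	unsigned int limit = max_order0_payload(overhead);

	if (has_sg && datalen > limit)
		return limit;
	return datalen;
}
```

With these assumed values and an overhead of 51 bytes, the limit is 3840
bytes, so the linear part plus overhead and shared info stays within one
page (3840 + 51 + 160 = 4051).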

Olaf Kirch <olaf.kirch@oracle.com>
---
--- linux-2.6.9/net/ipv4/ip_output.c.orig	2008-06-13 11:22:17.286367000 -0700
+++ linux-2.6.9/net/ipv4/ip_output.c	2008-06-13 12:06:16.516396000 -0700
@@ -825,32 +825,46 @@ alloc_new_skb:
 			datalen = length + fraggap;
 			if (datalen > mtu - fragheaderlen)
 				datalen = maxfraglen - fragheaderlen;
-			fraglen = datalen + fragheaderlen;
 
-			if ((flags & MSG_MORE) && 
-			    !(rt->u.dst.dev->features&NETIF_F_SG))
-				alloclen = mtu;
-			else
-				alloclen = datalen + fragheaderlen;
+			alloclen = fragheaderlen + hh_len + 15;
 
 			/* The last fragment gets additional space at tail.
 			 * Note, with MSG_MORE we overallocate on fragments,
 			 * because we have no idea what fragment will be
 			 * the last.
 			 */
-			if (datalen == length)
+			if (datalen == length + fraggap)
 				alloclen += rt->u.dst.trailer_len;
+			if ((rt->u.dst.dev->features&NETIF_F_SG) &&
+			    datalen > SKB_MAX_ORDER(alloclen, 0)) {
+				/* If we added a trailer, we have to remove
+				 * it again.
+				 * However, this may actually increase
+				 * SKB_MAX_ORDER(alloclen, 0) significantly -
+				 * i.e. by SMP_CACHE_BYTES - so we need to make
+				 * sure we don't accidentally increase datalen.
+				 */
+				if (datalen == length + fraggap)
+					alloclen -= rt->u.dst.trailer_len;
+				datalen = min_t(unsigned int, datalen,
+						SKB_MAX_ORDER(alloclen, 0));
+			}
+
+			fraglen = datalen + fragheaderlen;
+
+			if ((flags & MSG_MORE) &&
+			    !(rt->u.dst.dev->features&NETIF_F_SG))
+				alloclen += mtu - fragheaderlen;
+			else
+				alloclen += datalen;
 
 			if (transhdrlen) {
-				skb = sock_alloc_send_skb(sk, 
-						alloclen + hh_len + 15,
+				skb = sock_alloc_send_skb(sk, alloclen,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
-				if (atomic_read(&sk->sk_wmem_alloc) <=
-				    2 * sk->sk_sndbuf)
-					skb = sock_wmalloc(sk, 
-							   alloclen + hh_len + 15, 1,
+				if (atomic_read(&sk->sk_wmem_alloc) <= 2 * sk->sk_sndbuf)
+					skb = sock_wmalloc(sk, alloclen, 1,
 							   sk->sk_allocation);
 				if (unlikely(skb == NULL))
 					err = -ENOBUFS;
