[e2e] Are Packet Trains / Packet Bursts a Problem in TCP?

Wed Sep 27 13:49:14 PDT 2006

I think we're discussing different subjects. Yes, ISPs and large  
enterprises (and anyone that runs a LAN) routinely run their networks  
at a relatively low utilization, for a variety of reasons. The  
subject of this thread has to do with end station expectations and  
the behavior of TCPs, and specifically whether burst behavior is a  
problem and whether pacing is helpful. I'll refer you to Nandita  
Dukkipati and Nick McKeown's arguments (supported by a variety of  
factors stretching from IBM's 1970's fixation on subsecond response  
times to transactions to present discussions of what makes a good web  
site, a large part of which could be described in similar terms, and  
by the way, note the popularity of bittorrent, configurations of web  
browsers that move multiple files in in parallel, etc) that end users  
at the end of the day prefer shorter transaction times to longer ones.

Yes, the network wants significant headroom on average. The end user  
has no motivation to preserve that headroom, however. Faced with the  
choice of moving a file in one second or ten, the user will pick 50  
milliseconds, hands down and every time. So the individual TCP is  
interested in maximizing throughput rate while minimizing loss.

On Sep 27, 2006, at 1:09 PM, David P. Reed wrote:

> Fred - reasona ble response, but I take issue with your statement  
> that the goal is "full use ot the bottleneck without overburdening  
> it".   Little's Lemma would suggest that idea is unreasonable.
>
> In the OS community, we used to have a rule of thumb that operating  
> above 50% of the service rate of almost any multiplexed resource is  
> problematic, unless you have a very precise and completely  
> predictable and constant load.  And Odlyzko has shown that  
> enterprises are happy to run at about 10% capacity in return for  
> lowering other, much more significant costs (such as jitter,  
> latency, unreliability).
>
> Perhaps the real message is that obsession with the fantasy that  
> such a goal is achievable has led to incredibly silly research  
> efforts like the I2 hotrod compettions.
>
>
> Fred Baker wrote:
>>
>> On Sep 26, 2006, at 10:51 AM, Detlef Bosau wrote:
>>
>>> But what I´m curious about in all (sender) pacing approaches is:  
>>> Where does the pacing rate stem from? Typically, it´s obtained  
>>> from one of the numerous TCP formulae (Padhye, Mathis...), but  
>>> these are first (admittedly rough) estimates and second need some  
>>> measurements to have an initial value set.
>>>
>>> Moreover, widely deployed pacing would practically replace the  
>>> currently used AIMD mechanism for achieving a fair share, because  
>>> the currently used AIMD probing leads to "microbursts", which  
>>> means that the TCP sender _exceeds_ its fair share of rate for  
>>> short periods of time. This doesn´t matter as TCP is not rate  
>>> controlled.  And the final rate is corrected by ACK clocking.
>>> How does this work with sender pacing, i.e. putting a leaky  
>>> bucket  or flow shaper or something like that in the TCP sender?
>>
>>
>> The big problem in pacing is determining the rate available at the  
>> bottleneck.
>>
>> Let's assume a truly perfect case - the bottleneck in the network  
>> has a total capacity of N bps, and currently there are M sessions  
>> using it. The M sessions, by magic, are operating in perfect TDM  
>> fashion - as the last bit of one message departs, the first bit of  
>> the next message becomes ready to send, and no queue ever forms.
>>
>> Is there room for one more bit on the wire? Well, if this was a  
>> TDM network, no; something would get dropped. But it's not. So we  
>> have a new sender that decides to start a TCP session and sends a  
>> SYN. What happens is that his packet collides with some other  
>> packet, and a queue one packet deep results. The new session gets  
>> very little information about rate out of that - there is clearly  
>> capacity for at least one SYN per RTT available, but he gets no  
>> idea what he is competing with. But the other M sessions get a  
>> teensy bit of information - their Acks get delayed a few  
>> nanoseconds, and as a result they send their next data a few  
>> nanoseconds later. Their average RTT becomes some function of  
>> sizeof(SYN)/M longer, and as a result they each slow down if ever  
>> so slightly.
>>
>> Having no information about all this, the new session probes  
>> further, sending one or more data segments. Let's assume that it  
>> sends several (if it doesn't do so in the next RTT, it will do so  
>> in some subsequent one), and that the receiver replies with two or  
>> more Acks. The sender can infer that the capacity at the  
>> bottleneck is no smaller than
>>
>>            Bits acknowledged only by second Ack
>>      -------------------------------------------------
>>      Interval between arrivals of first and second Ack
>>
>> If its K segments were transmitted by the bottleneck back to back,  
>> this would be a measure of the rate of the bottleneck (Keshev). If  
>> packets from some other segments were interleaved, this would give  
>> the rate at the bottleneck less the capacity used by those  
>> interleaving segments. It won't EXCEED the rate at the bottleneck,  
>> but in this example it seriously overestimates the capacity unused  
>> by other sessions and therefore available to the new session at  
>> the bottleneck.
>>
>> Let's assume that it picked some rate and paced traffic to that  
>> rate. It could do that calculation on every Ack it received, and  
>> feed those into the moving average. There are a couple of possible  
>> scenarios, the most benign of which include "the other sessions  
>> all slow down so that this traffic all goes through at that rate"  
>> and "all other sessions cease, so that the bottleneck is ONLY  
>> carrying this session's traffic". In both of these relatively  
>> benign cases, the above calculation says that the rate available  
>> is the same number. From the sender's perspective, is that because  
>> that is the rate available at the bottleneck? Or is it because  
>> that is the rate at which it sent its probing traffic? Should it  
>> probe at a higher rate? If so, at what rate?
>>
>> So the new session has to decide to what extent it believes this  
>> new pice of data. Maybe it simply jumps to that new rate. I'll  
>> leave the analysis to the reader. Or may it includes that new rate  
>> estimate in some form of moving average. If it does things just  
>> right, it will ramp up its speed at the rate that the sum  of the  
>> competing sessions slow down, and all will be cool. Anyone care to  
>> lay odds that it would be "just right"?
>>
>> Now, change the scenario. The bottleneck is completely unused. The  
>> new session comes up, and is trying to infer what capacity is  
>> available, following the procedures it followed to get things  
>> "just right". Guess what? It asymptotically approaches fully using  
>> the link, but takes one heck of a long time to get there.
>>
>> The big problem in pacing is determining the rate available at the  
>> bottleneck. And by the way, bottlenecks would have to have heap  
>> big magic to operate in anything resembling a TDM fashion.
>>
>> Operational experience in the ATM world gives us another side of  
>> this. If you give an ATM VC a certain rate and engineer that rate  
>> through the network, ATM works a lot like TDM works - it uses the  
>> rate authorized, and doesn't over-use it. If you screw up the  
>> engineering, ATM acts like it is screwed up.
>>
>> Hence, I think that what the research, logic, and operational  
>> experience tell us is that while pacing can be useful to break  
>> down big bursts, in the end pacing doesn't achieve a what the  
>> Internet community has been trying to achieve for a long time,  
>> which is full use of the bottleneck without overburdening it. We  
>> can achieve engineered use, like ATM, or we can from time to time  
>> over-burden a bottleneck, but we can't avoid both.
>>
>> So I will argue that using pacing for events, like spreading out  
>> the effect of a sequence of lost Acks, can be helpful, but in the  
>> end  the procedures we use in New Reno etc are superior in the  
>> general case. What we can do to improve those procedures might be  
>> related to RCP or to some of the work that is being done on other- 
>> than-loss-triggered procedures; we'll see. But pacing is not a  
>> silver bullet.
>>
>>