[e2e] Are Packet Trains / Packet Bursts a Problem in TCP?

Wed Sep 27 16:48:17 PDT 2006

At Wednesday 27/09/2006 16:09 -0400, David P. Reed wrote:
>Fred - reasona ble response, but I take issue 
>with your statement that the goal is "full use 
>ot the bottleneck without overburdening it".
>Little's Lemma would suggest that idea is unreasonable.

and yet, "full use of the bottleneck without 
overburdening it" is the ideal goal of satellite 
communications via geostationary orbit, and of deep space comms.

It's just an expression of the desire for efficient link utilization.

L.

>In the OS community, we used to have a rule of 
>thumb that operating above 50% of the service 
>rate of almost any multiplexed resource is 
>problematic, unless you have a very precise and 
>completely predictable and constant load.  And 
>Odlyzko has shown that enterprises are happy to 
>run at about 10% capacity in return for lowering 
>other, much more significant costs (such as jitter, latency, unreliability).
>
>Perhaps the real message is that obsession with 
>the fantasy that such a goal is achievable has 
>led to incredibly silly research efforts like the I2 hotrod compettions.
>
>
>Fred Baker wrote:
>>
>>On Sep 26, 2006, at 10:51 AM, Detlef Bosau wrote:
>>
>>>But what I´m curious about in all (sender) 
>>>pacing approaches is: Where does the pacing 
>>>rate stem from? Typically, it´s obtained from 
>>>one of the numerous TCP formulae (Padhye, 
>>>Mathis...), but these are first (admittedly 
>>>rough) estimates and second need some 
>>>measurements to have an initial value set.
>>>
>>>Moreover, widely deployed pacing would 
>>>practically replace the currently used AIMD 
>>>mechanism for achieving a fair share, because 
>>>the currently used AIMD probing leads to 
>>>"microbursts", which means that the TCP sender 
>>>_exceeds_ its fair share of rate for short 
>>>periods of time. This doesn´t matter as TCP is 
>>>not rate controlled.  And the final rate is corrected by ACK clocking.
>>>How does this work with sender pacing, i.e. putting a leaky bucket
>>>or flow shaper or something like that in the TCP sender?
>>
>>
>>The big problem in pacing is determining the 
>>rate available at the bottleneck.
>>
>>Let's assume a truly perfect case - the 
>>bottleneck in the network has a total capacity 
>>of N bps, and currently there are M sessions 
>>using it. The M sessions, by magic, are 
>>operating in perfect TDM fashion - as the last 
>>bit of one message departs, the first bit of 
>>the next message becomes ready to send, and no queue ever forms.
>>
>>Is there room for one more bit on the wire? 
>>Well, if this was a TDM network, no; something 
>>would get dropped. But it's not. So we have a 
>>new sender that decides to start a TCP session 
>>and sends a SYN. What happens is that his 
>>packet collides with some other packet, and a 
>>queue one packet deep results. The new session 
>>gets very little information about rate out of 
>>that - there is clearly capacity for at least 
>>one SYN per RTT available, but he gets no idea 
>>what he is competing with. But the other M 
>>sessions get a teensy bit of information - 
>>their Acks get delayed a few nanoseconds, and 
>>as a result they send their next data a few 
>>nanoseconds later. Their average RTT becomes 
>>some function of sizeof(SYN)/M longer, and as a 
>>result they each slow down if ever so slightly.
>>
>>Having no information about all this, the new 
>>session probes further, sending one or more 
>>data segments. Let's assume that it sends 
>>several (if it doesn't do so in the next RTT, 
>>it will do so in some subsequent one), and that 
>>the receiver replies with two or more Acks. The 
>>sender can infer that the capacity at the bottleneck is no smaller than
>>
>>            Bits acknowledged only by second Ack
>>      -------------------------------------------------
>>      Interval between arrivals of first and second Ack
>>
>>If its K segments were transmitted by the 
>>bottleneck back to back, this would be a 
>>measure of the rate of the bottleneck (Keshev). 
>>If packets from some other segments were 
>>interleaved, this would give the rate at the 
>>bottleneck less the capacity used by those 
>>interleaving segments. It won't EXCEED the rate 
>>at the bottleneck, but in this example it 
>>seriously overestimates the capacity unused by 
>>other sessions and therefore available to the new session at the bottleneck.
>>
>>Let's assume that it picked some rate and paced 
>>traffic to that rate. It could do that 
>>calculation on every Ack it received, and feed 
>>those into the moving average. There are a 
>>couple of possible scenarios, the most benign 
>>of which include "the other sessions all slow 
>>down so that this traffic all goes through at 
>>that rate" and "all other sessions cease, so 
>>that the bottleneck is ONLY carrying this 
>>session's traffic". In both of these relatively 
>>benign cases, the above calculation says that 
>>the rate available is the same number. From the 
>>sender's perspective, is that because that is 
>>the rate available at the bottleneck? Or is it 
>>because that is the rate at which it sent its 
>>probing traffic? Should it probe at a higher rate? If so, at what rate?
>>
>>So the new session has to decide to what extent 
>>it believes this new pice of data. Maybe it 
>>simply jumps to that new rate. I'll leave the 
>>analysis to the reader. Or may it includes that 
>>new rate estimate in some form of moving 
>>average. If it does things just right, it will 
>>ramp up its speed at the rate that the sum  of 
>>the competing sessions slow down, and all will 
>>be cool. Anyone care to lay odds that it would be "just right"?
>>
>>Now, change the scenario. The bottleneck is 
>>completely unused. The new session comes up, 
>>and is trying to infer what capacity is 
>>available, following the procedures it followed 
>>to get things "just right". Guess what? It 
>>asymptotically approaches fully using the link, 
>>but takes one heck of a long time to get there.
>>
>>The big problem in pacing is determining the 
>>rate available at the bottleneck. And by the 
>>way, bottlenecks would have to have heap big 
>>magic to operate in anything resembling a TDM fashion.
>>
>>Operational experience in the ATM world gives 
>>us another side of this. If you give an ATM VC 
>>a certain rate and engineer that rate through 
>>the network, ATM works a lot like TDM works - 
>>it uses the rate authorized, and doesn't 
>>over-use it. If you screw up the engineering, ATM acts like it is screwed up.
>>
>>Hence, I think that what the research, logic, 
>>and operational experience tell us is that 
>>while pacing can be useful to break down big 
>>bursts, in the end pacing doesn't achieve a 
>>what the Internet community has been trying to 
>>achieve for a long time, which is full use of 
>>the bottleneck without overburdening it. We can 
>>achieve engineered use, like ATM, or we can 
>>from time to time over-burden a bottleneck, but we can't avoid both.
>>
>>So I will argue that using pacing for events, 
>>like spreading out the effect of a sequence of 
>>lost Acks, can be helpful, but in the end  the 
>>procedures we use in New Reno etc are superior 
>>in the general case. What we can do to improve 
>>those procedures might be related to RCP or to 
>>some of the work that is being done on 
>>other-than-loss-triggered procedures; we'll 
>>see. But pacing is not a silver bullet.
>>
>