[e2e] congestion collapse definition
David P. Reed
dpreed at reed.com
Tue Sep 8 15:05:05 PDT 2009
Folks, I tend not to use the term "congestion collapse", though it is in
common use in the Internet community.
The phenomenon I've been describing in the other thread on this list,
about the configuration of AT&T's 3G data access network, should
probably be called "congestion collapse" if I'm correct (as I'm pretty
sure I am); otherwise we need a new term.
The phenomenon observed in Comcast's debacle with DOCSIS upstream
buffering should be called by the same term - again, buffering is
allowed to build on a shared queue carrying diverse traffic, without
providing any feedback that can be recognized by TCP's rate control
loop, leading to positive feedback and uncontrolled delay.
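To put a rough number on "uncontrolled delay": the wait a newly arriving
packet sees is just the backlog ahead of it divided by the rate at which
the queue drains. Here is a minimal sketch of that arithmetic. The
function name and the example figures (a 256 KiB standing backlog on a
384 kbit/s upstream) are my own illustrative assumptions, not
measurements from either network.

```python
def queueing_delay_seconds(backlog_bytes: int, drain_rate_bps: float) -> float:
    """Time a newly arriving packet waits behind the existing FIFO backlog."""
    if drain_rate_bps <= 0:
        raise ValueError("drain rate must be positive")
    # Convert the backlog to bits, then divide by the drain rate in bits/s.
    return backlog_bytes * 8 / drain_rate_bps


# Hypothetical numbers, chosen only for illustration.
example_backlog_bytes = 256 * 1024   # assumed standing queue
example_drain_rate_bps = 384_000     # assumed upstream rate
example_delay = queueing_delay_seconds(example_backlog_bytes, example_drain_rate_bps)
print(f"queueing delay: {example_delay:.2f} s")
```

With those assumed figures the delay comes out between 5 and 6 seconds,
and not a single packet has been dropped to tell TCP anything is wrong.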
If I look at Wikipedia, for example, at the definition of congestion
collapse there, it says that CC is characterized by large buffering
delays AND lost packets. However, in the Comcast and AT&T cases here,
the queues get so obnoxiously long (5-10 seconds) that users presumably
give up running apps long before packet *loss* sets in due to overflow.
This appears to be because all the TCP stacks are doing their job: new
connections slow-start, then additive increase (AI) raises the rate
gradually enough (and over short enough connections) that the huge
buffers can stabilize at the point where human pain is the congestion
control algorithm.
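A toy model shows why loss can arrive so late. This is a sketch under
stated assumptions, not a model of any real stack: one flow, a window
that grows by one packet per round trip (slow-start is skipped), a
fixed-rate bottleneck, and a drop only when the FIFO overflows. All
names and numbers (100 packets/s, 100 ms base RTT, a 1000-packet
buffer) are hypothetical.

```python
def time_until_first_drop(capacity_pps: float, base_rtt_s: float,
                          buffer_pkts: int, start_window: float = 1.0):
    """Return (seconds until the first overflow drop, queueing delay at that point).

    Fluid approximation: packets beyond the bandwidth-delay product sit
    in the bottleneck queue, and each round trip adds one packet to the
    window (additive increase only).
    """
    bdp_pkts = capacity_pps * base_rtt_s       # packets the path holds with no queue
    window = start_window
    elapsed = 0.0
    while True:
        queue = max(0.0, window - bdp_pkts)    # excess packets wait in the buffer
        if queue > buffer_pkts:                # buffer overflows: first loss signal
            return elapsed, buffer_pkts / capacity_pps
        rtt = base_rtt_s + queue / capacity_pps  # the RTT stretches as the queue grows
        elapsed += rtt
        window += 1.0                          # additive increase: +1 packet per RTT


# Hypothetical parameters, chosen only for illustration.
seconds_to_drop, delay_at_drop = time_until_first_drop(
    capacity_pps=100.0, base_rtt_s=0.1, buffer_pkts=1000)
print(f"first drop after {seconds_to_drop / 60:.0f} min, "
      f"queueing delay by then {delay_at_drop:.0f} s")
```

Under these assumptions the flow would need roughly 85 minutes to
provoke its first drop, by which time the queue holds 10 seconds of
traffic. Real connections are much shorter, so the only signal anyone
gets in the meantime is the delay itself.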
Human pain was the load control algorithm in early overloaded
time-sharing systems. On the original Multics system, people realized
that in the middle of the day it was *foolish* to start a program that
ran more than one second, because priority given to line editors over
compute jobs meant that compute jobs would NEVER complete (unless one
did an obscure thing called "quit-starting" the program to interact once
a second by stopping and restarting the compile - some hackers rigged up
terminals to automatically send interrupt/restart commands once per
second to get their work done, but the rest of us coders worked mostly
between 11 pm and 6 am).
Of course another part of fixing AT&T's problem is to fix the *upstream*
capacity of the network. The bottleneck wouldn't occur if the output
queue of the bottleneck router could drain as fast as users can generate
Back to my question: should this phenomenon be included in "congestion
collapse" (I believe so), or should we invent a new, more specific name?