[e2e] GRID Network Research
Jon.Crowcroft at cl.cam.ac.uk
Wed May 8 09:11:08 PDT 2002
the GGF (see usual web site) are interested in input on this:
(we will also generate a doc about the 10 things grid people wish
network people knew!)
comments to me!
Outline Draft for:
Top ten things network engineers wish grid programmers knew
Top ten things grid programmers wish network engineers knew
Jon Crowcroft et al...26/4/2002
This is a draft contribution for a document for the GHPNRG
(amongst other places)
which is meant to list topics that the network community is
working on and is sometimes asked alarming questions about by
folks who make intensive (and quite well educated) use of
networks, such as Global GRID Forum people.
It is currently a list of topics and references. I might expand
the list, and it certainly needs lots of explanatory text. It
might be neat if this came from the GNT!
1/ Congestion Control (contrariwise: see QoS)
Is this always necessary? No, but beware of ISPs who
mandate it; and if you think you can reuse recent history rather
than making fresh measurements, look at the Congestion Manager and TCP
PCB state-sharing work first!
This is not optional in a non-QoS network (which is just
about any network) - adaptation is mandatory.
AIMD and Equation Based
AIMD is not the only solution to a fair, convergent
control rule for congestion avoidance and control. Other
solutions are around - rate-based, using loss or ECN feedback,
can work to be TCP-fair, but do not generate the characteristic sawtooth.
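For concreteness, the classic AIMD rule can be sketched in a few lines of Python. The alpha/beta constants below are the standard TCP choices (1 MSS per RTT, halving on loss); the single loss event in the run is made up for illustration.

```python
def aimd_step(cwnd, loss, alpha=1.0, beta=0.5):
    """One AIMD update: halve the window on loss (multiplicative
    decrease), otherwise grow by alpha per RTT (additive increase)."""
    if loss:
        return max(1.0, cwnd * beta)
    return cwnd + alpha

# a made-up run with one loss event, showing the sawtooth shape
cwnd, trace = 10.0, []
for rtt in range(6):
    cwnd = aimd_step(cwnd, loss=(rtt == 3))
    trace.append(cwnd)
```

The window ramps linearly, halves at the loss, then ramps again - the sawtooth the text refers to.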
Assumptions and errors
Most _connections_ do not behave like the Padhye
equation, but most bytes are shipped on a small number of
connections, which do - c.f. Mice and Elephants.
The jury is still out on non-greedy TCP flows
(ones that do not have an infinite source of data at any moment).
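The simplified "square-root" form of that steady-state model is easy to write down; the MSS, RTT and loss figures below are illustrative, not from the text.

```python
import math

def tcp_throughput(mss_bytes, rtt_s, loss_rate):
    """Steady-state TCP throughput in bytes/s from the simplified
    square-root model: MSS / (RTT * sqrt(2p/3))."""
    return mss_bytes / (rtt_s * math.sqrt(2.0 * loss_rate / 3.0))

# illustrative: 1460-byte MSS, 100 ms RTT, 1% loss
rate = tcp_throughput(1460, 0.1, 0.01)
```

Note how sensitive the result is to the loss rate p: a long fat pipe needs a very low p to sustain high throughput.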
RMT and Unicast
Reliable Multicast Transport protocols (PGM, ALC) use
a variety of techniques, mainly to mimic TCP.
Mobile and Congestion Control
Mobile nodes experience temporary indications of
loss AND congestion
during a hand-off. People have proposed mechanisms for
indicating whether these are "true" or chimeras.
Economics, Fairness etc
Congestion control results in an approximately fair
distribution of bottleneck bandwidth - this may not be
great if you paid more to get a fat pipe to the net.
But, you are probably nearer the core and have
every right to ask the ISP to upgrade their bottlenecks anyhow
and the people that paid less should be bottlenecked at THEIR
access links in that case. So?
2/ Routing
Priorities for good routing system design are:
Packet classification and switched routers have come a
long way recently - we are unlikely in the software world to beat
the h/w in core routers, but we can compete nicely in access
devices - certainly, there is no reason why a small cluster
couldn't make a good 10Gbps router - but there's every reason why
a PCI bus machine maxes out at 1Gbps!
Routers and links fail. The job of OSPF/ISIS and BGP is
to find alternate paths quickly - in reality they take a
while to converge - IGPs take a while (despite being mainly link
state nowadays) because link failure detection is NOT obvious -
sometimes you have to count missed HELLO packets (since some
links don't generate an explicit clock). BGP convergence is a
joke. But there are smart people on the case.
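Counting missed HELLOs amounts to a dead-interval check. A minimal sketch, assuming OSPF's default multiplier of four hello intervals for the dead interval:

```python
def neighbor_down(last_hello_s, now_s, hello_interval_s, dead_mult=4):
    """Declare a neighbour dead once dead_mult hello intervals have
    passed with no HELLO heard - failure detection for links that
    give no explicit 'link down' signal."""
    return (now_s - last_hello_s) >= dead_mult * hello_interval_s
```

This is exactly why detection is slow: with a 10 s hello interval you wait 40 s before concluding anything.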
theory and practice
Most of the problems with implementing routing protocols are those
of classic distributed (p2p/autonomous) algorithms: dealing with
bugs in other people's implementations - it takes a good
programmer about 3 months to do a full OSPF. It then takes around
3 years to put in all the defences.
Better (multi-path, multi-metric) routing
Equal-cost multipath OSPF and QOSPF
have been dreamt up - are they used a lot? Multipath in limited
cases appears to work quite well. Multimetric relies on good
understanding of traffic engineering and economics, and to date,
hasn't seen the light of day. Note also that, in terrestrial
tier-one networks, end-to-end delays are approaching transmission
delays, so asking for a delay (or jitter) bound is getting fairly
pointless - asking for a throughput guarantee is a good idea, but
doesn't need clever routing!
Does MPLS Help? No, not one bit.
Policies are hard - BGP allows one to express unilateral
policies to the planet. This is cute (the same idea could be used
for policy management of other resources like CPUs in the GRID)
however, it results in difficulties in computing global choices
(esp Multihoming) - there are fixes.
3/ Packet Sizes
Go-faster LANs have always pushed the MTU up - since ATM LANs
(remember the Fore ASX-100) we have tried 9180-byte packets, and
enjoyed them. But the GRID is global, so the MTU is that of the
weakest link. Most traffic crosses 100BaseT somewhere on the path,
so we aren't likely to see more than the occasional special-case
non-1500-byte path. And while path MTU discovery handles unicast,
multicast MSS is a real problem :)
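The weakest-link point is trivial but worth stating as code: the path MTU is the minimum over all link MTUs, which is what path MTU discovery converges on for unicast. The link values below are illustrative.

```python
def path_mtu(link_mtus):
    """The usable MTU of a path is that of its weakest link."""
    return min(link_mtus)

# illustrative path: jumbo-frame ATM LAN, 100BaseT segment, GRE tunnel
mtu = path_mtu([9180, 1500, 1476])
```

One 1500-byte (or tunnelled, smaller) hop anywhere on the path wipes out the benefit of jumbo frames at the ends.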
Sub-IP packet size is a consideration - some systems (ATM) break
packets into tiny little pieces, then apply various level-2
schemes to these pieces (e.g. rate/congestion control) - most of
these are anathema to good performance.
4/ Overlays and P2P (e.g. Pastry, CAN, Tapestry)
are becoming commonplace - the routing overlay du jour is
probably RON from MIT - these (at best) are an auto-magic way of
configuring a set of tunnels (IPinIP, GRE etc), i.e. they build
an overlay topology on top of the native IP one.
P2P systems are slightly different - they do content sharing and have
cute index/search/replication strategies varying from
mind-numbingly stupid (Napster, Gnutella) to very cute (CAN,
Pastry). They have problems with
locality and metrics, so are not the tool for the job for low
latency file access.... In trying to mitigate this, they (and
overlay routing substrates) use ping and pathchar to probe paths.
Limitations of ping/pathchar:
convergence when not native (errors/confidence).
Peer-to-Peer: Harnessing the Power of Disruptive Technologies
Edited by Andy Oram, O'Reilly, March 2001, ISBN 0-596-00110-X
5/ QoS (contrariwise: see Congestion Control)
QoS - would be a nice thing
Parameters typically include throughput, delay and jitter
Some people add security/integrity
Some people also mention loss...
Theft and Denial of Service
Protection is really what people want - if I send x bps
to site S, what y bps will be received, and how much delay d later?
To guarantee y = x with d minimised, you need:
Admission Control (so we are not sharing as we would if
we adapted under congestion control)
Scheduling (so we do not experience arbitrary queueing delays)
Re-routing may also need to be controlled, and pre-empted
alternate routes (also known, unfortunately, as protection paths)
may be needed if we want QoS to include availability as well as
throughput guarantees and delay bounds.
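The traffic contract behind such an admission-controlled guarantee is usually a token bucket; here is a minimal conformance check. The rates and arrival sequences used in the test are made up for illustration.

```python
def conforms(rate_bps, bucket_bits, arrivals):
    """True if every (time_s, size_bits) packet arrival fits a
    token bucket: tokens refill at rate_bps, capped at bucket_bits,
    and each packet must find enough tokens or the flow is
    non-conforming (and loses its guarantee)."""
    tokens, last_t = float(bucket_bits), 0.0
    for t, size in arrivals:
        tokens = min(bucket_bits, tokens + rate_bps * (t - last_t))
        last_t = t
        if size > tokens:
            return False   # this packet would violate the contract
        tokens -= size
    return True
```

Admission control then reduces to: admit a flow only if the sum of contracted rates (plus burst allowances) fits the scheduled capacity.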
"edge", "core", etc. is a myth - in the global net the
average traffic path includes 7 ASs - most inter-domain traffic
traverses heavily used Internet Exchange points (e.g. London)
where capacity only just about matches demand, whereas core
networks are often "over-provisioned" (UK academic net now runs at
Aggregation is a technique to scale traffic management for QoS -
by only managing classes of aggregates of flows, we get to reduce
the state and signaling/management overhead for it. VPNs/tunnels
of course are aggregation techniques, as are things that treat
packets differently based on subfields like DSCP, port, etc.
SLAs are around already despite QoS not being widespread - however,
SLAs are only intra-ISP to my knowledge (some Internet Exchanges
offer SLAs, but end-to-end SLAs are as scarce as dragons).
Economics - are important here again as you can imagine!
An Engineering Approach to Computer Networking
Keshav, 1997, Addison-Wesley Pub Co; ISBN: 0201634422
Internet QoS: Architectures and Mechanisms for Quality of Service
by Zheng Wang, 2001, Morgan Kaufmann Publishers; ISBN: 1558606084
6/ Multicast
Tier 1 routing works. Most ISPs run core native multicast.
Interdomain multicast only just limps (it's getting better...)
App Relay Solutions
RMT - we have some candidate protocols for reliable multicast -
nothing as solid as 1988 TCP quite yet, though.
Address Allocation and Directories are not great yet, hence
beacons and so on.
Access networks are in bad shape... e.g.
DSLAMs don't do IGMP snooping
Cable modems don't do IGMP snooping
Dialup can't hack it at all
Does IPv6 Help (don't laugh!) - yes it might!
Developing IP Multicast Networks: The Definitive Guide to
Designing and Deploying CISCO IP Multicast Networks
by Beau Williamson, 2000, Cisco Press; ISBN: 157870077
Multicast Communication: Protocols, Programming, and Applications
by Ralph Wittmann, Martina Zitterbart
Morgan Kaufmann Publishers; ISBN: 1558606459
7/ Operating Systems
Linux, Solaris etc...there's a lot we could say here - lots of
things can and should be configured
zero copy stack - we'd all like this - zero copy receive is hard;
RDMA is not obviously the answer
Interrupts (self-selecting NICs) - we should minimise these if we
want TCP to go to 10Gbps on a reasonable processor - there are
socket buffer considerations - there are lots!
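The biggest of those socket buffer considerations is the bandwidth-delay product: a single TCP connection cannot fill the pipe with less buffer than that. A back-of-the-envelope sketch, with illustrative figures:

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the bytes that must be in flight to
    keep a path full, hence the minimum useful socket buffer."""
    return int(bandwidth_bps * rtt_s / 8)

# 10 Gbit/s across a 100 ms RTT path:
buf = bdp_bytes(10e9, 0.1)   # 125 MB - far beyond common OS defaults
```

Default socket buffers are typically tens or hundreds of KB, which is why untuned end systems never see anything like line rate on long fat paths.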
protection and scheduling domains - if we could get away from OSs
that confuse these, life would be easier!
W Richard Stevens, TCP/IP Illustrated, All Volumes.
Understanding the Linux Kernel,
D.P. Bovet and M. Cesati, O'Reilly, 2001,
8/ Layer 2 Considerations
layer 2 NBMA nets - lots - a pain
layer 2 shared media nets - was decreasing due to switched ether,
now increasing due to wireless.
switching and routing re-cursed - layer 2 switching and routing
usually makes life HARDER for the IP engineer.
flow and congestion control re-cursed - layer 2 reliability and
flow control almost ALWAYS make life worse for IP and TCP;
signaling (implicit, explicit) is just painful.
802.11 - in its glory:
General discussion of slow lossy links:
WAP horrors - see web for many stories
GPRS - see:
Other end of "Spectrum", see
(includes Raj Jain's own list of hot topics!)
9/ Light v. Heavyweight Protocols
Header prediction and packet templates make
code complexity a lot lower in the common case, even for a big
protocol like TCP or SCTP.
"User space" v. kernel myths - in this author's experience it is
really worth getting people to put transports into the kernel -
reasons include independent failure of application and protocol,
as well as good control of end-system resources. It ain't that
hard, and user space will almost never be as fast.
Computer Networks, A Systems Approach
Peterson and Davie,
Morgan Kaufmann, 1996, ISBN 1-55860-368-9
(2nd ed. too)
10/ Macroscopic Traffic and System Considerations
Self similarity, so?
traffic is self similar (i.e. arrivals are not i.i.d) -
this doesn't actually matter much (there is a horizon effect)
traffic phase effects
p2p (IP router, multiparty applications etc)
have a tendency (like clocks on a wooden door, or
fireflies in the Mekong delta) to synchronise - this is
a bad thing
e.g. genome publication of new result followed
by simultaneous database searches with similar queries from
lots of different places...
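The standard defence against such synchronisation is to jitter timers and request times; a minimal sketch (the uniform 50% jitter fraction is an arbitrary illustrative choice, not a recommendation from the text):

```python
import random

def jittered(base_s, jitter_frac=0.5, rng=random.random):
    """Spread a nominal interval uniformly over
    [base, base * (1 + jitter_frac)] so independent nodes drift
    apart instead of firing in lockstep."""
    return base_s * (1.0 + jitter_frac * rng())
```

Routing protocols and app-level retry loops alike use exactly this trick to avoid the phase effects described above.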
Many things in the net are asymmetric - see ADSL lines,
see client-server, master-slave, see most NAT boxes. See BGP
paths. Beware - assumptions about symmetry (e.g. deriving one-way
delay from RTT) are often wildly wrong. Asymmetry also breaks all
kinds of middle box snooping behaviour.
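A tiny worked example of why RTT/2 is a bad one-way delay estimate; the delays are made up, but typical of an ADSL-like asymmetric path:

```python
# illustrative asymmetric path: fast forward leg, slow return leg
fwd_s, rev_s = 0.010, 0.090   # true one-way delays in seconds
rtt_s = fwd_s + rev_s

naive_one_way = rtt_s / 2.0   # the RTT/2 assumption
# gives 0.050 s against a true forward delay of 0.010 s: off by 5x
```

Any middle box or application that infers direction-specific behaviour from RTT halves is making exactly this error.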
The Art of Computer Systems Performance Analysis
Raj Jain, 1991, Wiley, ISBN 0-471-50336-3
Web Protocols and Practice
B. Krishnamurthy & J. Rexford,
Addison Wesley, 2001, ISBN 0-201-710885
Security Engineering: A Guide to Building Dependable Distributed Systems
Ross Anderson, 2001, Wiley & Sons; ISBN: 0471389226
ACM CCR 25th Anniversary Edition,
ACM SIGCOMM CCR, Volume 25, No.1 January 1995,
ISSN #: 0146-4833