[e2e] Re: Are you interested in TOEs and related issues
shalunov at internet2.edu
Thu Mar 4 22:33:53 PST 2004
Sunay Tripathi <Sunay.Tripathi at eng.sun.com> writes:
> 1) Extra processor(s) buried in the TOE for networking processing which is
> hidden from the kernel and leaves the host CPU to do more application
> related work. Saves the cost of licences for application which take
> number of CPU into account (oracle is one such application cited).
One would hope TCP design would not be guided by Oracle licensing quirks.
> 2) On low end (1-2 CPU) x86 based machines, cost of adding a processor
> is much higher than adding a TOE (I personally haven't verified this).
It's not obvious why this should necessarily be the case, given that
far more general-purpose CPUs than TCP offload engines are likely to
continue to be made.
> 3) For the up and coming 10Gb NICs, TOE will help saturate the link. Some
> vendors assert that TOE will be required to support 10Gb NICs.
Right now, one can do almost 8Gb/s with a single TCP stream over 10GE
(let's say 5 or 6Gb/s with more common hardware). The limiting factor
is currently host bus speed. There's nothing (except for compression
on the bus side, but it should be clear we're not going there) that an
offload engine can do about host bus speed. By the time host buses
faster than 10Gb/s are commonly available, a pretty routine x86 box
should be able to saturate 10GE.
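To make the bus argument concrete, here is a rough sketch of the arithmetic. The message does not name a particular bus; the 64-bit, 133 MHz PCI-X figures below are an assumption chosen as a plausible server bus of the period, and the function names are hypothetical. The result is a theoretical peak that ignores bus protocol overhead, so real throughput would be lower.

```python
# Rough sketch: peak host-bus bandwidth versus link rate.
# ASSUMPTION: a 64-bit, 133 MHz parallel bus (PCI-X style); the message
# itself does not say which bus is meant.

def bus_peak_gbps(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus in Gb/s, ignoring all overhead."""
    return width_bits * clock_mhz * 1e6 / 1e9

def bus_can_saturate(link_gbps, bus_gbps):
    """True if the bus peak is at least the link rate."""
    return bus_gbps >= link_gbps

if __name__ == "__main__":
    peak = bus_peak_gbps(64, 133)
    print(f"bus peak: {peak:.3f} Gb/s")
    print("can fill 10GE:", bus_can_saturate(10.0, peak))
```

Under that assumption the bus tops out below 10Gb/s whatever sits on the NIC side of it, which is the point being made: an offload engine does not change this ceiling.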
> 4) Performance reasons. Just the LSO aspect of TOE (sending large chunks of
> data and letting the TOE split it up in mss size pieces) and ack
> coalescing gives a pretty good boost (our own prototypes indicates that
> this is true). The gains are by optimizing data movement and not by
> offloading protocol processing.
Here I would agree with the unnamed TOE vendor whom you're
paraphrasing (and with Jerry Chu's comments in this thread). Ethernet
frame size (1500 bytes or even 9kB) is very small; we'd be in a better
world if we could specify the MTU in units of time (the number of CRC
bits would have to scale up as well, of course). TCP offload engines
could be a kludge that would help to work around this deficiency in
Ethernet.
Note that since even at 10Gb/s one CPU is enough to saturate the link
with 9kB packets, the need for this work-around is not at all
pressing. Given the potential harmful effects (undetected errors on
the host bus, difficulty in patching stale or buggy TCP code, etc.),
one would probably be better served by concentrating on the deployment
of jumbo frames. The investment to support jumbo frames has largely
already been made, so why not extract all we can from it first?
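The frame-size argument can be put in numbers with a small sketch. It assumes "9kB" means 9000 bytes and ignores preamble, headers, and inter-frame gap, so the figures are approximate; the function names are hypothetical. It shows how many frames per second a host must handle to fill 10Gb/s, and what an MTU "in units of time" would mean if the wire time of a 1500-byte frame on 10Mb/s Ethernet were kept constant.

```python
# Sketch of the frame-size arithmetic (approximate: framing overhead ignored).
# ASSUMPTION: "9kB" jumbo frames are taken as 9000 bytes.

def packets_per_second(link_bps, frame_bytes):
    """Frames per second needed to fill the link."""
    return link_bps / (frame_bytes * 8)

def frame_time_us(link_bps, frame_bytes):
    """Time one frame occupies the wire, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6

def mtu_for_frame_time(link_bps, frame_time_s):
    """Frame size in bytes that occupies the wire for frame_time_s seconds."""
    return round(link_bps * frame_time_s / 8)

if __name__ == "__main__":
    for size in (1500, 9000):
        pps = packets_per_second(10e9, size)
        print(f"{size} B frames at 10Gb/s: {pps:,.0f} frames/s")
    # Wire time of a 1500-byte frame at 10Mb/s, then the frame size that
    # would take the same time at 10Gb/s.
    t_us = frame_time_us(10e6, 1500)
    print(f"1500 B at 10Mb/s: {t_us:.0f} us on the wire")
    print("same wire time at 10Gb/s:", mtu_for_frame_time(10e9, t_us / 1e6), "bytes")
```

With these assumptions, 1500-byte frames mean roughly 833,000 frames per second at 10Gb/s against roughly 139,000 with 9000-byte frames, a sixfold reduction in per-packet work without any offload hardware.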
Stanislav Shalunov http://www.internet2.edu/~shalunov/