[rbridge] Last Call comment on: http://www.ietf.org/internet-drafts/draft-ietf-trill-prob-01.txt

Silvano Gai sgai at nuovasystems.com
Fri Oct 27 10:33:56 PDT 2006


These are my comments:

Sgai 1> The document assumes that there are transient loops in spanning
tree. THIS IS ABSOLUTELY FALSE. Spanning tree never causes a loop, not
even during a transition. The document then reasons that, since ST has
transient loops, it is OK to have transient loops in TRILL and that
they only need to be mitigated through a TTL.

The TTL solution is OK for unicast traffic, since unicast traffic does
not replicate while in the loop and eventually the TTL will drop it or
the network will converge and the frame will be delivered.

The TTL solution is NOT OK for multicast/broadcast traffic, since this
traffic replicates while in the loop causing a broadcast/multicast
storm.

Because switches replicate in HW and have low latency, in a meshed
network, even with a moderate TTL, billions of frames will be part of
the storm within a few hundred microseconds.
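
As a back-of-the-envelope illustration of that growth rate (a sketch
under stated assumptions, not something taken from the draft): assume a
full mesh of N switches in which, while the loop exists, every switch
floods a broadcast frame on all inter-switch ports except the one it
arrived on, and the TTL is decremented once per hop. The function name
and the N = 10, TTL = 10 figures are hypothetical, chosen only for
illustration.

```python
def flooded_copies(switches: int, ttl: int) -> int:
    """Count frame transmissions caused by one broadcast frame.

    Worst-case model (an assumption): full mesh, every switch re-floods
    on all inter-switch ports except the ingress port, TTL drops by one
    per hop and the frame is discarded when it reaches zero.
    """
    if switches < 2 or ttl < 1:
        return 0
    total = 0
    in_flight = switches - 1          # copies sent by the first switch
    for _ in range(ttl):
        total += in_flight
        in_flight *= switches - 2     # each copy is re-flooded by its receiver
    return total


if __name__ == "__main__":
    # One frame, 10 switches, TTL of 10.
    print(flooded_copies(10, 10))
```

Under these assumptions a single broadcast frame already produces more
than a billion transmissions, and each hop takes only a few
microseconds in hardware.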

These billions of frames will be queued everywhere, causing hosts to
crash, but above all they will saturate the queue of the switch CPU.
The CPU quickly becomes incapable of draining its queue and therefore
incapable of receiving the control frames needed to break the loop.

In any case, ISIS will react in hundreds of milliseconds, while the
storm will reach its peak in hundreds of microseconds.

Customers that have seen a broadcast storm, due to a bogus ST
implementation or a misconfiguration, don't want to see a second one.

A solution based on TTL must therefore have a strong requirement for
dedicated buffers/paths for the control frames to reach the CPU, so that
it is guaranteed that control frames will eventually break the loop.

I don't think it is acceptable to have temporary loops for
broadcast/multicast, even if they are mitigated by TTL. An interlock
mechanism similar to ST must be used for multicast/broadcast.

I ask for a strong requirement that says: "TRILL MUST avoid
multicast/broadcast storms"

Sgai 2> ST provides symmetrical forwarding, i.e. the path from A to B is
the reverse of the path from B to A. Is this a requirement for TRILL?
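
To make the property concrete, here is a minimal sketch (the topology
and the function name are hypothetical). In a tree there is exactly one
path between any two nodes, so the path from A to B is necessarily the
reverse of the path from B to A. A routing protocol that picks shortest
paths independently in each direction gives no such guarantee when
equal-cost paths exist.

```python
def tree_path(adj, src, dst):
    """Return the unique path from src to dst in a tree given as an
    adjacency dict, or None if dst is unreachable."""
    stack = [(src, [src])]
    seen = {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None


# Hypothetical spanning tree: two access bridges under one root bridge.
TREE = {
    "root": ["s1", "s2"],
    "s1": ["root", "A"],
    "s2": ["root", "B"],
    "A": ["s1"],
    "B": ["s2"],
}

if __name__ == "__main__":
    forward = tree_path(TREE, "A", "B")
    backward = tree_path(TREE, "B", "A")
    print(forward, backward, forward == backward[::-1])
```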

Sgai 3> the terminology used in this draft is not the one used in IEEE
standards. This makes it difficult to understand what certain sentences
really mean. Concepts like autolearning and caches are not IEEE
concepts.

Sgai 4> There is no mention of the applicability of other important IEEE
standards/WG/Study Groups, e.g.
- 802.3ad-2000, Link Aggregation.
- 802.1ah - Provider Backbone Bridges
- 802.1aq - Shortest Path Bridging
- 802.1au - Congestion Notification
- 802.1ad - Provider Bridges 
- 802.1AE - MAC Security
- 802.3ar - Congestion Management Task Force.
- 802.3as - Frame Expansion Task Force.
I think this document needs to clearly state the position of the WG with
respect to these projects.

Sgai 5> I also think there needs to be a mention of the applicability of
important industrial efforts:
- NIC Teaming
- uplinkfast
- split-MLT
- Q in Q
All these are widely deployed in all datacenters/enterprises. I think
this document needs to clearly state the position of the WG with respect
to these de facto standards.

Sgai 6> Many customers look at TRILL as a backbone network. They would
like to connect their current switches to the TRILL backbone using
Etherchannel, with the member links connected to different RBridges for
high availability. Is this a requirement? In general, what is the
relation between Etherchannel and TRILL?

Sgai 7> Does TRILL work properly if Ethernet is deployed with Pause
enabled?

Additional comments in the text marked as sgai N> where N is the number
of the comment.

For all these reasons, but in particular for <sgai 1> I think this
document needs another major revision before it can complete the WG last
call.

-- Silvano

----------------------------------------------------------

2.5. Problems Not Addressed 

   There are other challenges to deploying Ethernet subnets that are not
   addressed in this document. These include:

   o  increased Ethernet link subnet scale 

   o  increased node relocation 

   o  Ethernet link subnet management protocol security 

   o  flooding attacks on a Ethernet link subnet 

   Solutions to TRILL are not intended to support deployment of 
   increasingly larger scales of Ethernet link subnets than current 
   broadcast domains can support (e.g., around 1,000 end-hosts in a 
   single bridged LAN of 100 bridges, or 100,000 end-hosts inside 1,000 
   VLANs served by 10,000 bridges). 

Sgai 8> I don't know where these numbers come from, but with 256/512-port
Ethernet switches available, it does not take 10,000 bridges to reach
100,000 nodes. I also don't understand whether the mention of 1,000
VLANs is intended as a limit. As I mentioned in previous emails, many
customers find 4,000 VLANs insufficient and deploy private VLANs. All
the implementations I know about hash the pair (MAC address, VLAN) into
the filtering database, and the only limitation is the size of the
filtering database.
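
A quick check of the arithmetic, as a sketch only: the helper name is
hypothetical, and the first two figures assume every port faces an end
host, so they are lower bounds. The last figure assumes half the ports
are reserved for inter-switch links.

```python
def min_bridges(end_hosts: int, host_ports_per_bridge: int) -> int:
    """Smallest number of bridges with enough host-facing ports
    (integer ceiling division)."""
    return -(-end_hosts // host_ports_per_bridge)


if __name__ == "__main__":
    print(min_bridges(100_000, 256))  # 256-port switches, all ports to hosts
    print(min_bridges(100_000, 512))  # 512-port switches, all ports to hosts
    print(min_bridges(100_000, 128))  # 256-port switches, half the ports as uplinks
```

Even with half the ports used as uplinks, the count stays in the
hundreds, more than an order of magnitude below the 10,000 bridges
quoted in the draft.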

   Similarly, solutions to TRILL are not intended to address link layer 
   node migration, which can complicate the caches in learning bridges. 

Sgai 9> IEEE 802.1D does not contain the word "cache". Are you referring
to the filtering database? Why are filtering databases complicated by
node migration? I think that TRILL should provide a solution to node
migration that is as good as IEEE 802.1D or better.
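
For reference, the filtering database behaviour referred to in sgai 8
and sgai 9 can be sketched as follows. This is a minimal illustration
and not the IEEE 802.1D state machine: the class and method names are
hypothetical, and aging and static entries are left out. Entries are
keyed on the pair (MAC address, VLAN), the only limit is the table
size, and a node that moves is handled by overwriting its entry the
next time it is seen as a source.

```python
class FilteringDatabase:
    """Toy filtering database keyed on (MAC address, VLAN)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = {}  # (mac, vlan) -> port

    def learn(self, mac: str, vlan: int, port: int) -> bool:
        """Record the port a source address was seen on.
        Returns False only if the table is full and the key is new."""
        key = (mac, vlan)
        if key not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[key] = port  # overwrite handles node migration
        return True

    def lookup(self, mac: str, vlan: int):
        """Return the learned port, or None (meaning: flood in that VLAN)."""
        return self.entries.get((mac, vlan))


if __name__ == "__main__":
    fdb = FilteringDatabase(capacity=2)
    fdb.learn("00:11:22:33:44:55", 10, 1)
    fdb.learn("00:11:22:33:44:55", 20, 2)   # same MAC, different VLAN
    fdb.learn("00:11:22:33:44:55", 10, 7)   # node moved to port 7
    print(fdb.lookup("00:11:22:33:44:55", 10))
```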

   Similar challenges exist in the ARP protocol, where link layer 
   forwarding is not updated appropriately when nodes move to ports on 
   other bridges. Again, the compartmentalization available in network 
   routing, like that of network layer ASes, can help hide the effect of
   migration. That is a side effect, however, and not a primary focus of
   this work.

Sgai 10> I am not sure what the previous sentence means; I would remove
it.

   Current link control plane protocols, including Ethernet link subnet 
   management (STP) and link/network integration (ARP), are vulnerable 
   to a variety of attacks. Solutions to TRILL are not intended to 
   directly address these vulnerabilities. Similar attacks exist in the 
   data plane, e.g., source address spoofing, single address traffic 
   attacks, traffic snooping, and broadcast flooding. TRILL solutions do
   not address any of these issues, although it is critical that they do
   not introduce new vulnerabilities in the process (see Section 5).

3. Desired Properties of Solutions to TRILL 

   This section describes some of the desirable or required properties 
   of any system that would solve the TRILL problems, independent of the
   details of such an architecture. Most of these are based on retaining
   useful properties of bridges, or maintaining those properties while
   solving the problems listed in Section 2.

3.1. No Change to Link Capabilities 

   There must be no change to the service that Ethernet subnets already 
   provide as a result of deploying a TRILL solution. Ethernet supports 
   unicast, broadcast, and multicast natively. Although network 
   protocols, notably IP, can tolerate link layers that do not provide 
   all three, it would be useful to retain the support already in place 
   [7]. 

Sgai 11> This requirement needs to be a "must". It also needs to say
that TRILL needs to work for non-IP protocols as well.

   Zeroconf, as well as existing bridge autoconfiguration, are
   dependent on broadcast as well.

   Current Ethernet ensures in-order delivery and no duplicated packets 
   under normal operation (excepting transients during reconfiguration).


Sgai 12> Outside a marginal corner case in RSTP that affects only
in-order delivery, these two properties are also guaranteed during
reconfiguration. There are no transient loops in ST, see <sgai 1>

   These criteria apply in varying degrees to the different variants of 
   Ethernet, e.g., basic Ethernet up through basic VLAN (802.1Q) ensures
 
Sgai 13> IEEE 802.1Q is not involved in this; it is a property of IEEE
802.1D.

   that all packets between two link addresses have both properties, but
   protocol/port VLAN (802.1V) ensures this only for packets with the
   same protocol and port. [JUST CHECKING - OR AM I MISREADING WHAT
   802.1V DOES?]

Sgai 14> this needs to be resolved
 
 
Touch & Perlman         Expires April 22, 2007                 [Page 8] 

Internet-Draft     TRILL: Problem and Applicability        October 2006 
    

   There are subtle implications to such a requirement. Bridge 
   autolearning 

sgai 15> autolearning is not a well known concept, not present in IEEE
802.1D

   already is susceptible to moving nodes between ports,
   because previously learned associations between port and link address
   change. A TRILL solution could be similarly susceptible to such
   changes.

3.2. Zero Configuration and Zero Assumption 

   Both bridges and hubs are zero configuration devices; hubs having no
   configuration at all, and bridges being automatically
   self-configured. Bridges are further zero-assumption devices, unlike
   hubs. Bridges can be interconnected in arbitrary topologies, without
   regard for cycles or even self-attachment. STP removes the impact of
   cycles automatically, and port autolearning reduces unnecessary
   broadcast of unicast traffic.

Sgai 16> port autolearning is not an IEEE concept. 

   A TRILL solution should strive to have similar zero configuration, 
   zero assumption operation. This includes having TRILL solution 
   components automatically discover other TRILL solution components and
   organize themselves, as well as to configure that organization for
   proper operation (plug-and-play). It also includes zero configuration
   backward compatibility with existing bridges and hubs, which may
   include interacting with some of the bridge protocols, such as STP.

   VLANs add a caveat to zero configuration; a TRILL solution should
   support automatic use of a default VLAN (like non-VLAN bridges), but
   should require explicit configuration where the VLANS require them as
   well.

Sgai 17> The discussion about VLANs needs to be much more extensive. It
is clear from the mailing list discussion that VLANs can be used inside
the packet or in the Ethernet encapsulation of TRILL. These are two
different kinds of VLANs and their requirements need to be stated
separately. Q in Q also needs to be discussed. See also <sgai 26>.

   Autoconfiguration extends to optional services, such as multicast 
   support via IGMP snooping, broadcast support via serial copy, and 
   supporting multiple VLANs.  

Sgai 18> what about VLAN pruning?

3.3. Forwarding Loop Mitigation 

   Spanning tree avoids forwarding loops by construction, although 
   transient loops can occur, e.g., via the appearance of a new link. 

Sgai 19> this statement is incorrect. ST does not have transient loops.
See <sgai 1>
  
   Solutions to TRILL are intended to use adapted network layer routing 
   protocols which may introduce transient loops during routing 
   convergence. TRILL solutions thus need support for mitigating the 
   effect of such routing loops. 

   In the Internet, loop mitigation is provided by a decrementing 
   hopcounts (TTL); in other networks, packets include a trace 
   (serialized or unioned) of visited nodes [1]. These mechanisms 
   (respectively) limit the impact of loops or detect them explicitly. A
   mechanism with similar effect should be included in TRILL solutions.

Sgai 20> see <sgai 1>
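
To make the two mechanisms named in the draft text concrete, here is a
minimal sketch with hypothetical function names. It models a single
unicast frame circulating in a forwarding loop. The hop count only
bounds how long the frame survives, while a trace of visited nodes
detects the loop the first time a node repeats. Neither function models
the replication of multicast/broadcast frames discussed in <sgai 1>.

```python
from itertools import cycle


def hops_with_ttl(loop_nodes, ttl):
    """Hops a frame makes around the loop before its hop count expires."""
    hops = 0
    for _node in cycle(loop_nodes):
        if ttl == 0:
            return hops          # hop count exhausted: frame is dropped
        ttl -= 1
        hops += 1


def hops_with_trace(loop_nodes):
    """Hops a frame makes before a node finds itself in the trace."""
    visited = set()
    hops = 0
    for node in cycle(loop_nodes):
        if node in visited:
            return hops          # loop detected explicitly
        visited.add(node)
        hops += 1


if __name__ == "__main__":
    loop = ["A", "B", "C"]       # hypothetical three-node forwarding loop
    print(hops_with_ttl(loop, ttl=5), hops_with_trace(loop))
```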




 
 
   [QUESTION: anyone have a good reference for serialized or union 
   traces - or better names for them?] 

sgai 21> this needs to be resolved

3.4. Spanning Tree Management 

   In order to address convergence under reconfiguration and robustness 
   to link interruption (Sections 2.2 and 2.3), participation in the STP
   must be carefully managed. The goal is to provide the desired
   stability of the TRILL solution and of the entire Ethernet link
   subnet while not interfering with the operation of STP of the
   Ethernet on which the TRILL resides. This may involve TRILL solutions
   participating in the STP, where the protocol is used for TRILL might
   dampen interactions with STP, or it may involve severing the STP into
   separate STPs on 'stub' external Ethernet link subnet segments.

   A requirement is that a TRILL solution must not require modifications
   or exceptions to the existing spanning tree protocols (STP, MSTP).
 
Sgai 22> Does this include RSTP? More generally, this document does not
describe requirements for the interaction of TRILL with ST.

   [we need pictures here; to appear] 

Sgai 23> this needs to be resolved

3.5. Multiple Attachments 

   In STP, a single NIC with multiple attachments to a single spanning 
   tree will always only get traffic over one of the two attachment 
   points, 

sgai 24> It is not clear how a NIC in the host can have multiple
attachments. If you are referring to NIC teaming, what you say is false.

TRILL allows load sharing between the attachment points. 
   Further, TRILL must manage multicast and broadcast traffic so as not 
   to create feedback loops on Ethernet segments which are attached at 
   multiple TRILL access points. 

   [NOTE: this might be omitted, as it has not been shown to be a 
   problem with STP]. 

Sgai 25> this needs to be resolved

3.6. VLAN Issues 

   A TRILL solution should support multiple VLANs (802.1Q, 802.1V, and 
   802.1S). This may involve ignorance, just as many bridge devices do 
   not participate in the VLAN protocols. It may alternately support 
   direct VLAN support, e.g., by the use of separate TRILL routing 
   protocol instances to separate traffic for each VLAN traversing a 
   TRILL solution. 

Sgai 26> See also <sgai 17>. I am not sure what the first two sentences
are trying to say; the last part needs to be expanded and clearly
differentiated from the discussion in section 3.2. I propose to call
these VLANs the "outer VLANs" and the VLANs discussed in 3.2 the
"inner VLANs" (with reference to the position of the tag in the frame).


3.7. Equivalence 

   As with any extension to an existing architecture, it would be useful
   - though not strictly necessary - to be able to describe or consider
   a TRILL solution as a model of an existing link layer component. Such
   equivalence provides a validation model for the architecture, and a
   way for users to predict the effect of the use of a TRILL solution on
   a deployed Ethernet. In this case, 'user' refers to users of the
   Ethernet protocol, whether at the host (data segments), bridge (ST
   control segments), or VLAN (VLAN control).

   This provides a sanity check, i.e., "we got it right if we can 
   replace a TRILL solution with an X" (where "X" might be a single 
   bridge, a hub, or some other link layer abstraction). It does not 
   matter whether "X" can be implemented on the same scale as the 
   corresponding TRILL solution. It also does not matter if it can - 
   there may be utility to deploying the TRILL solution components 
   incrementally, in ways that a single "X" could not be installed. 

   For example, if TRILL solution were equivalent to a single 802.1D 
   bridge, it would mean that the TRILL solution would - as a whole - 
   participate in the STP. This need not require that TRILL solution 
   would propagate STP, any more than a bridge need do so in its on-
   board control. It would mean that the solution would interact with 
   BPDUs at the edge, where the solution would - again, as a whole - 
   participate as if a single node in the spanning tree. Note that this 
   equivalence is not required; a solution may act as if an 802.1 hub, 
   or may not have a corresponding equivalent link layer component at 
   all. 

3.8. Optimizations 

   There are a number of optimizations that may be applied to TRILL 
   solutions. These must be applied in a way that does not affect 
   functionality as a tradeoff for increased performance. Such 
   optimizations address broadcast and multicast frame distribution, 
   VLAN support, and snooping of ARP and IPv6 neighbor discovery. 

   [NOTE: need to say more here.] 

Sgai 27> this needs to be resolved

3.9. Internet Architecture Issues 

   TRILL solutions are intended to have no impact on the Internet 
   network layer architecture. In particular, the Internet and higher 
   layer headers should remain intact when traversing a TRILL solution, 
   just as they do when traversing any other link subnet technologies. 
   This means that the IP TTL field cannot be co-opted for forwarding 
   loop mitigation, as it would interfere with the Internet layer 
   assuming that the link subnet was reachable with no changes in TTL 
   (Internet TTLs are changed only at routers, as per RFC 1812, and even
   if IP TTL were considered, TRILL is expected to support non-IP
   payloads, and so requires a separate solution anyway) [1].

Sgai 28> The requirement must be: "TRILL must support non-IP payloads"
 
   TRILL solutions should also have no impact on Internet routing or 
   signaling, which also means that broadcast and multicast, both of 
   which can pervade an entire Ethernet link subnet, must be able to 
   transparently pervade a TRILL solution. Changing how either of these 
   capabilities behaves would have significant effects on a variety of 
   protocols, including RIP (broadcast), RIPv2 (multicast), ARP 
   (broadcast), IPv6 neighbor discovery (multicast), etc. 

   Note that snooping of network layer packets may be useful, especially
   for certain optimizations. These include snooping multicast control
   plane packets (IGMP) to tune link multicast to match the network 
   multicast topology, as is already done in existing smart switches 
   [2]. This also includes snooping IPv6 neighbor discovery messages to 
   assist with governing TRILL solution edge configuration, as is the 
   case in some smart learning bridges [9]. Other layers may similarly 
   be snooped, notably ARP packets, for similar reasons for IPv4 [13]. 

   [Need a ref for the router-router 'igmp' protocol] 

Sgai 29> this needs to be resolved

4. Applicability 

   As might be expected, TRILL solutions are intended to be used to 
   solve the problems described in Section 2. However, not all such 
   installations are appropriate environments for such solutions. This 
   section outlines the issues in the appropriate use of these 
   solutions. 

   TRILL solutions are intended to address problems of path efficiency 
   and stability within a single Ethernet link subnet. Like bridges, 
   individual TRILL solution components may find other TRILL solution 
   components within a single Ethernet link subnet and aggregate into a 
   single TRILL solution.  

   TRILL solutions are not intended to span separate Ethernet link 
   subnets where interconnected by network layer (e.g., router) devices,
   except via link layer tunnels that are in place prior to their
   deployment, where such tunnels render the distinct subnet
   undetectably equivalent from a single Ethernet link subnet.

   A currently open question is whether a single Ethernet link subnet 
   should contain only one TRILL solution instance, either of necessity 
   of architecture or utility. 

Sgai 30> this needs to be resolved

   Multiple TRILL solutions, like Internet
   ASes, may allow TRILL routing protocols to be partitioned in ways
   that help their stability, but this may come at the price of needing
   the TRILL solutions to participate more fully as nodes (each modeling
   a bridge) in the Ethernet link subnet STP. Each architecture solution
   should decide whether multiple TRILL solutions are supported within a
   single Ethernet link subnet and mechanisms should be included to
   enforce whatever decision is made.

   TRILL solutions are not intended to address scalability limitations 
   in bridged subnets. Although there may be scale benefits of other 
   aspects of solving TRILL problems, e.g., of using network layer 
   routing to provide stability under link changes or intermittent 
   outages, this is not a focus of this work. 

   As also noted earlier, TRILL solutions are not intended to address 
   security vulnerabilities in either the data plane or control plane of
   the link layer. This means that TRILL solutions should not limit
   broadcast frames, ARP requests, or spanning tree protocol messages 
   (if such are interpreted by the TRILL solution or solution edge). 

5. Security Considerations 

   TRILL solutions should not introduce new vulnerabilities compared to 
   traditional bridged subnets.  

   TRILL solutions are not intended to be a solution to Ethernet link 
   subnet vulnerabilities, including spoofing, flooding, snooping, and 
   attacks on the link control plane (STP, flooding the learning cache) 
   and link-network control plane (ARP). Although TRILL solutions are 
   intended to provide more stable routing than STP, this stability is 
   limited to performance, and the subsequent robustness is intended to 
   address non-malicious events. 

   There may be some side-effects to the use of TRILL solutions that can
   provide more robust operation under certain attacks, such as those
   interrupting or adding link service, but TRILL solutions should not 
   be relied upon for such capabilities. 

   Finally, TRILL solutions should not interfere with other protocols 
   intended to address these vulnerabilities, such as those under 
   development to secure IPv6 neighbor discovery.  

   [need a ref for secure ipv6 nd] 

Sgai 31> this needs to be resolved

6. IANA Considerations 

   This document has no IANA considerations.  

   This section should be removed by the RFC Editor prior to final 
   publication. 


 
 
7. Conclusions 

   (TBA) 

Sgai 32> this needs to be resolved



