[rbridge] Comments on: http://www.ietf.org/internet-drafts/draft-ietf-trill-rbridge-arch-01.txt
Silvano Gai
sgai at nuovasystems.com
Mon Oct 30 09:12:49 PST 2006
Comments inline marked "sgai n>"
Sgai 1> This document assumes that all multicast traffic must be
propagated through the IRT (Ingress Rbridge Tree) and therefore denies
any possibility for shared trees.
-- Silvano
--------------------------------------------------------
... snip ...
1. Introduction
This document describes an architecture that addresses the TRILL
problem and applicability statement [2]. This architecture is
composed of a set of devices called RBridges that cooperate
together within an Ethernet network to provide a layer two
delivery service that makes efficient use of available links
using a link state routing protocol. The service provided is
analogous to creation of a single, virtual device composed of an
overlay of tunnels, constructed between RBridge devices, using
link state routing. RBridges thus support increased RBridge to
RBridge bandwidth and fault tolerance, when compared to
conventional Ethernet bridges (which forward frames via a
spanning tree), while still being compatible with bridges and
hubs.
The principal objective of this architecture is to provide an
overview of the use of these RBridges in meeting the following
goals:
1) Provide a form of optimized layer two delivery service.
2) Use existing technology as much as possible.
3) Allow for configuration free deployment.
In providing an (optimized) layer two (L2) service, key factors
we want to maintain are: transparency to higher layer (layer 3
and above) delivery services and mechanisms, and use of location
independent addressing. Optimization of the L2 delivery service
consists of: use of an optimized subset of all available paths
and support for pruning of multicast traffic delivery paths.
Sgai 2> I think we should explicitly mention layer 2 multipath.
To accomplish the goal of using existing technologies as much as
possible, we intend to specify minimal extensions (if required)
to one or more existing link-state routing protocols, as well as
defining the specific sub-set of existing bridging technologies
this architecture makes use of.
The extent to which routing protocol extensions may be required
depends on the closeness of the "fit" of any chosen routing
protocol to RBridge protocol requirements. See [6] for further
information on these requirements. The use of a specific routing
protocol - along with appropriate extensions and enhancements -
will be defined in corresponding RBridge protocol specifications
(see [3] for example).
Gray Expires April, 2007 [Page 4]
Internet-Draft RBridge Architecture October 2006
Specific protocol specifications will also describe the details
of interactions between the RBridge protocol and specific L2
technologies - i.e. - Virtual Local Area Networking (VLAN), L2
Multicast, etc.
As an overview, however, the intention is to use a link-state
routing protocol to accomplish the following:
1) Discover RBridge peers.
2) Determine RBridge link topology.
3) Advertise L2 reachability information.
4) Establish L2 delivery using shortest path (versus STP).
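The shortest-path delivery in item 4 can be sketched as follows. This is a minimal illustration only: the draft does not prescribe an algorithm or data layout, and the names `spf` and `lsdb`, the example topology, and the link costs are all assumptions made for this sketch.

```python
import heapq

def spf(lsdb, root):
    """Return {node: (cost, first_hop)} for shortest paths from root.

    lsdb maps each RBridge name to {neighbor: link_cost}. This layout is
    hypothetical and not taken from the draft. Link costs must be positive.
    """
    best = {root: (0, None)}
    heap = [(0, root, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry, a cheaper path was already found
        for nbr, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            # When leaving the root, the first hop is the neighbor itself.
            hop = nbr if node == root else first_hop
            if nbr not in best or new_cost < best[nbr][0]:
                best[nbr] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, nbr, hop))
    return best

# Hypothetical five-RBridge ring with unit link costs.
lsdb = {
    "r4": {"r5": 1, "r7": 1},
    "r5": {"r4": 1, "r6": 1},
    "r6": {"r5": 1, "r9": 1},
    "r7": {"r4": 1, "r9": 1},
    "r9": {"r7": 1, "r6": 1},
}
routes = spf(lsdb, "r4")
```

Here `routes["r9"]` gives the cost and first hop that r4 would use toward r9. Unlike a spanning tree, every RBridge runs the same computation from its own position, so no link is blocked for unicast traffic.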
Sgai 3> It is very important to say that this protocol must provide a
loop free topology for multicast/broadcast under any condition,
including transient loops, to prevent multicast/broadcast storms.
There are additional RBridge protocol requirements - above and
beyond those addressed by any existing routing protocol - that
are identified in this document and need to be addressed in
corresponding RBridge protocol specifications.
To allow for configuration free deployment, specific protocol
specifications need to explicitly define the conditions under
which RBridges may - and may not - be deployed as-is (plug and
play), and the mechanisms that are required to allow this. For
example, the first requirement any RBridge protocol must meet is
to derive information required by link-state routing protocol(s)
for protocol start-up and communications between peers - such as
higher-layer addressing and/or identifiers, encapsulation header
information, etc.
At the abstract level, RBridges need to maintain the following
information:
1) Peer information,
2) Topology information,
3) Forwarding information -
a. unicast,
b. flooded, and
c. multicast.
Peer information may be acquired via the routing protocol, or
may be discovered as a result of RBridge-specific peer discovery
mechanisms. Topology information is expected to be acquired via
the link-state routing protocol.
Forwarding information is derived from the combination of
attached MAC address learning, snooping of multicast-related
protocols (e.g. - IGMP), and routing advertisements and path
computations using the link-state routing protocol.
Sgai 4> An RBridge must also be able to send two copies of a
unicast/multicast/broadcast packet on the same port when it acts as a
designated RBridge (one copy is encapsulated, the other not).
The remainder of this document outlines the TRILL architecture
of an RBridge-based solution and describes RBridge components,
interactions and functions. Note that this document is not
intended to represent the only solution to the TRILL problem
statement, nor does it specify the protocols that instantiate
this architecture - or that only one such set of protocols is
prescribed. The former may be contained in other architecture
documents and the latter would be contained in separate
specification documents (see - e.g. - [3]).
2. Background
This architecture is based on the RBridge system described in an
Infocom paper [1]. That paper describes the RBridge system as a
specific instance; this document abstracts architectural
features only. The remainder of this section describes the
terminology of this document, which may differ from that of the
original paper.
2.1. Existing Terminology
The following terminology is defined in other documents. A brief
definition is included in this section for convenience and - in
some cases - to remove any ambiguity in how the term may be used
in this document, as well as derivative documents intended to
specify components, protocol, behavior and encapsulation
relative to the architecture specified in this document.
o 802: IEEE Specification for the Ethernet architecture, i.e.,
including hubs and bridges.
Sgai 5> here there is some confusion between 802 and 802.3
o 802.1D: IEEE Specification for bridged Ethernet, including
the BPDUs used in spanning tree protocol (STP) [5].
o ARP: Address Resolution Protocol - a protocol used to find an
address of form X, given a corresponding address of form Y.
In this document, ARP refers to the well-known protocol used
to resolve L2 (MAC) addresses, using a given L3 (IP) address.
See [10] for further information on IP ARP.
o Bridge: an Ethernet (L2, 802.1D) device with multiple ports
that receives incoming frames on a port and transmits them on
zero or more of the other ports; bridges support both bridge
learning and STP. Transparent bridges do not modify the L2
PDU being forwarded.
o Bridge Learning: process by which a bridge determines on
which single outgoing port to transmit (forward or copy) an
incoming unicast frame. This process depends on consistent
forwarding as "learning" uses the source MAC address of
frames received on each interface. Layer 2 (L2) forwarding
devices "learn" the location of L2 destinations by peeking at
layer 2 source addresses during frame forwarding, and store
the association of source address and receiving interface.
L2 forwarding devices use this information to create
"filtering database" entries and - gradually - eliminate the
need for flooding.
o Bridge Protocol Data Unit (BPDU): the frame type associated
with bridge control functions (for example: STP/RSTP).
o Bridge Spanning Tree (BST): an Ethernet (L2, 802.1D)
forwarding protocol based on the topology of a spanning tree.
Sgai 6> I have never seen the acronym BST, everyone uses STP.
o Broadcast Domain: the set of (layer 2) devices that must be
reached (or reachable) by (layer 2) broadcast traffic
injected into the domain.
o Broadcast Traffic: traffic intended for receipt by all
devices in a broadcast domain.
o Ethernet: See "802" above.
Sgai 7> for Ethernet it is better to reference 802.3
o Filtering Database - database containing association
information of (source layer 2 address, arrival interface).
The interface that is associated with a specific layer 2
source address, is the same interface which is used to
forward frames having that address as a destination. When a
layer 2 forwarding device has no entry for the destination
layer 2 address of any frame it receives, the frame is
"flooded".
o Flooded Traffic - traffic forwarded on all interfaces, except
those on which it was received, within the same broadcast
domain. Flooding is the mechanism by which traffic is
delivered to a destination that is currently "unknown" (i.e.
- either not yet "learned", or aged out of the "filtering
database").
o Flooding - the process of forwarding traffic to ensure that
frames reach all possible destinations when the destination
location is not known. In "flooding", an 802.1D forwarding
device forwards a frame for any destination not "known" (i.e.
- not in the filtering database) on every active interface
except that one on which it was received. See also VLAN
flooding.
o Frame: in this document, frame refers to an Ethernet (L2)
unit of transmission (PDU), including header, data, and
trailer (or payload and envelope).
o Hub: an Ethernet (L2, 802) device with multiple ports which
sgai 8> for Hub it is better to reference 802.3
transparently transmits frames arriving on any port to all
other ports. This is a functional definition, as there are
devices that combine this function with certain bridge-like
functions that may - under certain conditions - be referred
to as "hubs".
o IGP: Interior Gateway Protocol - any of the (link-state)
routing protocol candidates considered potentially useful as
RBridge routing protocols.
o IS-IS: Intermediate System to Intermediate System routing
protocol. See [8] for further information on IS-IS.
o LAN: Local Area Network. A LAN is an L2 forwarding domain.
This term is synonymous with Ethernet Subnet in the context
of this document.
o MAC: Media Access Control - mechanisms and addressing for L2
frame forwarding.
o Multicast Forwarding: forwarding methods that apply to frames
with broadcast or multicast destination MAC addresses.
o Node: a device with an L2 (MAC) address that sources and/or
sinks L2 frames.
Sgai 9> The IEEE term is "station".
o OSPF: Open Shortest Path First routing protocol. See [7] and
[9] for further information on OSPF.
o Packet: in this document, packet refers to L3 (or above) data
transmission units (PDUs) - e.g. - an IP Packet (RFC791 [4]),
including header and data.
o PDU: Protocol Data Unit - unit of data to be transmitted by a
protocol. To distinguish L2 and L3 PDUs, we refer to L2 PDUs
as "frames" and L3 PDUs as "packets" in this (and related)
document(s).
o Router: a device that performs IP (L3) forwarding (the
"routing function"); RBridges typically do not span routers
(i.e. - provide a connection from one router interface to
another router interface on the same router).
o Routing Function: in this document, the "routing function"
consists of forwarding IP packets between L2 broadcast
domains, based on L3 addressing and forwarding information.
In the process of performing the "routing function", devices
(typically routers) usually forward packets from one L2
broadcast domain to another (one, or more in the IP multicast
case) - distinct - L2 broadcast domain(s). RBridges cannot
span the routing function.
o Segment: an Ethernet link, either a single physical link or
emulation thereof (e.g., via hubs) or a logical link or
emulation thereof (e.g., via bridges).
Sgai 10> IEEE uses the term "LAN segment"
o Spanning Tree Protocol (STP): an Ethernet (802.1D) protocol
for establishing and maintaining a single spanning tree among
all the bridges on a local Ethernet segment. Also, Rapid
Spanning Tree Protocol (RSTP). In this document, STP and RSTP
are considered to be the same.
o Spanning Tree Table (STT): a table containing port activation
status information as determined during STP.
o SPF: Shortest Path First - an algorithm name associated with
routing, used to determine a shortest path graph traversal.
o Subnet, Ethernet: a single segment, or a set of segments
interconnected by a CRED (see section 2.2); in the latter
sgai 11> There is no concept of subnet in IEEE.
Subnet is typically an IP subnet, and, even if it is common to have one
subnet per LAN, this is not the only possibility. Pragmatically IP
subnets and Ethernet LANs are unrelated concepts.
case, the subnet may or may not be equivalent to a single
segment. Also a subnet may be referred to as a broadcast
domain or LAN. By definition, all nodes within an Ethernet
Subnet (broadcast domain or LAN) must have L2 connectivity
with all other nodes in the same Ethernet subnet.
o TRILL: Transparent Interconnect over Lots of Links - the
working group and working name for the problem domain to be
addressed in this document.
o Unicast Forwarding: forwarding methods that apply to frames
with unicast destination MAC addresses.
o Unknown Destination - a destination for which a receiving
device has no filtering database entry. Destination (layer
sgai 12> the stations receive the unknown unicast and have filtering
information, only the bridges don't. I propose to replace device with
bridge.
2) addresses are typically "learned" by (layer 2) forwarding
devices via a process commonly referred to as "bridge
learning".
Sgai 13> in IEEE, the term used is "learning" instead of "bridge
learning"
o VLAN: Virtual Local Area Network. VLANs in general fall into
two categories: link (or port) specific VLANs and tagged
VLANs. In the former case, all frames forwarded and all
directly connected nodes are assumed to be part of a single
VLAN. In the latter case, VLAN tagged frames are used to
distinguish which VLAN each frame is intended for.
Sgai 14> This definition is not completely correct, I prefer:
VLAN technology introduces the following three basic types of frame:
a) Untagged frames;
b) Priority-tagged frames; and
c) VLAN-tagged frames.
An untagged frame or a priority-tagged frame does not carry any
identification of the VLAN to which it belongs. Such frames are
classified as belonging to a particular VLAN based on parameters
associated with
the receiving Port, or, through proprietary extensions to IEEE 802.1Q
standard, based on the data content of the frame (e.g., MAC Address,
layer 3 protocol ID, etc.).
A VLAN-tagged frame carries an explicit identification of the VLAN to
which it belongs; i.e., it carries a tag header that carries a non-null
VID. Such a frame is classified as belonging to a particular VLAN based
on the value of the VID that is included in the tag header. The presence
of the tag header carrying a non-null VID means that some other device,
either the originator of the frame or a VLAN-aware Bridge, has mapped
this frame into a VLAN and has inserted the appropriate VID.
o VLAN Flooding: flooding as described previously, except that
frames are only forwarded on those interfaces configured for
participation in the applicable VLAN.
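The three frame types listed in comment 14 above can be told apart by inspecting the tag header. The sketch below is an illustration only: it assumes the standard 802.1Q C-VLAN tag (TPID 0x8100 followed by a 16-bit TCI whose low 12 bits are the VID), ignores service tags (Q-in-Q), and uses hypothetical names (`classify_frame`, `port_vid`).

```python
import struct

TPID_8021Q = 0x8100  # C-VLAN tag protocol identifier

def classify_frame(frame: bytes, port_vid: int):
    """Return (frame_type, vid) for a raw Ethernet frame.

    port_vid is the VLAN assigned to untagged and priority-tagged
    frames on the receiving port (port-based classification only).
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return ("untagged", port_vid)
    if len(frame) < 18:
        raise ValueError("truncated tag header")
    (tci,) = struct.unpack_from("!H", frame, 14)
    vid = tci & 0x0FFF  # low 12 bits of the TCI
    if vid == 0:
        # Null VID: the tag carries priority only, no VLAN identification.
        return ("priority-tagged", port_vid)
    return ("vlan-tagged", vid)
```

Classification by frame content (MAC address, layer 3 protocol ID), which the comment notes is a proprietary extension, is deliberately left out.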
2.2. RBridge Terminology
The following terms are defined in this document and intended
for use in derivative documents intended to specify components,
protocol, behavior and encapsulation relative to the
architecture specified in this document.
o CRED: Cooperating RBridges and Encapsulation Tunnels - a
topological construct consisting of a set of cooperating
RBridges, and the forwarding tunnels connecting them.
o CRED Forwarding Table (CFT): the per-hop forwarding table
populated by the RBridge Routing Protocol; forwarding within
the CRED is based on a lookup of the CRED Transit Header
(CTH) encapsulated within the outermost received L2 header.
The outermost L2 encapsulation in this case includes the
source MAC address of the immediate upstream RBridge
transmitting the frame and destination MAC address of the
receiving RBridge for use in the unicast forwarding case.
Sgai 15> In section 7 of
http://www.ietf.org/internet-drafts/draft-gai-perlman-trill-encap-00.txt
we proposed that when two rbridges are connected by a point to point
link the outer MAC addresses may be set to a predefined value in
transmission and ignored in reception.
o CFT-IRT: a forwarding table used for propagation of
broadcast, multicast or flooded frames along the Ingress
RBridge Tree (IRT).
Sgai 16> is it a forwarding table or is it a filtering database? Since
the "miss" behavior is to flood, I see it as a filtering database.
o CRED Transit Header (CTH): a 'shim' header that encapsulates
the ingress L2 frame and persists throughout the transit of a
CRED, which is further encapsulated within a hop-by-hop L2
header (and trailer). The hop-by-hop L2 encapsulation in this
case includes the source MAC address of the immediate
upstream RBridge transmitting the frame and destination MAC
address of the receiving RBridge - at least in the unicast
forwarding case.
Sgai 17> is this true also for unknown unicast?
o CRED Transit Table (CTT): a table that maps ingress frame L2
destinations to egress RBridge addresses, used to determine
encapsulation of ingress frames for transit of the CRED.
o Cooperating RBridges - those RBridges within a single
Ethernet Subnet (broadcast domain or LAN) not having been
configured to ignore each other. By default, all RBridges
within a single Ethernet subnet will cooperate with each
other. It is possible for implementations to allow for
configuration that will restrict "cooperation" between an
RBridge and an apparent neighboring RBridge. One reason why
this might occur is if the trust model that applies in a
particular deployment imposes a need for configuration of
security information. By default no such configuration is
required; however, should it be used in any specific scenario,
it is possible (either deliberately or inadvertently) to
configure neighboring RBridges so that they do not cooperate.
In the remainder of this document, all RBridges are assumed
to be in a cooperating (default) configuration.
Sgai 18> can RBridges cooperate in groups, e.g. four Rbridges connected
to a LAN cooperating two and two?
o Designated RBridge (DR): the RBridge associated with ingress
and egress traffic to a particular Ethernet link having
shared access among multiple RBridges; that RBridge is such a
link's "Designated RBridge". The Designated RBridge is
determined by an election process among those RBridges having
shared access via a single Segment.
o Edge RBridge (edge of a CRED): describes RBridges that serve
to ingress frames into the CRED and egress frames from the
CRED. L2 frames transiting an RBridge CRED enter, and leave,
it via an edge RBridge.
o Egress RBridge: for any specific frame, the RBridge through
which that frame leaves the CRED. For frames transiting a
CRED, the egress RBridge is an edge RBridge where RBridge
encapsulation is removed from the transit frames prior to
exiting the CRED.
o Forwarding Tunnels: in this document, CRED Forwarding Tunnels
(or Forwarding Tunnels) is used to refer to the paths for
forwarding transit frames, encapsulated at an RBridge ingress
and decapsulated at an RBridge egress.
o Ingress RBridge: for any specific frame, the RBridge through
which that frame enters the CRED. For frames transiting a
CRED, the ingress RBridge is the edge RBridge where RBridge
encapsulation is added to the transit traffic entering the
CRED.
o Ingress RBridge Tree: a tree computed for each edge RBridge -
and potentially for each VLAN in which that RBridge
sgai 19> why for each VLAN? I got the impression that there was a single
tree that was pruned differently for different VLANs.
participates - for delivery of broadcast, multicast and
flooded frames from that RBridge to all relevant egress
RBridges. This is the point-to-multipoint delivery tree used
by an ingress RBridge to deliver multicast, broadcast or
flooded traffic.
Sgai 20> the current version of the proposal speaks about a multicast
address, not point-to-point.
The tree consists of a set of one or more
next-hops to be used when the ingress RBridge receives a
multicast or broadcast frame (frame with a multicast or
broadcast destination address), or frame with unknown
destination addresses. If forwarding frames hop-by-hop, next
hop RBridges will, in turn, have a similar set of one or more
next-hops to be used for forwarding these frames - when
received from an upstream, or ingress, RBridge. This
progression continues until frames arrive at egress RBridges.
o LPT: Learned Port Table. See Filtering Database.
Sgai 21> not proper terminology, I would use "filtering database"
everywhere.
o RBridge: a logical device as specified in this document,
which incorporates both routing and bridging features, thus
allowing for the achievement of TRILL Architecture goals. A
single RBridge device can aggregate with other RBridge
devices to create a CRED.
3. Components
A CRED is composed of RBridge devices and the forwarding tunnels
that connect them; all other Ethernet link subnet devices, such
as bridges, hubs, and nodes, operate conventionally in the
presence of an RBridge.
3.1. RBridge Device
An RBridge is a bridge-like device that forwards frames on an
Ethernet link segment. It has one or more Ethernet ports which
may be wired or wireless;
sgai 22> A wired port is Ethernet, i.e. IEEE 802.3, a wireless port is
not Ethernet, it is IEEE 802.11.
the particular physical layer is not
relevant. An RBridge is defined more by its behavior than its
structure, although it contains three tables which distinguish
it from conventional bridges.
Conventional bridges contain a learned port table (LPT), or
filtering database, and a spanning tree table (STT). The LPT
allows a bridge to avoid flooding all received frames, as is
typical for a hub or repeater. The bridge learns which nodes are
accessible from a particular port by assuming bi-directional
consistency:
sgai 23> they learn because STP guarantees symmetrical forwarding
the source addresses of incoming frames indicate
that the incoming port is to be used as output for frames
destined to that address. Incoming frames are checked against
the LPT and forwarded to the particular port if a match occurs,
otherwise they are flooded out all active ports (except the
incoming port).
Sgai 24> active ports -> forwarding ports
The STT indicates the ports used in the spanning tree.
Sgai 25> there is no STT, there is a state associated with each port
that can be: disabled, blocking, listening, learning, and forwarding
Details
of STP operation are out of scope for this document, however the
result of STP is to disable ports
sgai 26> disabled -> blocking
which would otherwise result
in more than one path traversal of the spanning tree.
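The learning and flooding behaviour of a conventional bridge described above can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, entry aging is omitted, and the per-port spanning tree state is reduced to a set of forwarding ports (following comments 24 to 26).

```python
class LearningBridge:
    """Minimal model of a learning bridge (no aging, no VLANs)."""

    def __init__(self, forwarding_ports):
        # Ports that the spanning tree has placed in the forwarding state.
        self.forwarding_ports = set(forwarding_ports)
        self.fdb = {}  # filtering database: MAC address -> port

    def receive(self, src_mac, dst_mac, in_port):
        """Learn the source address and return the set of output ports."""
        if in_port not in self.forwarding_ports:
            return set()  # frames on blocking ports are not forwarded
        self.fdb[src_mac] = in_port  # learn from the source address
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood on all other forwarding ports.
            return self.forwarding_ports - {in_port}
        if out_port == in_port:
            return set()  # destination is on the arrival segment: filter
        return {out_port}
```

A first frame from A to B is flooded; once B replies, later frames between the two are sent on a single port each way.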
RBridges, by comparison, have a CRED Forwarding Table (CFT -
used for unicast forwarding of RBridge encapsulated frames
across the CRED), CFT-IRT (used for flooding, broadcast or
multicast forwarding of RBridge encapsulated frames across the
CRED) and a CRED Transit Table (CTT - used by the ingress
RBridge to determine what encapsulation to use for frames
received as un-encapsulated from non-RBridge devices), described
in the following sections.
3.2. RBridge Data Model
The following tables represent the logical model of the data
required by RBridges in forwarding unicast and multicast data
across a CRED.
3.2.1. CFT
The CFT is a forwarding table for unicast traffic within the
CRED, allowing tunneled traffic to transit the CRED from ingress
to egress. The size of a fully populated CFT at each RBridge is
maximally bounded by the product of the number of directly
connected RBridge peers (where "directly connected" in this
context refers to RBridges connected to each other without
transiting one or more additional RBridges) and VLANs. RBridges
may have separate CFTs for each VLAN, if this is supported by
configuration. The CFT is continually maintained by RBridge
routing protocol (see Section 4.7).
sgai 27> I repeat a comment that I have made to other documents: "
The discussion about VLAN needs to be much more extensive. It is clear
from the mailing list discussion that VLANs can be used inside the
packet or in the Ethernet encapsulation of TRILL. These are two
different kinds of VLANs and their requirements need to be stated
separately. Q in Q also needs to be discussed. I propose to define inner
and outer VLANs (with reference to the position of the tag in the
frame)."
Here I think we are talking about outer VLANs
The CFT contains data specific to RBridge forwarding for unicast
traffic. The specific fields contained in this table are to be
defined in RBridge protocol specifications. In the abstract,
however, the table should contain forwarding direction and
encapsulation associated with an RBridge encapsulated frame
received - determined by the "shim" header destination and VLAN
(if applicable).
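A CFT lookup of the kind described above might look like the sketch below. The draft leaves the actual fields to the protocol specifications, so everything here is an assumption: the key (egress RBridge taken from the "shim" header, plus an optional VLAN), the entry fields, and the fallback from a VLAN-specific entry to a VLAN-independent one.

```python
from typing import NamedTuple, Optional

class CftEntry(NamedTuple):
    out_interface: str   # forwarding direction
    next_hop_mac: str    # outer L2 destination for the next-hop RBridge

class Cft:
    """Hypothetical unicast forwarding table for RBridge encapsulated frames."""

    def __init__(self):
        self._entries = {}

    def install(self, egress_rbridge, vlan, entry):
        self._entries[(egress_rbridge, vlan)] = entry

    def lookup(self, egress_rbridge, vlan=None) -> Optional[CftEntry]:
        # Prefer a VLAN-specific entry, then a VLAN-independent one.
        entry = self._entries.get((egress_rbridge, vlan))
        if entry is None:
            entry = self._entries.get((egress_rbridge, None))
        return entry
```

A miss returns `None`; what an RBridge should do in that case is not stated in this section and is left open here.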
3.2.2. CFT-IRT
The CFT-IRT consists of a set of forwarding entries used for
support of Ingress RBridge Trees (IRT). CFT-IRT entries are
distinct from typical CFT entries because there may be zero or
more of them that match for any incoming frame.
The CFT-IRT may be part of the CFT, or instantiated as a
separate table, in implementations.
In discussing entries to be included in the CFT-IRT, the
following entities are temporarily defined, or further
qualified:
o Ingress RBridge - the RBridge that is the head end of an IRT.
All RBridges within a CRED are potential ingress RBridges.
Sgai 28> IMO all RBridges must be ingress RBridges, at least to support
inband management, e.g. SNMP.
o Egress RBridge - an RBridge that is the tail end of a path
corresponding to a specific CFT-IRT entry. All RBridges
within a CRED are potential egress RBridges.
Sgai 29> same as above
Not all RBridges
within a CRED will be on the shortest path between any
ingress RBridge and any other egress RBridge.
o Local RBridge - the RBridge that forms and maintains the CFT-
IRT entry (or entries) under discussion. The local RBridge
may be an Ingress RBridge, or an egress RBridge with respect
to any set of entries in the CFT-IRT.
Sgai 30> I think the previous definition is not needed.
o RBridge CRED Egress Interface - an interface on any RBridge
where a transit RBridge encapsulated frame would be
decapsulated prior to forwarding. With respect to such an
interface, the local RBridge is the egress RBridge.
Each local RBridge will maintain a set of entries for at least
the following - corresponding to a subset of all possible
forwarding paths:
o Zero or more entries grouped for each ingress RBridge
sgai 31> why is it zero or more, if an RBridge exists, it must have an
IRT, I haven't seen any discussion to support multiple IRTs.
- keyed
by the ingress RBridge identifier - used to determine
downstream forwarding of broadcast, multicast, and flooded
frames originally RBridge encapsulated by that ingress within
the CRED.
o Corresponding to each of these entry groups, one entry for
each of zero or more egress RBridges - where the local RBridge
is on the shortest path toward that egress RBridge.
Sgai 32> I don't understand this. Since the current proposal uses a
multicast MAC address, what is needed is a bit map for each IRT that
says which ports are blocking and which are forwarding. You are
basically building a ST using ISIS.
o Corresponding to each of these entry groups, one entry for
each of zero or more CRED egress interfaces.
Each entry would contain an indication of the single interface on
which a broadcast, multicast or flooded frame would be forwarded
for each (ingress RBridge, egress RBridge) pair.
Sgai 33> I don't get the pair.
Entries would also
contain any required encapsulation information, etc. required
for forwarding on a given interface, and toward a corresponding
specific egress RBridge.
Sgai 34> as a matter of fact each interface is basically a set of two
interfaces, a regular one and a tunnel one, and the forwarding/blocking
state may be different for the two.
A local RBridge could maintain a full set of entries from every
RBridge to every other RBridge, however - depending on topology
- only a subset of these entries would ever be used. In
addition, a topology change that changed selection of shortest
paths would also very likely change other elements of the
entries, negating possible benefits from having pre-computed
CFT-IRT entries.
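One way to read the CFT-IRT description above is that each local RBridge keeps, per ingress RBridge, only the downstream next hops it has on that ingress RBridge's shortest-path tree. The sketch below follows that reading and is only an illustration: the function names, the topology, and the tie-break rule (lowest parent name) are assumptions, and it does nothing about the transient loops raised in the comments.

```python
import heapq

def ingress_tree_parents(links, ingress):
    """Return {rbridge: parent} for a shortest-path tree rooted at ingress.

    links maps each RBridge to {neighbor: cost} with positive costs.
    Equal-cost ties go to the lowest parent name, so that every RBridge
    computing the tree independently arrives at the same result.
    """
    dist = {ingress: 0}
    parent = {ingress: None}
    heap = [(0, ingress)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in links[node].items():
            nd = d + cost
            better = nbr not in dist or nd < dist[nbr]
            tie = (nbr in dist and nd == dist[nbr]
                   and parent[nbr] is not None and node < parent[nbr])
            if better or tie:
                dist[nbr] = nd
                parent[nbr] = node
                if better:
                    heapq.heappush(heap, (nd, nbr))
    return parent

def cft_irt(links, local):
    """Return {ingress: sorted downstream next hops of `local`}."""
    table = {}
    for ingress in links:
        parent = ingress_tree_parents(links, ingress)
        table[ingress] = sorted(n for n, p in parent.items() if p == local)
    return table

# Hypothetical five-RBridge ring with unit link costs.
links = {
    "r4": {"r5": 1, "r7": 1},
    "r5": {"r4": 1, "r6": 1},
    "r6": {"r5": 1, "r9": 1},
    "r7": {"r4": 1, "r9": 1},
    "r9": {"r7": 1, "r6": 1},
}
table_r5 = cft_irt(links, "r5")
```

In this topology r5 forwards frames that entered at r4 on to r6, but has no downstream next hop at all for frames that entered at r9, which matches the "zero or more" wording of the draft.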
CFT-IRT entries should also include VLAN identification
information relative to each set of ingress RBridge, to allow
scoping of broadcast, multicast and flooding forwarding by
configured VLANs.
CFT-IRT entries should also include Multicast-Group Address
specific information relative to each egress RBridge that is a
member of a given well-known multicast group, to allow scoping
of multicast forwarding by multicast group.
Implicit in this data model is the assumption that the "shim"
header encapsulation will contain information that explicitly
identifies the CRED ingress RBridge for any broadcast, multicast
or flooded frame.
How the CFT-IRT is maintained will be defined in appropriate
protocol specifications used to instantiate this architecture.
The protocol specification needs to include mechanisms and
procedures required to establish and maintain the CFT-IRT in
consideration of potential SPF recomputations resulting from
network topology changes.
Sgai 35> this protocol must be designed to avoid transient loops, since
transient loops of multicast/broadcast cause broadcast storms that are
highly undesirable.
3.2.3. CTT
The CTT determines how arriving traffic will be encapsulated,
for forwarding to the egress RBridge, via the CRED. The CTT can
be considered a version of the LPT that treats the CRED, as a
whole, as another port. It becomes configured in much the same
way as the LPT: by snooping incoming traffic, and assuming bi-
directional consistency. The information is learned at the
egress RBridge and propagated to all other RBridges in the CRED
via the RBridge routing protocol. The CTT may be as large as the
number of nodes on the Ethernet subnet, across all VLANs.
RBridges may have separate CTTs for each VLAN, if separate VLANs
are supported by configuration.
Sgai 36> see my previous comment about VLANs
The CTT essentially determines the tunnel encapsulation used to
transport each specific frame across the CRED.
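The role of the CTT at an ingress RBridge can be sketched as a map from destination MAC address to egress RBridge, from which the transit ("shim") header is chosen. All names and fields below are hypothetical, per-VLAN tables are ignored, and a missing entry is modelled as "no egress known", meaning the frame would go along the ingress RBridge tree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TransitHeader:
    ingress_rbridge: str
    egress_rbridge: Optional[str]  # None: deliver along the ingress tree

class Ctt:
    """Hypothetical CRED Transit Table held by one RBridge."""

    def __init__(self, local_rbridge):
        self.local = local_rbridge
        self._egress_by_mac = {}

    def learn(self, mac, egress_rbridge):
        """Record that `mac` is attached behind `egress_rbridge`.

        In the draft this information is learned at the egress RBridge
        and distributed by the RBridge routing protocol.
        """
        self._egress_by_mac[mac] = egress_rbridge

    def encapsulate(self, dst_mac):
        """Choose the transit header for a frame entering the CRED."""
        return TransitHeader(self.local, self._egress_by_mac.get(dst_mac))
```

The resulting header's egress field would then be the key for the unicast forwarding lookup at each hop, while the ingress field identifies the tree for flooded frames.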
4. Functional Description
The RBridge Architecture is largely defined by RBridge behavior;
the logical components are minimal, as outlined in Section 3.
4.1. CRED Auto-configuration
Cooperating RBridges self-organize to compose a single CRED
system. Consider first a set of bridges on a single Ethernet
link subnet (Figure 1). Here bridges are shown as 'b', hubs as
'h', and nodes as 'N'; bridges and hubs are numbered. Note that
the figure does not distinguish between types of nodes, i.e.,
hosts and routers; both are end nodes at the link layer, and are
otherwise indistinguishable to L2 forwarding devices. Bridges in
this topology organize into a single spanning tree, as shown by
double lines ('=', '||', and '//') in the figure.
    N        N---b3---N
    |            ||
    |            ||
N---h1--b4===b5==h2==b6
        |   //   |   ||
        |  //    N   ||
        | //         ||
    N---b7====b8-----b9-----N
        |            |\
        |            | \
        N            N  N
Figure 1 Conventionally bridged Ethernet link subnet
It is useful to note that hubs are relatively transparent to
bridges, both for traffic from nodes to bridges (h1) and for
traffic between bridges (h2). Also note that the same hub can
support traffic between bridges and from a host to a bridge
(h2), but that the spanning tree is exclusively between bridges.
Bridges are thus compatible with hubs, both as transits and
ingress/egress.
A CRED operates similarly, and can be viewed as a variant of the
way bridges self-organize. Figure 2 shows the same topology
where some of the bridges are replaced by RBridges (shown as 'r'
in the figure). In this figure, stars ('*') represent the paths
the RBridge is capable of utilizing, due to the use of link
state routing. RBridges can tunnel directly to each other (r4-
r5), or through hubs (h2) or bridges (b8).
Note that the former b8-b9 path, which is b8-r9 in Figure 2 and
had been disabled by the hypothetical spanning tree in Figure 1,
sgai 37> disabled -> blocking.
is now usable.
    N        N---b3---N
    |            ||
    |            ||
N---h1--r4***r5**h2**r6
        *   *    |   *
        *  *     N   *
        * *          *
    N---r7****b8*****r9-----N
        |            |\
        |            | \
        N            N  N
Figure 2 RBridged Ethernet link subnet
Every node in a CRED is considered to have a primary point of
attachment to the CRED, as defined by the Designated RBridge.
Each Ethernet link segment attached to a CRED has a single
Designated RBridge; that RBridge is where all traffic that
transits the CRED enters and exits. In Figure 2, it is easy to
see that the nodes off of h1 must attach at r4; the nodes off of
b3, however, attach at either r5 or r6, depending on which is
the Designated RBridge.
Without loss of generality, an RBridge topology can be
reorganized (ignoring link length) such that all nodes, hubs,
and bridges are arranged around the periphery, and all RBridges
are considered directly connected by their tunnels (Figure 3).
Note that this view ignores the ways in which hubs and bridges
may serve both on the ingress/egress and for transit, hence this
view is not useful for traffic analysis. Using this view, it is
easy to distinguish between RBridge to RBridge traffic and other
traffic on shared devices, such as h2 and b8, because RBridge to
RBridge traffic content is hidden from non RBridge devices by
the RBridge encapsulation.
    N        N---b3---N
    |            ||
    |            ||
    |            h2
    |           /| \
    |          / N  \
    |         /      \
N---h1--r4***r5******r6
        *   *        *
        *  *         *
        * *          *
    N---r7***********r9-----N
          \         /|\
           \       / | \
            \     /  N  N
             \   /
              \ /
               b8
               |
               N
Figure 3 Reorganized RBridge Ethernet link subnet
4.2. RBridge Peer Discovery
Proper operation of the TRILL solution using RBridges depends on
the existence of a mechanism for discovering peer RBridges and
the RBridge topology. An accurate determination of RBridge
topology is required in order to determine how traffic frames
will flow in the topology and thus avoid the establishment of
persistent loops in frame forwarding.
Sgai 38> for multicast/broadcast we also need to avoid transient loops.
The discovery mechanisms must use protocol messages which will
be propagated throughout a LAN (or broadcast domain) until they
are consumed by another RBridge. This must happen in order to
ensure that RBridges in the same broadcast domain are discovered
by their peers as required to allow for accurate determination
of RBridge topology.
These protocol messages should be distinguished in a manner that
is consistent with the chosen RBridge routing protocol, or any
other discovery mechanism used. It is very likely that peer
discovery will actually be done as part of the RBridge routing
protocol's peer discovery; however this is to be determined by
specific RBridge protocol specification(s).
An RBridge intercepts protocol messages that it recognizes as
being of this type (peer discovery), performs any processing
required and forwards these messages as required by the
discovery protocol. For example, a receiving RBridge may first
determine if it has seen this message before and insert itself
in a list of RBridges traversed by this message prior to
forwarding the message on at least all interfaces other than the
one on which it was received.
Note that forwarding the modified message on all interfaces in
the example above is safe, if somewhat wasteful.
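The example handling above can be sketched as follows. This is only an
illustration: the message fields (origin, sequence, traversed) and the
choice to silently consume duplicates are assumptions, since the draft
defines no discovery message format.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DiscoveryMessage:
    # All field names are hypothetical.
    origin: str
    sequence: int
    traversed: List[str] = field(default_factory=list)


class RBridge:
    def __init__(self, name: str, interfaces: List[str]) -> None:
        self.name = name
        self.interfaces = interfaces
        self._seen: Set[Tuple[str, int]] = set()

    def handle_discovery(self, msg: DiscoveryMessage, rx_interface: str) -> List[str]:
        """Return the interfaces on which the modified message is forwarded."""
        key = (msg.origin, msg.sequence)
        if key in self._seen:
            return []  # seen before: consume it (assumed behaviour)
        self._seen.add(key)
        msg.traversed.append(self.name)  # record this RBridge in the path list
        return [i for i in self.interfaces if i != rx_interface]
```

Tracking (origin, sequence) pairs is what keeps the flood from circulating
indefinitely in this sketch.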
RBridges must forward all other protocol messages in a manner
consistent with L2 addressing and forwarding - as would be done
by a typical 802.1D bridge. This includes any frames of the same
type that are - for one reason or another - not recognized by
the receiving RBridge.
It is necessary for RBridges to forward unrecognized RBridge
control frames in the same way as they would other broadcast,
multicast or unknown unicast (flooded) frames, in order to
minimize the potential for interoperability problems with:
o future RBridge versions, using the same or similar control
frames
o non-cooperating RBridge implementations - i.e. - RBridges
that may be configured with different security information.
Note that forwarding unrecognized messages - even when of the
same (RBridge control frame) type - has the effect of providing
some degree of robustness in the solution against configuration
errors and against future variations of the discovery protocol.
Handling of 802.1D BPDUs is as determined in section 4.8.
4.3. Tunneling
RBridges pass encapsulated frame traffic to each other
effectively using tunnels. These tunnels use an Ethernet link
layer header, together with a shim header.
Specifics of encapsulation are to be defined in appropriate
protocol/encapsulation specifications.
It is the combination of the outer link header and the shim header
that distinguishes RBridge to RBridge traffic from other traffic.
The link header
includes source and destination addresses, which typically
identify the ingress and egress RBridges. For incoming multicast
and broadcast traffic, one of these addresses may represent the
multicast group or broadcast address. Additionally, these
addresses may be VLAN-specific, i.e., each ingress and egress
RBridge may have a separate address per VLAN.
The additional shim header is required to support loop mediation
for traffic within the CRED. Traffic loops in forwarding between
RBridges and non-RBridge nodes, as well as across non-RBridge
devices between RBridges, are limited by loop mediation and/or
prevention mechanisms that are beyond the scope of this document
(but may include a TTL-like mechanism, mechanisms to establish a
loop free topology - such as STP/RSTP - or both) on the
applicable LAN segments.
The shim header and encapsulation:
o must clearly identify the traffic as RBridge traffic - the
outer Ethernet header may, for instance, use an Ethertype
number unique to RBridges;
o should also identify a specific (egress) RBridge - the shim
header may, for example, include an identifier unique to the
egress RBridge;
o should include the RBridge transit route, a hopcount, or a
timestamp to prevent indefinite looping of a frame.
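To make the three bullets above concrete, here is a minimal sketch of one
possible encapsulation. Everything specific in it is an assumption: the
draft assigns no Ethertype (0x88B5, an IEEE local experimental value, is
used purely as a placeholder), and the 3-byte shim layout (16-bit egress
RBridge identifier plus 8-bit hop count) is invented for illustration.

```python
import struct
from typing import Optional

RBRIDGE_ETHERTYPE = 0x88B5  # placeholder, not an assigned RBridge value
SHIM_FORMAT = "!HB"          # egress RBridge id (16 bits), hop count (8 bits)
OUTER_LEN = 14               # dst MAC (6) + src MAC (6) + Ethertype (2)
SHIM_LEN = struct.calcsize(SHIM_FORMAT)


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def encapsulate(inner: bytes, next_hop_mac: str, local_mac: str,
                egress_id: int, hop_count: int) -> bytes:
    outer = (mac_bytes(next_hop_mac) + mac_bytes(local_mac)
             + struct.pack("!H", RBRIDGE_ETHERTYPE))
    return outer + struct.pack(SHIM_FORMAT, egress_id, hop_count) + inner


def transit(frame: bytes) -> Optional[bytes]:
    """Decrement the hop count; return None when the frame must be dropped."""
    shim_end = OUTER_LEN + SHIM_LEN
    egress_id, hops = struct.unpack(SHIM_FORMAT, frame[OUTER_LEN:shim_end])
    if hops <= 1:
        return None
    new_shim = struct.pack(SHIM_FORMAT, egress_id, hops - 1)
    return frame[:OUTER_LEN] + new_shim + frame[shim_end:]
```

A hop count is only one of the options the text lists; a transit route or
timestamp would need a different shim layout.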
4.4. RBridge General Operation
Operations that apply to all RBridges include peer and topology
discovery (which may include negotiation of RBridge
identifiers), Designated RBridge election, link-state routing,
SPF computation and advertising reach-ability for specific L2
(MAC Ethernet destination) addresses within a broadcast domain.
In addition, all RBridges will compute Ingress RBridge Trees for
delivery of (potentially VLAN scoped) broadcast, multicast and
flooded frames to each peer RBridge. Setting up these trees
early is important as there is otherwise no means for frame
delivery across the CRED during the learning phase. Because it
is very likely to be impossible (at an early stage) for RBridges
to determine which RBridges are edge RBridges, it is preferable
that each RBridge compute these trees for all RBridges as early
as possible - even if some entries will not be used.
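One way to picture "compute these trees for all RBridges" is a
shortest-path-first run rooted at every RBridge. The sketch below uses
Dijkstra's algorithm over the RBridge links of Figure 3 with unit link
costs; the costs, the tie-breaking order and the child -> parent
representation are assumptions, not something the draft specifies.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, int]]


def ingress_tree(graph: Graph, root: str) -> Dict[str, str]:
    """Dijkstra SPF; returns child -> parent for the tree rooted at root."""
    dist: Dict[str, float] = {root: 0}
    parent: Dict[str, str] = {}
    heap: List[Tuple[float, str, str]] = [(0, root, "")]
    done: Set[str] = set()
    while heap:
        d, node, via = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if via:
            parent[node] = via
        for nbr, cost in sorted(graph[node].items()):
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr, node))
    return parent


# RBridge-to-RBridge links as drawn in Figure 3, unit cost assumed.
TOPOLOGY: Graph = {
    "r4": {"r5": 1, "r7": 1},
    "r5": {"r4": 1, "r6": 1, "r7": 1},
    "r6": {"r5": 1, "r9": 1},
    "r7": {"r4": 1, "r5": 1, "r9": 1},
    "r9": {"r6": 1, "r7": 1},
}
TREES = {root: ingress_tree(TOPOLOGY, root) for root in TOPOLOGY}
```

Every RBridge ends up holding one tree per potential ingress, including
trees for RBridges that never turn out to be edge RBridges.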
The initial phase is the peer and topology discovery phase. This
should continue for a sufficient amount of time to reduce the
amount of re-negotiation (Designated RBridge and - possibly -
identifiers) and re-computation that will be triggered by
discovery of new peers. The timer values selected for delaying
the next phase should take into account the time required for
local STP and availability of segment connectivity between
RBridge peers.
Sgai 39> but RBridge discovery and STP are ongoing processes, why do we
want to couple their timers?
The next phase is election of Designated RBridges for all shared
access segments. This phase cannot complete before completion of
peer and topology discovery. In parallel, RBridge routing
protocol should begin the process of building the link-state
information - assuming this was not done during the peer and
topology discovery phase.
At about this time, RBridges should establish ingress RBridge
trees.
Once RBridges have established Ingress RBridge Trees, the
learning and forwarding phase may begin. In this phase, RBridges
initially forward frames by flooding them via Ingress RBridge
Tree(s). Also during this phase, RBridges begin "learning" MAC
address locations from local segments and propagating L2 reach-
ability information via the RBridge routing protocol to all
other RBridges. Gradually, the CFT will be built up for all
RBridges, and fewer frames will require flooding via the Ingress
RBridge Tree(s).
The learning phase typically does not complete as new MAC
attachment information continues to be learned and old
information may be timed out and discarded. Consequently, the
learning phase is also the operational phase. During the
combined learning and operational phase, all RBridges maintain
both Ingress RBridge Trees and a CFT. RBridges not elected as
Designated RBridge may be required to become one in the event
that the DR goes off-line.
4.5. Ingress/Egress Operations
Operation specific to edge RBridges involves RBridge learning,
advertisement, encapsulation (at ingress RBridges) and
decapsulation (at egress RBridges).
As described elsewhere, RBridge learning is similar to typical
bridge learning - i.e. - all RBridges listen promiscuously to L2
frames on a local LAN segment and acquire location information
associated with source MAC addresses in L2 frames they observe.
By convention, a Designated RBridge election always occurs. In
the degenerate case - where only one RBridge is connected to a
specific Ethernet segment - obviously that RBridge will "win"
the election and become the designated RBridge.
With this convention, only the Designated RBridge performs
RBridge learning for interface(s) connected to that segment.
As each RBridge learns segment-local MAC source addresses, it
creates an entry in its LPT that associates that MAC source
address with the interface on which it was learned.
Sgai 40> there is also a requirement to time-out learnt information to
maintain the filtering databases.
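A minimal sketch of LPT learning with the time-out raised in comment 40
might look like this. The class, its method names and the 300 second
default ageing time are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple


class Lpt:
    """Toy LPT: source MAC -> (interface, time last seen)."""

    def __init__(self, max_age_s: float = 300.0) -> None:
        self.max_age_s = max_age_s  # assumed ageing time
        self._entries: Dict[str, Tuple[str, float]] = {}

    def learn(self, src_mac: str, interface: str, now: float) -> None:
        # Called for every frame observed on a local segment.
        self._entries[src_mac] = (interface, now)

    def lookup(self, mac: str, now: float) -> Optional[str]:
        entry = self._entries.get(mac)
        if entry is None:
            return None
        interface, seen = entry
        if now - seen > self.max_age_s:
            del self._entries[mac]  # aged out
            return None
        return interface
```

Passing the clock in as `now` keeps the sketch deterministic; a real
implementation would more likely use a timer wheel or periodic sweep.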
Periodically,
Sgai 41> periodically or on demand
as determined by the RBridge routing protocol,
each RBridge advertises this learned information to its RBridge
peers.
These advertisements propagate to all edge RBridges (as
potentially scoped by associated VLAN information for each
advertisement). Each edge RBridge incorporates this information
in the form of a CFT entry.
RBridges also discover that they are an edge RBridge as a result
of receiving un-encapsulated frames that require forwarding. If
an RBridge is the Designated RBridge for a segment, and it has
not previously learned that the MAC destination for a frame is
local (this will be the case - for instance - for the very first
frame it observes), then the RBridge would be required to
forward (or flood) the frame via the CRED to all other RBridges
(potentially within a VLAN scope).
The RBridge in this case would flood the frame unless it has
already created a unicast CFT entry for the frame's MAC
destination address. If it has a corresponding CFT entry, then it
would use that. This RBridge would be an ingress RBridge with
respect to the frame being forwarded.
The encapsulation used by this ingress RBridge would be
determined by the CFT - if one exists - or the CFT-equivalent
entry for the Ingress RBridge Tree. The encapsulation - as
discussed elsewhere - should include (in the shim header)
information to identify the egress RBridge (for example, the
RBridge identifier negotiated previously during the peer and
topology discovery phase).
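The ingress decision described in the last three paragraphs can be
summarized in a few lines. This is a sketch under assumptions: the LPT
and CFT are modelled as plain dictionaries, and the action labels
("local", "unicast", "flood") are invented names.

```python
from typing import Dict, Optional, Tuple


def ingress_decision(dst_mac: str,
                     lpt: Dict[str, str],
                     cft: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Decide what a Designated RBridge does with an un-encapsulated frame."""
    if dst_mac in lpt:
        # Destination already learned as local: no transit of the CRED.
        return ("local", lpt[dst_mac])
    if dst_mac in cft:
        # Known remote destination: encapsulate toward its egress RBridge.
        return ("unicast", cft[dst_mac])
    # Unknown destination: flood via the Ingress RBridge Tree.
    return ("flood", None)
```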
When the encapsulated frame arrives at egress RBridge(s), it is
decapsulated and forwarded via the egress interface(s) onto the
local segment.
Note that an egress RBridge will be the Designated RBridge on
the local segment accessed via its egress interface(s). If the
received frame does not correspond to a learned MAC destination
address at an egress interface, it will forward the frame on all
interfaces for which it is either the designated - or only -
RBridge. If the received frame does correspond to a learned MAC
destination address at an egress interface, the RBridge will
forward the frame via that interface only.
4.6. Transit Forwarding Operations
There are two models for transit forwarding within a CRED: unicast
frame forwarding for known destinations, and everything else.
The difference between the two is in how the encapsulation is
determined. Exactly one of these models will be selected - in
any instantiation of this architecture - for each of the
following forwarding modes:
o Unicast frame forwarding
o Forwarding of non-unicast frames
  o Broadcast frame forwarding
  o Multicast frame forwarding
  o Frame flooding
4.6.1. Unicast
In unicast forwarding, the shim header is specific to the egress
RBridge and MAC destination in the outer Ethernet encapsulation
is specific to the next hop RBridge.
As the frame is prepared for transmission at each RBridge, the
next hop MAC destination information is determined at that local
RBridge using a corresponding CFT entry based on the "shim"
header.
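A small simulation may help show the split between the two headers: the
shim (egress RBridge) stays fixed end to end, while the outer destination
is re-derived from the local CFT at each hop. The CFT contents below are
hypothetical and follow the r4 - r7 - r9 path of Figure 3; the hop limit
is likewise an assumption.

```python
from typing import Dict, List

# Per-RBridge CFT: egress RBridge -> next-hop RBridge (hypothetical contents).
CFTS: Dict[str, Dict[str, str]] = {
    "r4": {"r9": "r7"},
    "r7": {"r9": "r9"},
}


def forward_unicast(ingress: str, egress: str, max_hops: int = 16) -> List[str]:
    """Return the outer destinations (next hops) used along the way."""
    hops: List[str] = []
    here = ingress
    while here != egress:
        if len(hops) >= max_hops:
            raise RuntimeError("hop limit exceeded")
        # The shim still names `egress`; only the outer header changes.
        here = CFTS[here][egress]
        hops.append(here)
    return hops
```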
4.6.2. Broadcast, Multicast and Flooding
Ingress RBridge Trees are used for forwarding of broadcast,
multicast and unknown destination frames across the CRED. In a
simple implementation, it is possible to use the CFT-IRT entries
for all frames of these types.
However, this approach results in potentially extreme
inefficiencies in the multicast and unknown destination flooding
cases.
As a consequence, instantiations of this architecture should
allow for local optimizations on a hop by hop basis.
Examples of such optimizations are included in the sections
below.
4.6.2.1. Broadcast
The path followed in transit forwarding of broadcast frames will
have been established through actions initiated by each RBridge
(as any RBridge is eligible to subsequently become an ingress
RBridge) in the process of computing CFT-IRT entries. Each
RBridge assumes that it may be a transit as well as an ingress
and egress RBridge and will establish forwarding information
relative to itself and each of its peer RBridges, storing it in
the CFT-IRT. CFT-IRT entries are computed at each RBridge for
paths going toward all other RBridges - at least in cases where
the RBridge performing CFT-IRT computations is on the shortest
path.
Forwarding information is in two forms: transit encapsulation
information for interfaces over which the RBridge will forward a
broadcast frame to one or more peer RBridges and a decapsulation
indication for each interface over which the RBridge may egress
frames from the CRED. In each case, the CFT-IRT includes some
identification of the interface on which a frame is forwarded
toward any specific egress RBridge for frames received from any
specific ingress RBridge.
Note that an interface over which an RBridge may egress frames
is any interface for which the RBridge is a Designated RBridge.
RBridges must not wait to determine that one (or more)
non-RBridge Ethernet nodes are present on an interface before
deciding to forward decapsulated broadcast frames on that
interface.
Forwarding information is selected for each broadcast frame
received by any RBridge (based on identifying the ingress
RBridge for the frame) for all corresponding CFT-IRT entries.
Each RBridge is thus required to replicate one RBridge
encapsulated broadcast frame for each interface that is
determined from CFT-IRT entries corresponding to the frame's
ingress RBridge. This includes decapsulated broadcast frames for
each interface for which it is the designated RBridge.
Note that frame replication and forwarding should be scoped by
VLAN if VLAN support is provided. Also note that a Designated
RBridge (DR) may be required to transmit a decapsulated frame on
the interface on which it received the RBridge encapsulated
frame.
This approach for broadcast forwarding might be considered to
add complexity because replication occurs at all RBridges along
the ingress RBridge tree, potentially for both RBridge
encapsulated and decapsulated broadcast frames. However, the
replication process is similar to replication of broadcast
traffic in 802.1D bridges with the exception that additional
replication may be required at each interface for egress from
the CRED.
Note that the additional replication associated with CRED egress
may be made to exactly conform to 802.1D bridge broadcast
replication in implementations that model a CRED egress as a
separate logical interface.
Sgai 42> potentially there is an unencapsulated interface for each
physical interface of the RBridge. It is true that you can model all of
them as a single separate logical interface, but then we need to
replicate the frame according to a bitmask that tells on which physical
interface the RBridge is designated.
Using this approach results in one and only one copy of the
broadcast frame being delivered to each egress RBridge.
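The replication rule in this section can be sketched as below. The data
shapes are assumptions: the CFT-IRT is modelled as a map from ingress
RBridge to the set of interfaces leading further down that tree, and the
set of interfaces on which this RBridge is designated plays the role of
the per-interface bitmask suggested in comment 42.

```python
from typing import Dict, List, Set, Tuple


def replicate_broadcast(ingress: str,
                        cft_irt: Dict[str, Set[str]],
                        designated: Set[str]) -> List[Tuple[str, str]]:
    """Return (interface, form) pairs for one received broadcast frame."""
    # One encapsulated copy per interface on the tree for this ingress.
    copies = [(i, "encapsulated") for i in sorted(cft_irt.get(ingress, set()))]
    # One decapsulated copy per interface where this RBridge is designated,
    # which may include the interface the frame arrived on.
    copies += [(i, "decapsulated") for i in sorted(designated)]
    return copies
```

An interface can legitimately appear twice, once for the encapsulated
copy toward a peer RBridge and once for the decapsulated copy toward
local nodes.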
4.6.2.2. Multicast
Multicast forwarding is reducible to broadcast forwarding in the
simplest (default) case. However implementations may choose -
using mechanisms that are out of scope for this document - to
optimize multicast forwarding. In order for this to work
effectively, however, support for awareness of multicast
"interest" is required for all RBridges.
Without optimization, multicast frames are injected by the
ingress RBridge onto an IRT by - for instance - encapsulating
the frame with a MAC destination multicast address, and
forwarding it according to its local CFT-IRT. Again, without
optimization, each RBridge along the path toward all egress
RBridges will similarly forward the frame according to its
local CFT-IRT.
Using this approach results in one and only one copy of the
multicast frame being delivered to appropriate egress RBridges.
However, using this approach, multicast delivery is identical to
broadcast delivery - hence very inefficient.
In any optimization approach, RBridge encapsulated multicast
frames will use either a broadcast or a group MAC destination
address. In either case, the recognizably distinct destination
addressing allows a frame forwarding decision to be made at each
RBridge hop. RBridges may thus be able to take advantage of
local knowledge of multicast distribution requirements to
eliminate the forwarding requirement on interfaces for which
there is no recipient interested in receiving frames associated
with any specific group address.
As stated earlier, in order for RBridges to be able to implement
multicast optimization, distribution of learned multicast group
"interest" information must be provided - and propagated - by
all RBridges. Mechanisms for learning and propagating multicast
group participation by RBridges are out of scope for this document
but may be defined in RBridge protocol specification(s).
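A hop-local pruning decision of the kind described above could be
sketched as follows. The "interest" table (group address -> interfaces
with an interested receiver downstream) is a hypothetical structure; how
it is learned and propagated is exactly what the draft leaves out of
scope.

```python
from typing import Dict, List, Set


def prune_multicast(group: str,
                    tree_interfaces: Set[str],
                    interest: Dict[str, Set[str]]) -> List[str]:
    """Pick the tree interfaces on which a multicast frame is forwarded."""
    wanted = interest.get(group)
    if wanted is None:
        # No interest information for this group: behave like broadcast.
        return sorted(tree_interfaces)
    return sorted(i for i in tree_interfaces if i in wanted)
```

Because pruning only ever removes interfaces from the broadcast set, an
RBridge that skips the optimization simply forwards more than necessary.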
Note that, because the multicast optimization would - in
principle - further scope and reduce broadcast traffic, two
things may be said:
o It is not necessary that all implementations in a deployment
implement the optimization (though all must support the data
required to implement it in RBridge peers) in order for any
local multicast optimization (consistent with the above
description) to work;
o Introduction of a multicast optimization will not result in
potential forwarding loops where broadcast forwarding would
not do so.
In the simplest case, the ingress RBridge for a given multicast
frame will re-use the MAC destination group address of a
received multicast frame. However this may not be required as
it is possible that the mechanisms specified to support
multicast will require examination of the decapsulated MAC
destination group address at each RBridge that implements the
optimization.
4.6.2.3. Flooding
Flooding is similarly reducible to broadcast forwarding in the
simplest (default) case - with the exception that a frame being
flooded across the CRED is typically a unicast frame for which
no CFT exists at the ingress RBridge. This is not a minor
distinction, however, because it impacts the way that addressing
may be used to accomplish flooding within the CRED.
An ingress RBridge that does not have a CFT entry for a received
frame's MAC destination address will inject the frame onto the
ingress RBridge Tree by - for instance - encapsulating the frame
with a MAC destination broadcast address, and forwarding it
according to its local CFT-IRT. Without optimization, each
RBridge along the path toward all egress RBridges will similarly
forward the frame according to its local CFT-IRT.
Using this approach results in one and only one copy of the
flooded frame being delivered to all egress RBridges.
However, implementations may choose to optimize flooding. A
flooding optimization will only work at any specific RBridge if
that RBridge re-evaluates the original (decapsulated) unicast
frame.
Any flooding optimization would operate similarly to the
multicast optimization described above, except that - instead of
requiring local information about multicast distribution - each
RBridge implementing the optimization will need only to look up
the MAC destination address of the original (decapsulated) frame
in its local CFT. If an entry is found, the frame could then be
forwarded only if the specific RBridge is on the shortest path
between the originating ingress RBridge and the appropriate
egress RBridge. This could be implemented - for example - as a
specialized CFT-IRT entry.
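The per-hop check just described might be sketched like this. It is an
illustration under assumptions: the CFT is a plain dictionary from MAC
address to egress RBridge, and shortest paths are supplied as a
precomputed table rather than derived from the routing protocol.

```python
from typing import Dict, List, Tuple


def flood_action(dst_mac: str, ingress: str, me: str,
                 cft: Dict[str, str],
                 paths: Dict[Tuple[str, str], List[str]]) -> str:
    """Return 'flood', 'forward' or 'drop' for a flooded frame at RBridge me."""
    egress = cft.get(dst_mac)
    if egress is None:
        return "flood"  # still unknown here: keep flooding on the tree
    # Known destination: continue only along the ingress -> egress path.
    path = paths.get((ingress, egress), [])
    return "forward" if me in path else "drop"
```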
Note that, because the flooding optimization would - in
principle - further scope and reduce flooded traffic, two things
may be said:
o It is not necessary that all implementations in a deployment
support the optimization in order for any local flooding
optimization (consistent with the above description) to work
(hence such an optimization is optional);
o Introduction of the flooding optimization will not result in
potential forwarding loops where flooded forwarding would not
do so.
Because a forwarding decision can be made at each hop, it is
possible to terminate flooding early if a CFT for the original
MAC destination was in the process of being propagated when
flooding for the frame was started. It is therefore possible to
reduce the amount of flooding to some degree in this case.
4.7. Routing Protocol Operation
The details of routing protocol operation can be determined once
a specific routing protocol has been selected. These details
would be defined in appropriate protocol specification(s).
Protocol specifications should identify means for determining
the content of the CFT, CFT-IRT and CTT.
4.8. Other Bridging and Ethernet Protocol Operations
In defining this architecture, several interaction models have
been considered for protocol interaction between RBridges and
other L2 forwarding devices - in particular, 802.1D bridges.
Whatever model we adopt for these interactions must allow for
the possibility of other types of L2 forwarding devices. Hence,
a minimal participation model is most likely to be successful
over the long term, assuming that RBridges are used in a L2
topology that would be functional if RBridges were replaced by
other types of L2 forwarding devices.
Toward this end, RBridges - and the CRED as a whole - could (in
theory) participate in Ethernet link protocols, notably the
spanning tree protocol (STP) on the ingress/egress links using
exactly one of the following interaction models:
o Transparent Participation (Transparent-STP)
o Active Participation (Participate-STP)
o Blocking Participation (Block-STP)
Only one of these variants would be supported by an instance of
this architecture. All RBridges within a single CRED must use
the same model for interacting with non-RBridge protocols.
Furthermore, it is the explicit intent that only one of these
models is ultimately supported - at least as a default mode of
compliant implementations.
This architecture assumes RBridges block STP.
Sgai 43> can we clarify that this means "drop BPDUs"?
... snip ...