[rbridge] long-awaited review comments on draft-ietf-trill-rbridge-arch-05
James Carlson
james.d.carlson at sun.com
Thu May 15 14:15:07 PDT 2008
Eric Gray writes:
> While I personally agree with most of your comments, I do
> not believe that I am at liberty to make changes along the lines
> of many of the ones you suggest, because the existing text is the
> result of WG direction or prior consensus/decisions.
To be clear: I wasn't asking you (personally) to make any changes
without working group consensus.
During the last IETF meeting in Philadelphia, there was a call for
volunteers to read the document and offer comments on the list,
because the document shouldn't go forward if it hasn't had review. I
offered my time up to do that review, and as a result, I was offering
comments on the document.
> There were many reasons why I had initially made the "DRB
> assumption" and you have pointed out several of them. However,
> I was given very explicit direction by the WG to remove the DRB
> assumption, which ultimately made it necessary to describe all
> of the specific complications involved in doing anything else.
Yep; understood. If the consensus is to leave the architecture
document wishy-washy due to high level equivocation like this, then I
can certainly live with it. I'm implementing to the protocol specs
rather than the architecture anyway. ;-}
> > This is eventually described in section 5.5, but that's quite a ways
> > down in the document.
>
> There are usually structure issues with any document. Each different
> reader may find that they would prefer that some part of the content of
> the document was provided earlier than another. This is not always an
> option.
When attempting to read the document as someone without intimate
RBridge knowledge, I think the very *first* question that reader has is
how traditional bridging and the new RBridges "see" each other. How
does interoperability with existing equipment work?
In every single presentation I've given on our OpenSolaris RBridges
project, that question has come up. In fact, I just gave a short talk
at our internal quarterly engineering status review today, and guess
what the only question was?
> For example, if the interaction information you requested above were to
> appear toward the beginning of the document, it would be necessary to
> either assume the reader understands elements of the architecture that
> are implicitly included in these statements, or provide the description
> of these things earlier in the document. Pretty soon, you may find all
> of the information gets cycled and the specific content you wanted to
> see earlier is now pretty much where it was originally.
I don't believe that to be true, because I'm identifying a need to set
out some boundary conditions first: the reader should not expect to
see us discussing STP interoperability (there isn't any because the
STP 'domain' ends at the RBridge doorstep) or how RBridge links are
disabled by STP (they're not).
But if the document authors don't feel they can do that without
damaging the structure of the document, then that's fine. I've
offered my comments, and that's all I set out to do.
> Would it be sufficient to provide a pointer to section 5.5 at some point
> toward the beginning of the document? If so, where specifically would
> you think it should be put?
Having a pointer that says "interoperability between traditional
bridges using Spanning Tree and RBridges is preserved by the rules
described in section 5.5" somewhere near the top (perhaps before or
amid the "as an overview, however" part on page 5) would help.
> > -
> >
> > Section 2.2 page 11:
> >
> > The term "R-tree" is defined, but then never used again.
>
> This was specifically agreed to in the course of terminology alignment
> several meetings (and versions) ago. Radia and Silvano requested that
> the architecture document should include this definition. However, I
> argued at the time that this is specific to a particular specification
> of protocol, and not a term generally applicable to the architecture.
>
> Hence the term is defined, and not used.
It looks like flotsam in the architecture document. I suggest
removing it.
> > -
> >
> > 3.2.1 doesn't give enough detail about the nature of the unicast
> > forwarding database. There must be entries of at least these forms:
[...]
> The exact proposed additions are actually not quite complete/accurate
> since the ingress and forwarding operations are 1) distinct (since the
> ingress operation also includes encapsulation) and 2) discussed in
> different sections (ingress information is discussed in section 3.2.3
> and an ingress RBridge would contain - at least logically - an ingress
> forwarding database, a unicast forwarding database and, possibly, a
> multi-point forwarding database).
>
> At one point, the architecture did contain information at the level of
> detail suggested in this comment. However, this level of detail was
> found to be objectionable by a number of people in earlier versions.
In *no* place does the document describe the form or abstract
operation of these databases required for the forwarding functions.
That's important, and it is indeed architectural. It's not an
implementation detail that can be deferred to a protocol
specification. The database itself doesn't appear inside the
protocol. It's a fundamental assumption of the mechanism used in
forwarding and it's necessary to understand it in order to understand
egress learning behavior.
(Honestly, given the amount of architectural information that's
present in the protocol specification, I'm not sure why I should argue
about this point -- collapsing the documents would be a better
solution -- but as long as we're trying to separate architectural
matters from protocol matters, we're not quite getting there.)
> 'The Unicast TRILL Forwarding Database contains data specific to
> RBridge forwarding for unicast traffic. The specific fields
> contained in this table are to be defined in RBridge protocol
> specifications. In the abstract, however, the table should
> contain forwarding direction and encapsulation associated with
> an RBridge encapsulated frame received - determined by the TRILL
> "shim" header destination and VLAN (if applicable).'
That's exactly what I'm pointing out as insufficient in terms of
architecture.
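To make the gap concrete, the quoted description amounts to roughly a lookup of the following shape. This is only a sketch: the class, field, and port names are hypothetical, and treating the next-hop MAC address as the "encapsulation" is an assumption, not something taken from the draft or the protocol specification.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ForwardingEntry(NamedTuple):
    out_interface: str   # "forwarding direction": port toward the next hop
    next_hop_mac: str    # assumed outer encapsulation: next-hop RBridge MAC


class UnicastTrillFdb:
    """Keyed by the TRILL header's egress nickname and, if applicable, VLAN."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Optional[int]], ForwardingEntry] = {}

    def add(self, egress_nickname: str, vlan: Optional[int],
            entry: ForwardingEntry) -> None:
        self._entries[(egress_nickname, vlan)] = entry

    def lookup(self, egress_nickname: str,
               vlan: Optional[int] = None) -> Optional[ForwardingEntry]:
        # Prefer a VLAN-specific entry, then fall back to a VLAN-independent one.
        return (self._entries.get((egress_nickname, vlan))
                or self._entries.get((egress_nickname, None)))


fdb = UnicastTrillFdb()
fdb.add("RB-2", None, ForwardingEntry("port3", "00:00:5e:00:53:10"))
hit = fdb.lookup("RB-2", vlan=7)    # falls back to the VLAN-independent entry
miss = fdb.lookup("RB-9", vlan=7)   # no entry for this nickname
```

A table of this shape covers only transit forwarding by nickname. It says nothing about how end-station addresses map to nicknames.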
> > -
> >
> > Section 3.2.2., on page 15, the term "Egress RBridge" is defined for
> > the multi-destination case in part with this text:
[...]
> This comment is somewhat misleading.
I don't believe it is.
> Egress RBridge (as a role in a network) is defined in the terminology
> section.
>
> Immediately preceding this (and other) definition(s) is the following
> paragraph:
>
> 'In discussing entries to be included in the Multi-destination
> TRILL Forwarding Database, the following entities are
> temporarily defined, or further qualified:'
Right. That's exactly what I'm pointing out as difficult to
understand. Using the same term with different definitions within a
single document is dangerous.
> This is an editorial choice, and should not produce confusion for most
> readers.
I guess I'm just not "most readers," then.
> Perhaps it would be potentially less confusing if the entire set of
> "qualified" definitions were included in a NOTE in which they were
> also made to look significantly less like "definitions"?
Possibly, though I'm not sure how that would function.
The higher level issue that I'm pointing out is that you're using
English as though it were a programming language -- complete with
clear lexical scoping rules and unambiguous syntax. Unfortunately,
written prose isn't like that, even in a specification, and
particularly in an IETF document.
The essence of a good RFC is clarity. It's the ability to produce
interoperable implementations by multiple vendors and by multiple
readers who may be separated by time, distance, and native language.
I don't believe it's wise to sacrifice clarity for the sake of
precision or economy.
> > I think the text should be shortened up considerably and clarified,
> > because this point is effectively drowned out by too many words.
> > (This comment applies to similarly affected sections, such as 5.2,
> > which seems to be crawling with degenerates. ;-})
>
> Unfortunately, I agree with this comment completely and cannot take any
> action on it - at least not on the basis of a single comment from one
> reviewer.
I guess my time was well spent. ;-}
> Also, as you undoubtedly know, "degenerate" is a precise mathematical
> term meaning "a limiting case of a mathematical system that is more
Of course.
> symmetrical or simpler in form than the general case." My favorite
> example is a degenerate circle represented by the equation -
You might have noticed that the phrase "degenerate case" is used twice
in this section and appears to refer to two subtly different cases.
> > The *important* part is whether any equipment that may form a
> > non-RBridge L2 data path between RBridge ports must allow TRILL
> > communication between those ports such that RBridges can safely
> > elect or determine a single Designated RBridge. It doesn't matter
> > how that path is formed (802.1D is one possibility), just that it
> > exists.
>
> I won't argue whether or not this is true; it is not quite the point.
>
> The issue discussed is entirely about the need to be compatible with
> bridge learning (defined for 802.1 bridges). If - in fact - the issue
> was limited to links between RBridges, my answer might be different.
>
> If we do not need to be consistent with bridge learning (in either a
> transit or ingress/egress case), a lot of things are different. But
> the key difference - for this section - is that it is important for
> RBridges to provide forwarding that is consistent with the way that
> bridge learning works. In the simplest approach - where we treat a
> set of (spanning tree) connected bridges as a single link between
> RBridges (or a single stub connected to a single RBridge), and have
> a single RBridge that provides egress/ingress to the RBridge campus
> - then the specifics of the topology and the way that bridge learning
> occurs would be unnecessary.
I'll offer that injecting explicit end-station forwarding entries into
the TRILL database (one of the options that's discussed multiple times
in the text, along with all of its disadvantages) is *NOT* consistent
with traditional bridge learning, so it's apparent that strict
consistency with 802.1D or 802.1Q isn't the point, either.
> This may not be precisely clear to people in the IETF, generally. It
> is probably not the case that everyone immediately realizes that the
> frames delivered to an ingress RBridge do not (usually) have the MAC
> DA of that RBridge, nor that frames delivered from an egress RBridge
> do not (usually) have the MAC SA of the egress RBridge. Because these
> things are true, however, it is necessary for the RBridges to behave
> in a way that is consistent with bridge learning.
It would be a bizarre bridge indeed that requires the end stations to
somehow "know" the local bridge MAC address in order to set the MAC DA
correctly, or that alters the MAC source address of the original frame
in transit. I don't see how that could possibly work right. What
you're talking about above seems to be the behavior of a router, not a
bridge.
I expect RBridges to be (first and foremost) layer two bridges. I'm
surprised that we have to talk about consistency with 802.1D or 802.1Q
learning, as the learning that RBridges must do is actually somewhat
different (particularly the tunnel egress portion), and because the
learning that other bridges do (or don't do!) is completely invisible
to any RBridge implementation.
Even if we were somehow completely inconsistent with those other
documents, I fail to see how it would be an architectural issue for
RBridges, which _must_ be able to stand on their own.
> > This latter bit is crucial. It's what requires the encapsulator
> > (which fills in a source nickname) and decapsulator (which will be
> > the target of return traffic) to be the same node, or at least
> > requires the encapsulator to fill in the decapsulator's nickname as
> > the "sender."
>
> This is - IMO - a level of detail below architecture. In particular,
> the mechanistic details of RBridge learning very definitely do not
> belong in an architectural specification.
I don't agree. It's a crucial bit of the architecture. It's how the
learning function operates, and it's one of the things that
distinguishes RBridges from regular bridges.
The document goes to the trouble of describing learning based on
source MAC address on the sender (encapsulator) RBridge, but then
leaves out how the destination (decapsulator) RBridge learns the
reverse path. That seems like a hole to me.
> In addition, the proposed additional text is actually architecturally
> incorrect.
>
> For one thing, the choice to learn on decapsulation is the current
> approach assumed as default in the protocol specification but is not
> an architectural requirement at all.
If you do data-based learning, rather than injecting the end station
addresses into the IS-IS data, then you *MUST* do as I outline above.
There's no option other than flooding everything, and that's not
reasonably viable.
> > This is duplication of the information already in 3.2.3. This could
> > be trimmed down.
>
> I am unsure how this information is in any way related to the text in
> section 3.2.3 (Ingress TRILL Forwarding Database).
The two sections, one after the other, are almost word-for-word
identical. This is a comment on the layout of the text.
> In addition, this is a single sentence statement that is very relevant
> to the text in surrounding paragraphs. The preceding paragraph talks
> about the fact that the current assumption (of the WG, in the protocol
> specification referenced) is that the ingress forwarding database is
> populated by learning from (potentially flooded) data frames on egress.
Except, as you've pointed out before, that's a "detail" of the
protocol implementation ... right?
> > The architecture document should describe how the system is intended
> > to operate and what the parts should do. I don't see a reason to
> > insert loopholes that allow for unspecified future variations. At
> > best, it's a distraction, because we don't know how to make that
> > work. (And, in fact, I suspect it does _NOT_ work in any case,
> > because it breaks learning.)
>
> It's interesting that you did pick up on the learning problem, but did
> not see that this was the point of a lot of the text in section 4.1.
> Minimally, I will try to make at least that much clearer.
>
> Again, there is no "architectural" reason to restrict implementations
> - or protocol specifications - to behavioral assumptions such as the
> ones you suggest. This was not my choice, but direction from the WG.
The architectural reason to do so is that it allows the receiver (the
egress RBridge) to *assume* that the sender's nickname is the reverse
path for the source MAC address in the decapsulated packet.
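A minimal sketch of that assumption, for illustration only. The frame fields and class names below are hypothetical and are not taken from the protocol specification; the point is just that the egress RBridge records the sender's nickname against the inner source MAC address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrillFrame:
    ingress_nickname: str   # nickname the encapsulator put in the TRILL header
    egress_nickname: str    # nickname of the intended decapsulator
    vlan: int
    src_mac: str            # inner (original) source MAC address
    dst_mac: str            # inner (original) destination MAC address


class EgressLearner:
    """Maps (vlan, end-station MAC) to the RBridge nickname that reaches it."""

    def __init__(self):
        self.remote = {}

    def on_decapsulate(self, frame):
        # The assumption under discussion: the sender's nickname is the
        # reverse path for the inner source MAC address.
        self.remote[(frame.vlan, frame.src_mac)] = frame.ingress_nickname

    def lookup(self, vlan, dst_mac):
        # None means "unknown destination", which would have to be flooded.
        return self.remote.get((vlan, dst_mac))


learner = EgressLearner()
learner.on_decapsulate(
    TrillFrame("RB-A", "RB-B", 1, "00:00:5e:00:53:01", "00:00:5e:00:53:02"))
reply_target = learner.lookup(1, "00:00:5e:00:53:01")   # learned: "RB-A"
unknown = learner.lookup(1, "00:00:5e:00:53:99")        # never seen: None
```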
If this assumption is broken, then there's no way to do this right, as
long as the architecture allows for data-driven learning. The section
of text I'm commenting on is this:
    Note that an egress RBridge will - in most case - be the RBridge
    determined to be the primary point of attachment for a
    destination end station on the local link or VLAN accessed via
    its egress interface(s). Exceptions to this might exist under
    circumstances in which use of distinct RBridges for ingress and
Note that it says "in most case" [sic]. It's allowing for egress
RBridge and ingress (PPOA) being different, and I don't think that's a
viable architecture.
> And - unlike the ATM suggestion you mention - this particular degree of
> "architectural freedom" has everything to do with Ethernet technologies
> RBridges are expected to be compatible with. As an example, consider
> the following scenario:
>             _____
> S-1 <------|     |------> RB-1
> S-2 <------| H-1 |
> S-3 <------|_____|------> RB-2
>
> In this case, S-1, S-2 and S-3 are end stations, H-1 is a Hub, and
> RB-1 and RB-2 are RBridges. While I am not recommending it, there
> is no "architectural" reason to prohibit RB-1 from being a PPOA for
> S-1 and S-3, while RB-2 is the PPOA for S-2.
Agreed. That works fine. That's not the problem I'm pointing out.
If RB-2 is the PPOA for S-2, and encapsulates the packets for S-2, but
RB-2 is unable or unwilling to be the decapsulation point, how
does it know to put RB-1's nickname into the TRILL header?
That's what happens if you allow a protocol implementation to split up
the encapsulation point from the decapsulation point: there has to be
some way for the encapsulator to present the right TRILL nickname as
the source such that the return packets will be sent to the
decapsulation point. The architecture contains nothing that would
allow this to happen in any obvious way. (Or, really, any reason to
bother supporting such a thing.)
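The failure mode can be sketched as follows, assuming data-driven learning at the remote RBridge. The function names and the MAC address are hypothetical; RB-1, RB-2, and S-2 are the nodes from the quoted diagram.

```python
def learn_reverse_path(table, inner_src_mac, header_ingress_nickname):
    """Data-driven learning at the remote (egress) RBridge."""
    table[inner_src_mac] = header_ingress_nickname


def return_target(table, dst_mac):
    """Nickname the remote RBridge will send return traffic to."""
    return table.get(dst_mac)


S2_MAC = "00:00:5e:00:53:02"   # illustrative address for end station S-2
ENCAPSULATOR = "RB-2"          # encapsulates frames sent by S-2
DECAPSULATOR = "RB-1"          # meant to decapsulate frames destined to S-2

# Case 1: RB-2 fills in its own nickname as the sender.
naive = {}
learn_reverse_path(naive, S2_MAC, ENCAPSULATOR)
naive_ok = return_target(naive, S2_MAC) == DECAPSULATOR

# Case 2: RB-2 somehow knows to fill in RB-1's nickname instead.
coordinated = {}
learn_reverse_path(coordinated, S2_MAC, DECAPSULATOR)
coordinated_ok = return_target(coordinated, S2_MAC) == DECAPSULATOR
```

In the first case the return traffic is sent to RB-2, not to the intended decapsulation point. The second case works only if something outside the data path tells RB-2 which nickname to use.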
In other words, I see this as just a needless complication.
(It's possible that you're trying to describe a transient condition
that may occur during re-election of a new DRB. If so, then perhaps
it should be made clear that although this *could* happen, it's not
the expected long-term behavior of the system.)
> I do not deny that this is confusing. Those WG members who read any
> of the first 4 or 5 versions (at least), would be able to point out
> that I had tried to avoid this confusion by assuming that a DRB will
> be used.
My comment in this section wasn't actually about the DRB selection.
It was instead about split encapsulation/decapsulation for a single end
station.
> > I'm surprised that section 5.4 doesn't discuss why IS-IS was chosen,
> > or what special things need to be done with it in order to make it
> > work here (such as setting a fixed "area" value).
>
> There is no completely unarguable reason for making the choice to
> use IS-IS. Also, the basis for making the choice is irrelevant to
> the architecture specification. The fact that the choice was made
> is only included as a result of a process decision to discontinue
> efforts to make progress on another document that was actually the
> appropriate place to make such observations.
Just as the choice to use IS-IS is architectural, the high-level
changes that must be made to IS-IS in order to make it usable as the
chosen solution (i.e., how well it fits the fundamental requirements)
are *also* architectural issues.
If you don't want the fit-to-function of IS-IS to be architectural,
then I think the choice of the routing protocol needs to be ruled out
of scope for this document as well. (No, I don't think that's a good
idea at all, but I don't see how the architectural choice of IS-IS as
a basis can be made off-hand without discussing the implications.)
> If you would care to propose specific text - and we can achieve a
> degree of WG consensus - as to why IS-IS was chosen, I would be
> okay with adding it.
>
> The reason I ask is that I am not certain that the real reasons
> why IS-IS was chosen are both appropriate for inclusion in, and
> consistently likely to remain true over the life of, an RFC.
For one thing, it runs directly on the link layer, and can be
specified such that no user configuration is necessary to make it
work. In contrast, the alternatives (such as OSPF) run on network
layer protocols that require explicit configuration of subnets and the
like.
> With respect to setting a 'fixed "area" value' people who've been
> to most of the meetings will probably recall that I asked about
> this and was told that it was not the case. If this has in fact
> changed, I was not told about it.
If that's not done, then how do we ever achieve our zero-configuration
goal?
This seems obvious to me, and it's something we've discussed several
times in the working group. It's a point on which I believe we
already have substantial consensus.
> Can you provide a specific reference? There is no instance of
> either the word "area" or the word "fixed" that has anything to
> do with this topic in the protocol specification. The word "area"
> is mostly used in connection with the "options" portion of the
> TRILL header, and the word "fixed" is used with enablement status
> and the assignment of VLAN 1.
See draft-eastlake-trill-rbridge-isis-00, section 2.1, bullet item 3:
    3. TRILL uses a distinct, constant IS-IS Area Address that would
       never appear as a real Layer 3 IS-IS area address. This Area
       Address is the value zero. (See Section 4.1.)
> Also, is it really appropriate to include this level of detail in
> the architecture document? This sounds like a protocol operation
> specification...
The use of a fixed area is architectural. The exact area number and
the optional features that might be related to it are not.
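A rough illustration of why the fixed area matters for zero configuration, assuming the usual IS-IS rule that Level 1 neighbors need at least one area address in common. The helper name and the non-zero area values are made up for the example.

```python
TRILL_AREA_ADDRESS = 0   # the constant value cited from the draft above


def shares_area(local_areas, neighbor_areas):
    """True if the two sets of area addresses have at least one in common."""
    return not set(local_areas).isdisjoint(neighbor_areas)


# Every RBridge uses the same built-in constant, so nothing is configured.
fixed = shares_area({TRILL_AREA_ADDRESS}, {TRILL_AREA_ADDRESS})

# With per-device configured areas, a mismatch blocks the adjacency.
misconfigured = shares_area({0x490001}, {0x490002})
```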
> > This also looks like material that's in the same category as the 5.2
> > advice about separate ingress/egress. It's possible that someone
> > could define a "new" version of RBridges that either forwards STP
> > messages (!) or has each RBridge acting as an STP node in a single
> > network (!!), but neither of those is really the solution we're
> > trying to describe. It's not part of the architecture.
>
> An architecture does not describe a solution. This should be clear
> from the title, which is intended to indicate that this document
> describes the architecture that applies to a solution to the TRILL
> "problem." The abstract further clarifies that the role of this
I emphatically disagree with that position.
First of all, we already have an abstract problem statement. It's
described in draft-ietf-trill-prob-02.
That's not something I believe this document needs to do. Instead,
this document needs to describe the architecture of a solution to the
problem statement. "Architecture," in this instance, means the
components, interactions, and behaviors that will be and *should be*
combined to produce the desired results. The problem itself isn't
architecture, as far as I understand it.
That necessarily describes a solution, and I don't see how we can have
an architecture document that doesn't actually address a solution. If
that's what you and the other wg participants really want, then please
leave my name off as a reviewer. That's a dead end to me.
This sort of all-and-the-kitchen-sink approach, where we don't make
actual decisions in this working group, but instead allow for a range
of purported but never realized "options" is _exactly_ what will lead
to interoperability problems and a failure of the group to specify
something useful.
I fear a lack of specificity much more than I do "offending" one or
more working group participants who don't want to have a simple and
unambiguous document. Pretending as though we'll design something
that will work utterly differently with regard to STP is, in my
opinion, a terrible mistake. We're on better ground when we rule
_out_ useless possibilities than when we make unending allowances for
them.
It's always easier to add options and features in the future than it
is to rule out mistaken features that were added just for the sake of
flexibility.
> The fact that these terms are not defined (at least in section 2)
> is that there was no consensus in a terminology alignment effort to
> include these terms. In addition, it is (as you said) somewhat
> obvious what these terms mean (though I disagree that knowing much
> about RBridges is required; it is actually more important to know
> a little bit about STP).
I think the reader must actually understand *both*.
> Further, it seems somewhat pretentious to define terminology in the
> main architecture text that is only used in a tutorial subsection
> of the document. Understanding of this text is "nice-to-have" but
> not necessary.
Pretension or not, I don't see how defining the terms nowhere but
using them regardless is a solution to that particular problem.
> If you look into the "wiring-closet" problem, you will see that it
> is possible (with some potential solutions to this problem) to have
> STP race conditions, or oscillations, depending on exactly how the
> protocols interact.
I did look into the problem, but I don't see any such modes.
If TRILL starts up first, then we end up with separate routes. If
that STP link comes up later, then we'll have a temporary loop
(mitigated by TRILL's hop count) until the two RBridge neighbors
discover each other, and establish a new single DRB.
In the other order, TRILL will immediately detect the path, and set up
a single DRB as expected.
In either case, the situation is then stable. TRILL does *not* cause
STP to recalculate anything. TRILL doesn't shut down links, which are
the only things that STP really cares about here.
There's no oscillation, because there's no path that allows for
feedback -- a necessary condition of oscillation. There is
potentially a *transient* as the topology settles, but the system is
dynamically stable.
> Since the WG is apparently not interested in solving that problem
> - in spite of feedback from IEEE 802.1 that doing so is important
> - there is little reason to go into the details. This is even more
> true, given the fact that section 5 is now a tutorial.
I think it's an interesting problem to solve, and worth mentioning,
but far from the threat that you seem to be describing.
In particular, network reconfiguration can by itself create such a
temporary loop without ever needing to invoke the "wiring closet
problem." That's why the TRILL headers will have hop counts. It's in
the nature of the beast when you start to deal with routing protocols.
I don't doubt that such things make 802.1 folks queasy.
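As a simplified sketch of how a hop count bounds such a transient loop: the ring topology, function name, and the exact decrement-and-discard rule below are assumptions for illustration, not the behavior defined in the protocol specification.

```python
def forward_in_loop(ring, start, hop_count):
    """Pass a frame around a ring of RBridges until its hop count runs out.

    Returns the list of RBridges that handled the frame, in order.
    """
    handled = []
    index = ring.index(start)
    while hop_count > 0:
        handled.append(ring[index])
        hop_count -= 1                    # each RBridge decrements the count
        index = (index + 1) % len(ring)   # then forwards to the next one
    return handled                        # frame is discarded at zero


path = forward_in_loop(["RB-1", "RB-2", "RB-3"], "RB-1", hop_count=5)
```

The frame circulates for at most `hop_count` hops and is then dropped, so a temporary loop wastes bandwidth for a bounded time and does not multiply traffic indefinitely.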
> > Note that
> > scaling concerns may dictate otherwise, either in specific of
> >                                                   ^^^^^^^^ ?
> > RBridge protocol specification, or in deployment.
>
> I had noticed this one as well. It either originally said, or was
> meant to say, "... specific instances of ..."
>
> I propose to remove "specific of", making the text read "... either
> in RBridge protocol specification, or in deployment."
OK.
> - with -
>
> "Entries contain an indication of the interface a broadcast,
> multicast or flooded frame is forwarded on for all applicable
> {root RBridge, egress RBridge} pairs."
OK. The rest is still in future tense, but that corrects the odd jump
in mood.
> I propose to fix the sentence structure by replacing the first
> sentence with:
>
> "The Ingress TRILL Forwarding Database is used to determine
> how arriving traffic will be encapsulated for forwarding,
> toward the egress RBridge, via the TRILL Campus."
OK.
--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677