Post-TFD Segment Routing Roundtable Thoughts

My brain has now had a bit of time to recover from the information overload that was the Tech Field Day Segment Routing Roundtable, so it is most definitely time to write a bit about what I learned. You may want to give the Software Gone Wild podcast with Ivan Pepelnjak a listen for a solid foundation of what SR is before jumping into things. After that, head over to the TFD YouTube channel to check out the recordings from the event. We had some really great presentations from Walmart, Microsoft, and Comcast; each of these companies explained how Segment Routing is helping them in their particular environment. I would start with the presentation from Mark Pagan of Walmart, as it goes over a lot of the real-world day 1 benefits of SR. Then take a listen to the Microsoft and Comcast presentations. They really kicked it up a lot in terms of the complexity of their overall solutions, but also really highlighted a lot of what is possible with Segment Routing.

I’m not going to try to write anything too technical about SR because I am definitely not up to speed enough on it to talk about it at that level. What I am going to do is jot down my view on it as a technology, and its applicability (in my mind I guess) in the day-to-day networking world. I also want to respond to my own thoughts/questions from my previous post before the TFD event.

I’ll try to address my own previous points first:

  • Whatever happened to NSH: Guess I didn’t really get a solid answer here. As far as I can tell NSH is still technically a thing, but really seems to be fading away. I think ultimately it’s too big of a problem (or I guess solution) to really successfully implement. Somebody please chime in if there’s something new/interesting happening w/ NSH that I should be reading about. In any case, as compared to SR, they really are different beasts. I think there is some overlap in terms of what NSH was promising and what SR can do. Sure, SR can direct traffic through a network, and maybe even to or through some devices on the network, but it’s not intending to do “service-chaining” in the same way that NSH was/is.
  • Config nightmare: Nope. I think my biggest takeaway is that SR is pretty much MPLS 3.0. I say 3.0 because it is just that much simpler, not only to configure, but to troubleshoot as well. I bring up troubleshooting since I think this is/was the biggest and most important part of the whole event: SIDs (Segment IDs) are globally significant (the prefix/node SIDs at least, within the SR domain). Sounds not very exciting/important by itself eh? Well the reason I think that is so huge is that if you’ve ever worked w/ MPLS and tried to understand the end-to-end label switched path (LSP) while troubleshooting, then you will know that the labels are all over the place and are only significant to the local router. Now they are unified across the whole LSP… that’s pretty badass! I should also note that instead of using LDP, SR distributes the SID/label data via TLVs in OSPF or IS-IS, kinda sorta like LDP auto-config.
  • Granularity/Service Chaining: I think you can do some of this with SR, but it’s really not its intended use case – a bit more on this later.
  • Isn’t MPLS dead?: Heh… yes? No? Obviously it’s not dead in the service provider world, and likely won’t be for a long long time. In the data center… mostly dead is maybe fair? I can say I personally don’t see much/any MPLS in the DC at least. I think that part of why I was bringing this up before was because I was thinking more about an Enterprise DC (as that’s my day-to-day focus). I think you could absolutely use SR in an enterprise DC but I don’t think it’s really the best tool for that job. If you take a look at who presented though, you’ll see that while these are “Enterprises” (well, plus Comcast as a Service Provider), they’re freaking huge, and they’re really their own SPs doing SP type things. (MS is using this in the DCs, but in a very hyperscale/SP type way.)
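
To make the "globally significant" point from the config nightmare bullet a bit more concrete, here is a minimal Python sketch (not anything shown at the event). It assumes every router uses the same SRGB (Segment Routing Global Block) starting at 16000, which is a common default on some platforms but is not guaranteed. The router names, SID indexes, and LDP label values are all made up for illustration.

```python
# Hypothetical illustration: router names, indexes, and labels are invented.
SRGB_BASE = 16000  # assumed identical on every router in the SR domain

# Node SID indexes, as they would be advertised via the IGP (values assumed)
node_sid_index = {"R1": 1, "R2": 2, "R3": 3, "R4": 4}

# Hop-by-hop path from ingress R1 to egress R4
path = ["R1", "R2", "R3", "R4"]


def sr_labels_along_path(path, dest):
    """Label each hop sends toward dest with SR: SRGB base + SID index.

    Ignores penultimate hop popping to keep the comparison simple.
    """
    label = SRGB_BASE + node_sid_index[dest]
    return {hop: label for hop in path[:-1]}


# With LDP, each router allocates its own locally significant label for
# the same destination (arbitrary made-up values).
ldp_local_labels = {"R2": 24005, "R3": 300112, "R4": 18}


def ldp_labels_along_path(path, local_labels):
    """Label each hop sends with LDP: whatever its downstream neighbor allocated."""
    return {hop: local_labels[nxt] for hop, nxt in zip(path, path[1:])}


print(sr_labels_along_path(path, "R4"))
# {'R1': 16004, 'R2': 16004, 'R3': 16004}
print(ldp_labels_along_path(path, ldp_local_labels))
# {'R1': 24005, 'R2': 300112, 'R3': 18}
```

With SR, the same label shows up at every hop, so you can tell where a packet is headed from any capture point. With LDP you have to stitch the path together from each router's local bindings.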

Alright so I guess that addresses the points from my previous post, now on to a bit more wordy words to recap my thoughts on SR.

I feel like SR is kind of a no-brainer in the SP/WAN world; it really does just seem like a way better way to do MPLS. You’ll still have to layer “stuff” on top of the SR bits (vpnv4/6 address family type stuff or whatever it is you’re running atop your MPLS), but SR just makes the rest seem so trivial. TE just got owned also… seems like there is basically no point to TE as we know it today if you can just use SR-TE to make your life so much easier. All that is well and good, but I don’t really live in a provider-centric world; I focus on data centers, so…
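
As a rough illustration of why SR-TE feels so much lighter, here is a small Python sketch of the core idea: the head-end expresses an explicit path as an ordered list of segments and pushes the matching label stack, so the routers in the middle don't need any per-path state. The SRGB base of 16000, the router names, and the SID indexes are assumptions for the example, and it only models node SIDs (no adjacency SIDs, no controller, no PHP).

```python
# Hypothetical illustration: names and values are invented.
SRGB_BASE = 16000  # assumed identical on every router in the SR domain

node_sid_index = {"R1": 1, "R2": 2, "R3": 3, "R4": 4, "R5": 5}


def label_stack(segment_list, sid_index, srgb_base=SRGB_BASE):
    """Turn an ordered list of waypoints into a label stack (top label first)."""
    return [srgb_base + sid_index[node] for node in segment_list]


def waypoints_visited(stack, sid_index, srgb_base=SRGB_BASE):
    """Walk the stack the way the network would: the top label picks the
    next waypoint, and that waypoint pops its own label when it is reached."""
    index_to_node = {idx: node for node, idx in sid_index.items()}
    remaining = list(stack)
    visited = []
    while remaining:
        top = remaining.pop(0)
        visited.append(index_to_node[top - srgb_base])
    return visited


# Steer traffic from R1 through R3 on its way to R5
stack = label_stack(["R3", "R5"], node_sid_index)
print(stack)                                     # [16003, 16005]
print(waypoints_visited(stack, node_sid_index))  # ['R3', 'R5']
```

The whole "tunnel" lives in the packet's label stack at the head-end, which is a big part of why the signaling overhead of classic TE goes away.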

While I am now a fan of SR, I feel like it doesn’t have a place in the data center. I know that the folks working on it will probably disagree, and I would like to agree with them but I can’t at this point. The current biggest challenge in the data center (at least at normal enterprise scale) is that we still have to have L2 in some capacity. This is a super super super lame requirement, but it is what it is. This requirement is the reason we have janky spanning-tree kludges (MLAG, vPC, VSS, etc.), FabricPath/TRILL, OTV, VPLS, and now VxLAN. Now from what I understand, there isn’t technically any reason you couldn’t use SR w/ some AToM or VPLS (maybe PBB?) to provide L2 over L3 in the data center, but that sounds like a freaking headache. VxLAN has pretty much won the DC overlay wars, and I don’t see any reason to introduce SR into the DC. Between data centers SR certainly could have a role in providing transport services, or even L2 extension. Even then, though, as VxLAN continues to mature and grow into that role, it doesn’t feel like it’s worth it to tack on another protocol/feature to support that requirement. If SR was the panacea for service chaining that I was kinda hoping it would be, then perhaps I’d feel differently. So at this point, given our stupid “requirements” for L2, I think SR should/will likely stick to the WAN/hyperscale folks. There’s nothing wrong with that of course, but I do feel like it’s important to delineate where SR is best suited (at least in my mind!).

PS - Go watch Paul Mattes’ presentation (Microsoft). They’re using link-bandwidth in BGP, which has always seemed to me to be the best kept secret of BGP. I was very excited to hear they’re really taking advantage of it in production; I rarely see it, so that was fun. /end nerdgasm