Moving on

Today is my first day at Facebook. I’m joining the Safety and Security team in Facebook’s DC engineering office.

After just over three years at Brocade and longer working on OpenDaylight, it’s a big change and not a decision that I came to lightly. Let’s keep it short here, though: my time at Brocade was amazing. I learned so much about how to take open source, productize it, sell it, support it, and make money doing it. The team is incredible and will keep being incredible. I wish them all the best.

OpenDaylight is going strong—serving over a billion end users in production and growing. Working with the community will likely be the most rewarding thing I’ve done for some time to come. While my day job at Facebook won’t involve OpenDaylight, I do intend to remain part of the community insofar as that is possible.

The opportunity at Facebook lets me pursue another one of my long-time passions: the intersection of society, law, technology, and (inevitably) politics. I almost went to law school instead of getting my PhD in computer science. Even during my PhD, I helped run a society and technology (SocTech) seminar bringing engineering, law, sociology and other students together around different issues each quarter.

Working on hard problems at the edge of these areas where it directly impacts the over 2 billion people who use Facebook and indirectly shapes how humanity connects is simply too good to pass up.

Please don’t be a stranger. Find me on Facebook, Twitter, in D.C., and wherever else. I’m sure I will cross paths with most of you and meet many new people as I move on.

on Security in OpenDaylight

It’s now been a bit more than two months since OpenDaylight dealt with the “netdump” vulnerability reported in August. The good news then was that, once we knew about the vulnerability, we were able to fix it and ship a new release of ODL with the fix in four days. I want to echo Dave Meyer’s comments in saying just how impressive that is and how well the OpenDaylight community comes together when something needs to be done. The list is much longer than this, but in particular, Robert Varga and David Jorm were absolutely critical in pushing things through quickly and efficiently.

The bad news then was that there was about a 4.5-month lag between when the vulnerability was discovered and when we found out about it. However, the even better news now (and really this all happened over a month ago, but I haven’t had time to blog about it) is that we have a bunch of new things in place that will prevent that kind of lag in our response in the future. Some of them have even been covered elsewhere.

Better Publicized Ways to Report Security Issues

Even at the time netdump was first made public, we had a private security mailing list to report security issues, but it was unfortunately not very well-advertised. Today, it’s publicly listed on OpenDaylight’s contact information page. It’s also listed on our new security advisories page, and you can find both on the first page of results for “opendaylight security” at your favorite search engine. For good measure, I’ll also put it here: to report any security issues in OpenDaylight, please e-mail

Formal Security Response Process

Again, we’ve had a long-standing security response team in ODL who monitored our security mailing list, but we didn’t have a complete understanding of what we needed to do next when a vulnerability was reported. Now, we have a very clear idea of what happens, who’s responsible for driving things once a vulnerability is reported, how to work with developers to create a fix, how to release the fixed version of OpenDaylight, and how to let our users know in a timely fashion.

More Good Things are Coming

We’ve already dealt with our second disclosed vulnerability, so we know the process works and we’re learning how to make it even better as we deal with each incident. We’re also actively working to take up a broader array of security issues with a combination of code changes and the development of security best practices for deployments as part of the Lithium release of OpenDaylight.

Security in Open Source

Lastly, I’d be remiss if I didn’t mention that open source software has a huge set of advantages when it comes to security. First, since the code is open to anyone, anyone can come find vulnerabilities and report them. Second, you can draw on a wide array of experts and developers across companies to discuss and fix any vulnerabilities that are found. Third, the community at large can see how such issues are addressed transparently and understand if the issue has really been fixed. All of this is made easier and more robust because, in open source, a community spanning companies can collaborate transparently.

on centralization in SDN and the applicability of OpenFlow

There have been two recurring themes I’ve heard recently around why SDN and OpenFlow don’t make sense. I’m going to pick on Ivan, but that’s just because he’s put the arguments out there in the most digestible form I’ve seen. There are lots of other places I’ve seen or heard the same thing.

The two themes are:

  1. Centralization doesn’t scale and won’t work in real networks. (See this.)
  2. OpenFlow is a poor fit for use case X. (See this about overlay networks.)

Honestly, Rob Sherwood does a great job of deflecting both of these when he discusses how modern OpenFlow fully exploits hardware and modern SDN controllers can provide better scale-out and fault-tolerance than traditional networking.

However, here, I’m going to talk about my take on them and how both of these mistaken assumptions are actually symptoms of a broader problem in how we think about networking—namely how we fail to build clean layers in network solutions.

Logical Centralization in SDN

As everyone loves to point out, the most common definition of SDN is a logically centralized control plane that is separate from the data plane and open protocols to govern the interaction between the two.

I’d like to call attention to the word “logically” in that statement. It’s both where building SDN control planes gets tricky and where the claim that centralization doesn’t scale loses its validity.

As Greg Ferro points out:

OSPF is a distributed network controller. It does the configuration of the forwarding table from the control plane. Your welcome.

Grammar mistakes aside, he has a point. Almost all control plane protocols (OSPF included) try to provide some degree of logical centralization while actually being distributed so that they are fault-tolerant and can scale.

The key difference between legacy protocols and SDN controllers is that most legacy protocols have one model of where to draw that line between centralized and distributed, and it’s baked into the hardware. With SDN controllers, you can choose whatever point you want, anywhere from a single central controller to an instance of the controller running for every device in the network.

Further, you can make that decision differently for different parts of your network. For example, you can have fully centralized traffic engineering rerouting elephant flows while having fully decentralized basic routing. This is more or less what Google does in their B4 WAN network.
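To make that hybrid split concrete, here’s a toy Python sketch (all names, thresholds, and data structures here are hypothetical, not anything from B4 or a real controller): big elephant flows get routed by a logically central traffic-engineering function with a whole-network view, while everything else falls through to distributed, per-device forwarding state.

```python
# Hypothetical threshold separating elephant flows from mice.
ELEPHANT_BYTES = 10 * 1024 * 1024

def central_te_route(flow, global_view):
    """Runs in one logically central place with a whole-network view:
    pick the least-loaded of the known paths between the endpoints."""
    paths = global_view[(flow["src"], flow["dst"])]
    return min(paths, key=lambda p: p["load"])["hops"]

def local_route(flow, local_table):
    """Runs on (or next to) every device, using only local state."""
    return local_table[flow["dst"]]

def route(flow, global_view, local_table):
    """Choose the centralization point per flow, not per network."""
    if flow["bytes"] >= ELEPHANT_BYTES:
        return central_te_route(flow, global_view)
    return local_route(flow, local_table)

# Two candidate paths from a to d; the direct one is heavily loaded.
global_view = {("a", "d"): [{"hops": ["a", "b", "d"], "load": 0.9},
                            {"hops": ["a", "c", "d"], "load": 0.2}]}
local_table = {"d": ["a", "b", "d"]}

elephant = {"src": "a", "dst": "d", "bytes": 50 * 1024 * 1024}
mouse = {"src": "a", "dst": "d", "bytes": 1024}

print(route(elephant, global_view, local_table))  # ['a', 'c', 'd']
print(route(mouse, global_view, local_table))     # ['a', 'b', 'd']
```

The point of the sketch is just that the centralized/distributed choice is a per-function knob, not a property baked into the whole network.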

I gave a talk about how we should think about this—without too many presupposed solutions—at the first OpenDaylight summit. The talk video and slides (with good references at the end) are both public.

OpenFlow is a Bad Fit for Task X

There are two parts to this. First, OpenFlow doesn’t actually fit networking hardware. Second, even if it does, it’s a kludge to implement higher-level features with it.

OpenFlow Doesn’t Work on Real Hardware

The first one is a pet peeve of mine because I helped the ONF’s Forwarding Abstractions Working Group (FAWG) figure out how to make OpenFlow 1.3 and later much more hardware friendly. We wound up defining something called Table Type Patterns (TTPs), which also served as the inspiration for Broadcom’s OF-DPA, which provides a 10-table OpenFlow 1.3 abstraction for their modern switching ASICs.

I should give all the credit for TTPs to Curt Beckmann and the rest of the FAWG, as I got distracted by OpenDaylight pretty early on and have only recently gotten re-involved as part of an OpenDaylight project to support them.

Curt Beckmann and I gave a talk about some of this at the OpenDaylight summit—there’s a public video and slides.

Joe Tardo and others gave a talk about the Broadcom OF-DPA at the 2014 Open Networking Summit, which you can find in the video archives by finding the Developer Track talk counter-intuitively titled “Floodlight: Open Network Hardware Programming”.

Long story short, OpenFlow 1.0 was hard to make work on real hardware. OpenFlow 1.3 can be mapped to real hardware just fine, but takes some effort to define the right set of tables. The Forwarding Abstractions Working Group’s TTPs and Broadcom’s OF-DPA show how to get this done.
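To give a flavor of the idea, here is a heavily simplified Python sketch of what a TTP captures: an agreed, fixed pipeline of tables, what each table can match on, and how they chain. Real TTPs are much richer JSON documents defined by the ONF spec; the table names, fields, and `conforms` check below are all hypothetical illustrations, not the actual format.

```python
# A toy "table type pattern": three chained tables, each with the
# match fields the hardware pipeline actually supports.
ttp = {
    "tables": [
        {"id": 0, "name": "vlan", "match": {"IN_PORT", "VLAN_VID"}, "goto": {1}},
        {"id": 1, "name": "mac", "match": {"ETH_DST"}, "goto": {2}},
        {"id": 2, "name": "acl", "match": {"IPV4_SRC", "IPV4_DST"}, "goto": set()},
    ]
}

def conforms(flow_mod, ttp):
    """Check that a flow-mod only uses match fields that the target
    table supports, i.e., that the controller and switch agree on
    the pipeline before any rules are pushed."""
    table = next(t for t in ttp["tables"] if t["id"] == flow_mod["table_id"])
    return set(flow_mod["match"]) <= table["match"]

ok = {"table_id": 1, "match": {"ETH_DST"}}
bad = {"table_id": 1, "match": {"IPV4_DST"}}  # IP match not allowed in the mac table

print(conforms(ok, ttp), conforms(bad, ttp))  # True False
```

This is the crux of why OpenFlow 1.3 maps to hardware where 1.0 struggled: once both sides agree on a pattern like this up front, the switch never sees a rule its ASIC can’t express.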

OpenFlow is a Bad Fit to Implement X

Ivan points out that there are simpler things than OpenFlow that might provide a way to build overlay virtual networks. This is always going to be the case. For example, x86 assembly language seems like a really crappy abstraction to start with to provide video playback, but it’s where we started, and then we provided higher and higher levels of abstraction until we could provide a function that was basically “play this file”.

Similarly, OpenFlow isn’t the most natural fit for high-level tasks, but that’s kind of the point. We need layers of abstraction that sit between high-level tasks and the low-level way they are implemented.

Taking away general-purpose, low-level access, and thus reducing how we can reuse and remix underlying network functionality, is exactly what we’ve been getting wrong when we provide purpose-built hardware and software/firmware. The whole part where we’ve mistakenly baked one point in the centralized-distributed trade-off space into our hardware is just another example of this.

The Bigger Problem

The bigger problem we have in networking is that we can’t seem to figure out how to provide layering for our solutions. Instead, we pick particular full stacks across the layers and associated design decisions (like trade-offs in centralization-distribution) and bake them into our solutions and hardware.

Instead, we need to start to actually provide pluggable elements at each layer and make them as open as we possibly can. OpenFlow is one good interface between the control plane and a good swath of different (software and hardware) data planes, but it’s not the only one. Similarly, OpenDaylight is working on providing a pluggable control plane with the intention of letting people pick their own trade-offs in the centralized-distributed design space.

summit season

It appears as though SDN summit season is upon me. A few weeks back I was at the OpenDaylight Summit celebrating getting the Hydrogen release out the door. This coming week I’m going to be at the Open Networking Summit (ONS), which seems to have become the industry event for SDN, in both the positive and negative ways.

If you’re going to be at ONS, you should be able to find me pretty easily. I’ll be at the OpenDaylight Developer Track Monday morning, manning the OpenDaylight booth/helpdesk on Monday and Tuesday evenings, and attending the Research Track most of the rest of the time.

If you’re curious what I do when I have my research hat on you can come see one of my summer interns present “Low-latency Network Monitoring via Oversubscribed Port Mirroring” where we show how to do traffic engineering in less than 4 milliseconds on current hardware and hopefully in 100s of microseconds with a few switch firmware changes. The talk is at 2:30p on Tuesday.

Turning back to the OpenDaylight Summit, I was stunned at how much interest there was, with something like 600 people in attendance. I met a ton of people, though mostly only for a few minutes, since I spent more time than I would have liked on stage as opposed to talking with people. You can find all the videos from the summit as a YouTube playlist. There’s a ton of good stuff there, including demos of some of the projects, plans for future releases, and just general commentary. They’ve also posted most of the slides.

If you’re curious what I was talking about while I was there, I gave two talks and participated in a panel.

The first talk I gave was on “Consistency Trade-offs for SDN Controllers” where I went over some basics of distributed systems and what their implications are for building clustered SDN controllers. If you’re curious about HA and scale-out, give it a watch. The slides, including some material I didn’t get to and references, are online as well.

Along with Curt Beckmann of Brocade, I gave a talk on some work I did for the Open Networking Foundation’s Forwarding Abstractions Working Group, which Curt chairs. I like to call it “Why OpenFlow 1.3+ won’t save us (by itself) and what OpenDaylight can do about it.” If you’re curious about OpenFlow 1.3 and what it means, go ahead and watch the video or look at the slides.

Lastly, I sat on a panel on the future of so-called “Northbound Interfaces” to SDN controllers, which was more interesting than I expected. SDNCentral even wrote an article about it.

on broadband competition in Austin

In theory, Google Fiber is coming to Austin with their 1 Gbps internet for $70/mo. We’ll see if they come where I’m living, but in the meantime everyone else has started to retaliate.

AT&T announced they were going to roll out gigabit internet to their existing U-Verse customers, i.e., their non-DSL customers. Sadly, for whatever reason, my house is just outside the zone where you can get that.

Grande, which is our local, independent ISP, is actually deploying gigabit access now. Everyone should have a local, independent ISP. I had Grande at the house we rented and it was amazing, but sadly, they don’t serve my new house.

So, despite all of this jockeying, it’s not until Time Warner Cable jumps into the fray that it looks like I have any concrete reason to think I’ll get better internet access. It seems like they’re just going to admit they’ve been stingy assholes the last 10 years and actually give us 3x to 10x faster internet at the same price by the fall.

While I’m glad that it seems like Google Fiber is shaking things up in Austin, I’m really dubious that it’s a good idea in the long run. First, it seems like it’s not shaking anything up outside of the few places they’re actually rolling it out. Second, for Google Fiber to come, the city has to agree to waive any regulations about having to serve both rich and poor neighborhoods and give Google Fiber free access to all of the utility poles. The result of that is that in Austin, AT&T—and I think Grande and TWC—have managed to negotiate the same deals essentially eliminating a bunch of the good regulation we actually got.

It’s interesting to follow, I’m glad it seems like I’m going to get 100 Mbps down by 10 Mbps up soon, but I’m not sure what it means long-term for US broadband.

on the genius of James Mickens

For anyone who hasn’t seen James Mickens give a talk, you should find a way to do so. Invite him to your university, lab, office, cave, or dungeon or figure out a conference where he’ll be talking and go. It’s an experience that you don’t want to miss.

In the meantime, I’ve discovered that he’s been writing a series of amazing columns for the USENIX ;login: magazine, and they will tide you over until you get a chance to see him talk.

Go read them now.

featured in the OpenDaylight developer spotlight

This is a bit late (alright, more than two months late), but the Linux Foundation did a little Q&A with me about my role as a developer in OpenDaylight. The key quote I think people should take away is this:

Grab the code and get it to do something first.

A good place to start is the installation guide, which also walks through getting the simple forwarding application to work. There are a few moving parts, but the documentation there is pretty good, and if you need any help you should jump on the #OpenDaylight channel on freenode, where there are almost always people willing to help out.

From there, we have a curated and documented list of small but very useful tasks that need work, along with mentors who are willing to help out. Other than that, hop on the mailing lists and chime in.

Lastly, again, don’t forget the IRC channel. Really. It’s the best way to get fast feedback.

Read the whole thing here, but seriously, come join the fun.

on SDN, network virtualization and the future of networks

To say that SDN has a lot of hype to live up to is a huge understatement. Given the hype, some are saying that SDN can’t deliver, while others—notably Nicira—are saying that network virtualization is what will actually deliver on the promises of SDN. Instead, it appears that network virtualization is the first, and presumably not the best, take on the new way of managing networks where we can finally holistically manage networks with policy and goals separated from the actual devices, be they virtual or physical, that implement them.

Out with SDN; In with Network Virtualization?

In the last few months there has been a huge amount of back and forth about SDN and network virtualization. Really, this has been going on since Nicira was acquired about a year ago and probably before that, but the message seems to have solidified recently. The core message is something like this:

SDN is old and tired; network virtualization is the new hotness.

Network virtualization vs. SDN

That message—in different, but not substantially less cheeky, terms—was more or less exactly the message that Bruce Davie (formerly Cisco, formerly Nicira, now VMware) gave during his talk on network virtualization at the Open Networking Summit in April. (The talk slides are available there along with a link to the video, which requires a free registration.)

The talk rubbed me all the wrong ways. It sounded like, “I don’t know what this internal combustion engine can do for you, but these car things, they give you what you really want.” It’s true and there’s a point worth noting there, but it doesn’t follow that internal combustion engines (or SDN) are uninteresting.

A 5-year retrospective on SDN

Fortunately, about a month ago, Scott Shenker of UC Berkeley gave an hour-long retrospective on SDN (and OpenFlow) focusing on what they got right and wrong with the benefit of 5 years of hindsight. The talk managed to nail more or less the same set of points that Bruce’s did, but with more nuance. The whole talk is available on YouTube, and it should be required watching if you’re at all interested in SDN.

An SDN architecture with network virtualization folded in.

The highest-order bits from Scott’s talk are:

  1. Prior to SDN, we were missing any reasonable kind of abstraction or modularity in the control planes of our networks. Further, identifying this problem and trying to fix it is the biggest contribution of SDN.
  2. Network virtualization is the killer app for SDN and, in fact, it is likely to be more important than SDN and may outlive SDN.
  3. The places they got the original vision of SDN wrong were where they either misunderstood or failed to fully carry out the abstraction and modularization of the control plane.
  4. Once you account for the places where Scott thinks they got it wrong, you wind up coming to the conclusion that networks should consist of an “edge” implemented entirely in software where the interesting stuff happens and a “core” which is dead simple and merely routes on labels computed at the edge.

This last point is pretty controversial—and I’m not 100% sure that he argues it to my satisfaction in the talk—but I largely agree with it. In fact, I agree with it so much so that I wrote half of my PhD thesis (you can find the paper and video of the talk there) on the topic. I’ll freely admit that I didn’t have the full understanding and background that Scott does as he argues why this is the case, but I sketched out the details on how you’d build this without calling it SDN and even built a (research quality) prototype.

What is network virtualization, really?

Network virtualization isn’t so much about providing a virtual network as much as it is about providing a backward-compatible policy language for network behavior.

Anyway, that’s getting a bit afield of where we started. The thing that Scott doesn’t quite come out and say is that the way he thinks of network virtualization isn’t so much about providing a virtual network as much as it is about providing a backward-compatible policy language for network behavior.

He says that Nicira started off trying to pitch other ideas of how to specify policy, but that they had trouble. Essentially, the clients they talked to said they knew how to manage a legacy network and get the policy right there and any solution that didn’t let them leverage that knowledge was going to face a steep uphill battle.

The end result was that Nicira chose to implement an abstraction of the simplest legacy network possible: a single switch with lots of ports. This makes a lot of sense. If policy is defined in the context of a single switch, changes in the underlying topology don’t affect the policy (it’s the controller’s responsibility to keep the mappings correct) and there’s only one place to look to see the whole policy: the one switch.
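A toy Python sketch of that “one big switch” idea (everything here—the policy format, port names, and `compile_policy` helper—is a hypothetical illustration, not Nicira’s actual implementation): policy is written once against a single logical switch, and a mapping layer owned by the controller translates logical ports to (physical switch, physical port) pairs, so the policy itself never changes when the topology does.

```python
# Policy written against one logical switch with lots of ports.
logical_policy = {("lport1", "lport2"): "allow",
                  ("lport1", "lport3"): "deny"}

# Maintained by the controller; this is the only thing that changes
# when VMs move or the physical topology changes.
port_map = {"lport1": ("s1", 4), "lport2": ("s2", 1), "lport3": ("s2", 7)}

def compile_policy(logical_policy, port_map):
    """Turn the single-switch policy into rules against physical
    (switch, port) pairs. Re-running this after a topology change
    updates the physical rules without touching the policy."""
    rules = []
    for (src, dst), action in logical_policy.items():
        rules.append({"src": port_map[src],
                      "dst": port_map[dst],
                      "action": action})
    return rules

for rule in compile_policy(logical_policy, port_map):
    print(rule)
```

The design payoff is exactly what the paragraph above describes: there is one place to read the whole policy, and keeping `port_map` correct is the controller’s problem, not the operator’s.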

The next big problems: High-level policy and composition of SDN apps

Despite this, there are at least two big things which this model doesn’t address:

  1. In the long run, we probably want a higher-level policy description than a switch configuration even if a single switch configuration is a whole lot better than n different ones. Scott does mention this fact during the Q&A.
  2. While the concept of network virtualization and a network hypervisor (or a network policy language and a network policy compiler) helps with implementing a single network control program, it doesn’t help with composing different network control programs. This composition is required if we’re really going to be able to pick and choose best-of-breed hardware and software components to build our networks.
A 10,000-foot view of Pyretic’s goals of an SDN control program built of composable parts.

Both of these topics are actively being worked on in both the open source community (mainly via OpenDaylight) and in academic research, with the Frenetic project probably being the best known and most mature of them. In particular, their recent Pyretic paper and talk took an impressive stab at how you might do this. Like Frenetic before it, they take a domain-specific language approach and assume that all applications (which are really just policy, since the language is declarative) are written in that language.

Personally, I’m very interested in how many of the guarantees that the Frenetic/Pyretic approach provides can be provided by using a restricted set of API calls rather than a restricted language that all applications have to be written in. Put another way, could the careful selection of the northbound APIs provided to applications in OpenDaylight give us many—or even all—of the features that these language-based approaches provide? I’m not sure, but it’s certainly going to be exciting to find out.
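To make the composition idea concrete, here is a minimal Python sketch in the spirit of Pyretic’s parallel composition operator. To be clear, this is not the actual Pyretic API: policies are modeled as plain functions from a packet to a set of abstract actions, and a `parallel` combinator runs two independent control programs (a monitor and a forwarder) over the same packet and unions their actions.

```python
def monitor(pkt):
    """A toy monitoring policy: count packets per source."""
    return {("count", pkt["src"])}

def forward(pkt):
    """A toy forwarding policy: send traffic for h2 out port 2,
    everything else out port 1."""
    return {("fwd", 2 if pkt["dst"] == "h2" else 1)}

def parallel(*policies):
    """Run independent policies on the same packet and union their
    actions, so neither program needs to know about the other."""
    return lambda pkt: set().union(*(p(pkt) for p in policies))

# Compose two separately-written control programs into one application.
app = parallel(monitor, forward)

print(sorted(app({"src": "h1", "dst": "h2"})))
# [('count', 'h1'), ('fwd', 2)]
```

Whether combinators like `parallel` are exposed as a DSL (Pyretic’s approach) or as a carefully restricted set of northbound API calls is exactly the open question in the paragraph above.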

on SDN, network virtualization and jobs

I’ve been thinking a little more recently about how to be disruptive in the networking space and in particular in the data center networking space since that’s where I spend a lot of my intellectual cycles. One thing that we always talk about is reducing costs and in particular reducing CAPEX (capital expenditure) and OPEX (operation expenditure). Generally, there’s more discussion around reducing OPEX than CAPEX because purely software tools for simplifying management and increasing automation can improve OPEX while improving CAPEX typically happens in more complex ways over longer timescales.

However, as this recent Packet Pushers blog post points out:

When companies discuss reductions in OPEX, just remember you are OPEX most, if not all the time. Self-service and automation are great, but if that service is what you provide (and provides your income), you better do something about it. Don’t become roadkill on the path to the future.

This is even a bit more interesting because the disruptive products that we intend to sell are typically sold to IT departments. That is, we’re selling the product to the people whose jobs the product most endangers.

There is a bit of a silver lining which is that automation and simpler management also appeal, very strongly, to the very same people. Nobody wants to be doing simple, menial, repetitive tasks all day and tools that cut down on such things tend to be broadly popular.

How do we reconcile these two things? On the one hand we have tools that, if they are successful, clearly make it possible for a smaller number of people to accomplish the same tasks which should reduce the number of total jobs. On the other hand, the people whose jobs are being threatened often embrace these tools. A knee-jerk reaction would be for them to oppose the tools. Why don’t they?

A simple explanation would just be that they’re short-sighted and are willing to take the short-term reduction of menial work without worrying about the long-term career jeopardy. That may be a little true, but there’s also a more satisfying answer, which I think holds more of the truth, that the blog post points out:

Virtualisation in the server space didn’t lead to a radical or even a slow loss of roles that I’m aware of; if anything more are required to handle the endless sprawl. Perhaps the same will happen in networking?

Jobs (and entire skill-sets with them) will be lost, but the removal of the pain associated with networking will increase its use. Along with general market growth, this may absorb those affected and history shows we’re all mostly resilient and adaptable to change. Of course, there are always casualties; some can’t or won’t want to change their thinking and their skills in order to reposition themselves.

This resonates with me. It also reminds me of comments that James Hamilton of Amazon AWS fame made during a talk he gave while I was at UW. Essentially, his point was that for every cost reduction they made in AWS, it increased the set of things people wanted to do on AWS more than it decreased profits. In other words, making computing—and networking specifically—more consumable and cheaper will result in there being more computing, not less.

That’s not to say that strictly everyone will be better off, but just that there’s likely to not be some huge collapse in IT and networking jobs as we do a better job of automating things. At least not in the near future.

on out-of-control US surveillance

I really try to be the calm, rational one when it comes to accusations of massive surveillance and people saying that the government can “hear, see and read everything,” but this seemingly casual remark seems to be the latest in a string of information that points directly to the US government trying, and succeeding in many cases, to record all communication so that they can go back to it later if they so choose.

As the article points out, there have been congressmen, NSA employees and private company (AT&T) employees all trying to blow this whistle. The article doesn’t even mention the massive NSA data center being built in Utah.

I guess it’s time to adjust our expectations even as we try our best to push for transparency, regulation and change.