Tagged: academia

on statutes of limitations in academic work

I was shooting the breeze with Geoff Challen in Zürich before we headed down to HotOS, and we got to talking about an idea my advisor and I had kicked around a few years earlier. The idea is that there should be some statute of limitations on how long an old systems paper can cause new papers to be rejected for lack of novelty. The justification is that after some reasonable period of time, say 10, 15, or 20 years, the assumptions of the paper are probably out of date enough that attacking the problem again, even in a similar way, is interesting to the community.

After the conference, Geoff blogged about the idea in better detail and with a few extra ideas that are worth thinking about.

I just wanted to add a few comments. First, the idea (or at least my interpretation of it) isn't to ignore older work; in fact, not citing the old paper should probably carry the same penalty that it does today. Instead, the idea is that you should be allowed to publish similar work (as long as you cite the original) some period later without it hurting you the way it does in the current system.

To some extent I think PCs already do this and will look favorably on a paper that makes use of old techniques as long as it also provides some new value. So I'm calling for a further shift in that direction more than anything else.

post-PC-meeting SOSP PC member talks

My advisor chaired SOSP this year and asked a bunch of the PC members to give talks about their work during the morning after the meeting. He did a similar thing for SIGCOMM three years ago, and it's a lot of fun and just stunningly cool to have this many good talks in a single half-day.

Andrew Myers presented Civitas, which is an approach to verifiable voting using a bunch of fancy crypto. The two really cool things they did were (1) actually implementing the protocols and pieces of what they propose and (2) providing coercion resistance. It's cool stuff, but way complicated. They provide more assurances than any "real world" voting system, presumably because they no longer assume control of the voting location.

Nickolai Zeldovich presented work on how to make security plans explicit in order to avoid the frustratingly simple security bugs behind SQL injection and cross-site scripting attacks. It seemed relatively straightforward and simple, if a significant implementation effort, but that's exactly what we need. The idea is basically to formalize the notion of export filters for sensitive data and to pass those filters along with the data, even when you push it out to the file system or the database. They do it with something like 15,000 lines of changes to the PHP interpreter and see about 30% performance overhead on real workloads. A simple idea, seemingly done right and actually built.
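
To make the filter-travels-with-the-data idea concrete, here's a minimal sketch of the general mechanism in Python rather than PHP; the Filtered class and emit_html boundary are names I made up for illustration, not anything from their actual system.

    import html

    class Filtered(str):
        """A string that carries an export filter which must run before the
        value leaves the program (e.g. into an HTML page or a SQL query)."""
        def __new__(cls, value, export_filter):
            obj = super().__new__(cls, value)
            obj.export_filter = export_filter
            return obj

    def emit_html(fragment):
        """Output boundary: apply any attached filter before rendering."""
        if isinstance(fragment, Filtered):
            return fragment.export_filter(str(fragment))
        return str(fragment)

    # Untrusted input gets tagged once, where it enters the system...
    comment = Filtered("<script>alert('xss')</script>", export_filter=html.escape)

    # ...and the filter travels with the value until it crosses an output boundary.
    print(emit_html(comment))  # prints the HTML-escaped version

The point of doing it inside the language runtime, as they did with the PHP interpreter, is that a forgotten call in one code path can't silently drop the filter.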

Peter Chen talked about how he teaches computer systems to freshmen with only a little programming experience. The course is crazy ambitious and involves designing their own simple microprocessor, building device drivers in assembly, and finally building some kind of music synthesizer (whatever the students want that to mean), all in a 13-week semester class. The demos he gave were really cool, and he claimed that it wound up being a reasonable amount of work for students: about 12 hours/week in 4-person groups. Cool if he's telling the truth.

John Ousterhout talked about why we should build high-performance data stores for the data center/cloud computing world by keeping all data in DRAM. The idea here is that you can cram 64 GB of DRAM "cheaply" into a 1U server, and at the same time disks aren't getting any faster, so we should (and in many cases already do) store anything for which performance matters entirely in DRAM. There was a lot of good discussion about things like the energy costs involved, whether the workloads really demand that everything be in RAM, how much of this is already being done, and so on. He also had the quote of the day: "I am the energy anti-Christ." A single DRAM DIMM costs something like 10 watts, so storing data in RAM is much more energy-expensive per byte, but apparently it's actually cheaper when you count operations per second.
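
As a rough sanity check on that last claim, here's a back-of-envelope comparison; aside from the roughly 10 watts per DIMM figure, every number below is my own ballpark assumption rather than something from the talk.

    # Back-of-envelope ops-per-watt comparison for DRAM vs. disk.
    # Except for the ~10 W per DIMM figure, all numbers are rough assumptions.

    dram_watts_per_dimm = 10        # ~10 W per DIMM (from the talk)
    dimms_per_server = 8            # assume 8 DIMMs to reach 64 GB
    dram_ops_per_sec = 1_000_000    # assume ~1M small requests/sec served from DRAM

    disk_watts = 10                 # assume ~10 W per spinning disk
    disk_ops_per_sec = 150          # assume ~150 random IOPS per disk

    print(dram_ops_per_sec / (dram_watts_per_dimm * dimms_per_server))  # ~12,500 ops/s per watt
    print(disk_ops_per_sec / disk_watts)                                # ~15 ops/s per watt

On those admittedly rough numbers, DRAM wins by orders of magnitude per operation even though it loses badly per byte stored.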

Dave Anderson talked about FAWN, which replaces traditional servers with 10-20 little wimpy boards built around embedded CPUs with a big piece of flash memory attached to each. He claims (somewhat convincingly) that they get a 10x improvement in energy costs for big data operations. The key point he makes is that there's a huge imbalance between CPU, storage throughput, memory throughput, etc., and it's easier to bring these into balance (and thus become more power efficient) with wimpier nodes. Very cool stuff, made surprisingly real, though still mostly proof of concept. People commented at the end that this is really a solution to pin bandwidth and bus latency that happens to help with power.

Mothy Roscoe presented an interesting vision of the future of cloud computing, asking what a personal "computer" will look like as we go forward. He defines a computer (somewhat aptly) as the combination of our data and what we do with it, so this likely spans your phone, laptop, and home server, as well as some VMs in the cloud and possibly services. What they actually built is Rhizoma, which basically runs as a "sidecar" to what you really want to run and acquires resources for you based on a constraint-logic-programming policy expressing the kinds of resources you want and the value you place on them. Think of it as the brain of the cloud, intelligently moving your stuff around as the situation changes. Cool idea, and some preliminary data showing it migrating around PlanetLab as load and network conditions changed suggests it might actually do something reasonable.
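
The policy idea is easier to see with a toy example. This is just an illustrative sketch, not Rhizoma's actual constraint-logic language; the node names, weights, and value function are all made up.

    # Toy version of the policy idea: the user states how much they value certain
    # resource properties, and the system keeps re-picking the placement that
    # maximizes that value as load and network conditions change.

    candidate_nodes = [
        {"name": "home-server", "latency_ms": 5,   "cost_per_hour": 0.00, "up": True},
        {"name": "cloud-vm-eu", "latency_ms": 40,  "cost_per_hour": 0.10, "up": True},
        {"name": "cloud-vm-us", "latency_ms": 120, "cost_per_hour": 0.08, "up": True},
    ]

    def value(node):
        """User-supplied policy: require liveness, prefer low latency, penalize cost."""
        if not node["up"]:
            return float("-inf")
        return 100.0 / node["latency_ms"] - 50.0 * node["cost_per_hour"]

    def place():
        return max(candidate_nodes, key=value)

    print(place()["name"])  # re-evaluate whenever load or network conditions change

The interesting part is that the system, not the user, keeps re-running this decision as the situation changes.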

Steve Hand talked about his experience with the startup around Xen, basically telling the narrative of how they got there and what they did. The story was interesting, but I'm a lot less able to condense it down, so instead I've mostly just transcribed some of his notes along with the eight lessons from his slides.

  1. Play to your strengths. Hire where you know what’s going on and where you have contacts, connections, etc. Don’t let VCs, CEOs, CTOs throw you into growing too quickly in ways that aren’t going to obviously help.
  2. Ambition can be deadly. Took 1 year to release a buggy, over ambitious, so-so product.
  3. Vet your co-founders. Abrasive, megalomaniacal, bullying people will cause you problems. Not-so-smart people hire people who are less smart than they are and so on. Eventually VCs helped to push out the CEO and CTO.
  4. Smart business folk exist. There are very good business people out there, but not very many of them.
  5. OSS is (mostly) good. Viral marketing/coverage saves money and gets you mind-share. The fact that other people can (and do) take your code means you need to (and are driven to) maintain your edge. The problem of weak offerings diluting the brand needs to be dealt with.
  6. People are everything. Hire the best people you can; especially when you're open source, they (not patents) are your entire capital. Both engineers and management.
  7. Ship products early and often. Only one in ten startups ever ships a product of any kind. Their first release came 21 months after the company was founded, and they sold only 5 copies. They aimed for a 3-month release cycle: 4 weeks dev, 4 weeks test, 4 weeks package/ship. Helped a lot!
  8. Arranging a marriage. Needed to find somebody who didn’t offend anyone they had to work with. Citrix wound up being a good choice.

has the academic computer systems community lost its way?

I just got into a debate with a few friends about whether the current academic computer systems community is still relevant or whether we've lost our way and are no longer productively matching our efforts to how we can have impact.

My friends—who I should say are very smart people—argue that somewhere between when academics produced nearly all computer systems (including UNIX, the Internet, databases and basically everything else) and now, we have lost our way. If you look at recent kinds of systems—for example: peer-to-peer file-sharing, modern file systems, modern operating systems, modern databases and modern networks—many of the ones in common use are not those developed by academics.

Instead we’ve seen Napster, Gnutella, BitTorrent, and DirectConnect in peer-to-peer; ext3 and ReiserFS in file systems; Windows and Mac OS in operating systems; MySQL and SQL Server in databases, and the list goes on.

One conclusion to draw from this is that the systems we build (wireless networking technologies, new Internet routing algorithms, new congestion control algorithms, new operating systems, new databases, new peer-to-peer systems) simply aren't being used, and that this is out of alignment with how we view ourselves. The natural next step is to say that we should change our approach to acknowledge that we aren't building systems people are going to use, or at least figure out why what we're building isn't used.

For a variety of reasons, I think that this conclusion is wrong. First, you don't have to look far to find recent academic systems that are in widespread use. VMWare and Xen both came directly from academic work on virtualization. Google (or more precisely PageRank) started as an academic system. The list goes on, and this doesn't count the fact that many systems not directly built by academics are heavily influenced by academic research. The ext3 file system is just ext2 with additions based on the log-structured file system paper by Mendel Rosenblum (who is now a co-founder of VMWare). Linux isn't an academic venture, but it's essentially a complete reimplementation of UNIX, which was.

In the areas where academics appear to be “outperformed” there are some very reasonable explanations. In areas like databases and wireless networking you are looking at the impact of a few grad students and a few million dollars on multi-billion dollar industries employing thousands of people. The fact that we have any impact at all is impressive.

In areas like peer-to-peer file-sharing, most innovation has been driven not by technical needs but by legal ones: making systems hard to shut down. While this is now someplace academics find interesting, to have expected academics to do research into how to circumvent legal machinations seems a bit out of whack.

In the end, I feel like I am more able to contribute to the real world from academia than I would be elsewhere. There is a certain level of respect for ideas and tools produced by academics which is hard to garner elsewhere, there are fewer constraints caused by market pressures, and teams of people are smaller and more agile.