on the end of browser choice


It’s always nice to see SciAm cover (or at least try to cover) CS/engineering topics in addition to their staples of physics, chemistry and biology. In this case, they raise an interesting point. With the rise of mobile, much of the time we’re no longer in control of which browser we use.

  • On iOS devices, Apple permits only its own version of the WebKit browser engine. Technically other browsers besides Safari are allowed, but they must use Apple’s technology for actually rendering Web pages.
  • Microsoft wants a similar approach on Windows RT, the version of Microsoft’s storied operating system for devices using low-power, mobile-friendly ARM processors.
  • On Windows Phone 7, Internet Explorer is built in, but other browsers don’t get its privileges.
  • On Google’s Chrome OS, the browser is the operating system. Linux lurks beneath, but all the applications run on the browser, and that browser is Chrome.
  • Mozilla’s Boot to Gecko (B2G) project takes a similar approach as Chrome OS, only for mobile phones. Mozilla naturally uses the Gecko browser engine that’s the foundation of Firefox.

Really, this somewhat misses the point. What’s actually going on is that as we shift to mobile devices, for whatever reason we seem to be giving up both the notion of what a general-purpose computer should be and the idea that some regulation is needed to defend users against monopolies.

This is really one small part of the war on general-purpose computation, though seen from the monopolist’s perspective rather than the oppressive government’s. Still, this isn’t going anywhere good any time soon.

in defense of ‘heavy’ programming languages

I have a lot of friends who do the vast majority of their programming in new-style languages like Python and Ruby, and as a result I get the occasional talk tossed my way via blogs, Twitter or whatever. Sometimes they’re interesting, but other times they say things that I largely disagree with.

One of the refrains I hear quite a bit is that old-style languages (really mostly Java and C++) are too verbose. That you have to type things like this all over the place:

 ArrayList<Integer> intArr = new ArrayList<Integer>();

The claim is that you’re being annoyed because (1) you’ve had to type the exact same thing twice and (2) because you’ve been forced to specify the type in very specific terms when it’s not necessary.

While both of these things are true on the face of it, the complaint somewhat misses the point. Not all verbosity is just annoyance. A lot of it has value because it forces you to type what you think you mean several times. If you do something different one of those times, maybe it’s a typo, but a surprising amount of the time (at least for me) it’s a bug in the logic in my head.

The fact that these languages give you ways to specify what you mean multiple times and then check them against each other for you isn’t a bug, it’s a feature. The type checker is your friend, strong types are your friend, and a bit of redundancy in specification (especially when a decent IDE helps manage it) can be your friend too. They all help turn hard-to-find bugs in your logic into easy-to-find bugs that a compiler can find for you.

Sure, this isn’t true infinitely, and there are times when you really are doing something stupidly simple and could do without typing the same long type specifier twice. But I find that more often I’m doing something which is mildly complicated and involves code spanning a few different files, and it’s very helpful to have a bit of redundancy tell me when I’ve screwed something up rather than merely assuming I’m obviously god’s own coding ninja and thus knew exactly what I was doing at every point in time.

Another minor point came up in a PyCon video a friend posted. You can find it here. It spends 30 minutes arguing two points:

  1. You shouldn’t use a special class if (a) a simple function would do or (b) if one of the base classes would do.
  2. You shouldn’t use your own errors or exceptions which is really a special case of 1(b).

The core point seems to be that your own stuff is likely to be hard for others (or even you later on) to read or understand while others already understand the core/base classes. In other words, avoid unnecessary layers of indirection.

Again, while this is probably true in some cases, and likely quite true in the bite-size examples that fit nicely in a 30 minute talk slot, it misses a lot of the reasons why people do it. Hint: It’s not because we’ve all been brainwashed by Java and CS curricula and claiming so doesn’t help you make any of your points.

As one example, I use “empty” classes which either wrap or simply subclass an existing base class a lot as a way of leveraging the type system to let me know what “kind” of set or what “kind” of long I might have. For instance in code I’m writing right now, we use a long identifier to refer to both switches and hosts in a network, but it’s useful for me to know which is which, so I have two classes which are both basically just a Long in Java. It means that I can make functions which only take one or the other and give me an error when I do something I didn’t mean to though. Very useful way of turning logic bugs in my head into something the compiler can check.
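To make that concrete, here is a minimal sketch of the pattern. The names `SwitchId`, `HostId`, `describeSwitch` and `describeHost` are made up for illustration and are not from the actual code; since `Long` is final in Java, this version wraps a `long` rather than subclassing.

```java
public class TypedIds {
    // Hypothetical wrapper: a switch identifier that is just a long underneath.
    static final class SwitchId {
        private final long value;
        SwitchId(long value) { this.value = value; }
        long value() { return value; }
        @Override public boolean equals(Object o) {
            return o instanceof SwitchId && ((SwitchId) o).value == value;
        }
        @Override public int hashCode() { return Long.hashCode(value); }
    }

    // Hypothetical wrapper: a host identifier, structurally identical but a distinct type.
    static final class HostId {
        private final long value;
        HostId(long value) { this.value = value; }
        long value() { return value; }
        @Override public boolean equals(Object o) {
            return o instanceof HostId && ((HostId) o).value == value;
        }
        @Override public int hashCode() { return Long.hashCode(value); }
    }

    // Accepts only switches; handing it a HostId is a compile-time error.
    static String describeSwitch(SwitchId id) { return "switch " + id.value(); }

    // Accepts only hosts.
    static String describeHost(HostId id) { return "host " + id.value(); }

    public static void main(String[] args) {
        SwitchId s = new SwitchId(42L);
        HostId h = new HostId(42L);
        System.out.println(describeSwitch(s)); // prints "switch 42"
        System.out.println(describeHost(h));   // prints "host 42"
        // describeSwitch(h); // rejected by the compiler: HostId is not a SwitchId
        System.out.println(s.equals(h));       // prints "false": same number, different kind
    }
}
```

The two classes carry no extra behavior. Their only job is to make "switch" and "host" different things as far as the compiler is concerned, at the cost of a little boilerplate.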

In any event, I think we would do well to at least cast old-style and new-style languages as two points on a spectrum, each with advantages and disadvantages, rather than simply looking at the new-style languages as having vanquished the obvious stupidities of the old-style ones.

brainput literally makes a computer your outboard brain


Using functional near-infrared spectroscopy (fNIRS), which is basically a portable, poor man’s version of fMRI, Brainput measures the activity of your brain. This data is analyzed, and if Brainput detects that you’re multitasking, the software kicks in and helps you out. In the case of the Brainput research paper, Solovey and her team set up a maze with two remotely controlled robots. The operator, equipped with fNIRS headgear, has to navigate both robots through the maze simultaneously, constantly switching back and forth between them. When Brainput detects that the driver is multitasking, it tells the robots to use their own sensors to help with navigation. Overall, with Brainput turned on, operator performance improved — and yet they didn’t generally notice that the robots were partially autonomous.

That is pretty damn cool. The idea that something can notice that I’m stressed, overtaxed or just generally not making the best decisions that I could be, then take over and maybe not do as well as I could at my best, but do a passable job rings really true to me.

on HomeOS

A few Fridays ago, I gave a talk on the work I did building HomeOS at NSDI in San Jose. If you’re interested, you can find the paper and slides on my website’s list of publications as well as a video of the talk on the conference website.

Somewhat predictably, because the work was done with Microsoft Research, the tech media has been picking up the presentation and speculating about what Microsoft might be up to in the long term. The fact that as part of the project we actually got the system running in 12 real homes only added to the temptation. It’s been covered on CNET, Slashdot, GigaOm, TIME, The Verge and engadget among others.

Fun times. Working on the project was a blast and it’s even more fun to see that other people actually care.

on Apple in the post-device world

Apple has had their Icarus moment and it’s not losing Steve Jobs—though that may also prove problematic. Their Icarus moment is the inability to actually deliver on the promises of iCloud. They are now, and have always been, a device company and they are about to enter the post-device world. Try as they might, they can’t seem to execute on a strategy that puts the device second to anything else.

Let’s step back for a minute and think about where technology is heading in the next 5-10 years. It hasn’t even been 5 years since the iPhone came out and effectively launched the smartphone and in the process started us down the path to the post-PC world. We’re pretty much there at this point, but it doesn’t end there.

The next logical step is the post-device world where the actual device you use to access your data and apps is mostly irrelevant. It’s just a screen and input mechanism. Sure, things will have to be customized to fit screens of different sizes and input mechanisms will vary, but basically all devices will be thin clients. They’ll reach out to touch (and maybe cache) your data in the cloud and any heavy computational lifting will be done somewhere else (as is already done with voice-to-text today).

The device you use more or less will not matter. As long as it has a halfway-decent display, a not shit keyboard, some cheap flash storage for a cache of some data, the barest minimum of a CPU and a wireless NIC, you’re good.

This world is not Apple’s forte. Not only does nearly all of their profit come from exactly the devices that will not matter, but they’re not very good at seamless syncing between devices either. It took them until iOS 5 to provide native syncing of contacts, calendars and the like directly to and from the cloud, after Android, Palm’s webOS and even comically-late-to-the-party Windows Phone had implemented it.

Moreover, this is not the first time Apple has tried to provide some kind of cloud service. They started with iTools in 2000, then .Mac in 2002, MobileMe in 2008, iWork.com in 2009 and now they’re on iCloud. None of the previous incarnations have been what anyone would call a resounding success. In at least one case, it was bad enough that Steve Jobs asked “So why the fuck doesn’t it do that?”

So, who will succeed in this post-device world? The obvious answer might be Google since they’re already more or less there by having all of their apps be browser-based, but I’m not totally convinced. They seem to be struggling to provide uniform interfaces to their apps across devices and that seems key here. For instance, the iconography of my Gmail is different in my browser than it is on my Android tablet, and that’s for a platform they own.

Actually, in a perverse way, I think Microsoft might really have what it takes to succeed in this world if they can execute. They have a long history of managing to maintain similar interfaces and design languages across different platforms and devices. Though, their failure to provide a clean Metro-based interface in Windows 8 is a bit of a damper for their chances.

on google voice


Provides a somewhat interesting look into what Google wants out of Google Voice. I think it’s really only interesting because traditionally phone calls have been much more protected than other types of communication. Otherwise, Google has tons of different products which don’t seem to make sense from any traditional “how does this make money” standpoint.

Still, it does raise the question of whether I’m giving up any protections by using Google Voice rather than just using a normal phone line.

do app stores further encourage winner-take-all tendencies

A while back I was trying to find the best PDF reader for my Android tablet that would allow me to review papers including putting annotations on the PDFs and syncing the documents to and from either my computer or a reasonable cloud service.

What came out was a bit surprising. There were many different readers, ranging from free to $10, but by far the most useful one from my point of view (ezPDF) was only $3 and this got me thinking. App stores further reinforce winner-take-all tendencies because if one application has 3x the users of another, they can afford to price it at 1/3 the price. (Obviously, assuming that the development efforts cost similar amounts.)
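The arithmetic behind that claim is simple enough to sketch. This is a toy break-even model, not real data: the development cost and user counts below are invented, and it ignores the store’s cut, marginal costs and profit.

```java
public class BreakEvenPrice {
    // Price per copy needed to recover a fixed development cost from a given number of buyers.
    static double breakEvenPrice(double developmentCost, long users) {
        if (users <= 0) {
            throw new IllegalArgumentException("users must be positive");
        }
        return developmentCost / users;
    }

    public static void main(String[] args) {
        double cost = 90_000.0;          // assumed development cost, the same for both apps
        long smallBase = 10_000;         // assumed install base of the less popular app
        long largeBase = 3 * smallBase;  // the popular app has 3x the users

        System.out.println(breakEvenPrice(cost, smallBase)); // prints 9.0
        System.out.println(breakEvenPrice(cost, largeBase)); // prints 3.0
    }
}
```

With equal costs, tripling the user base cuts the break-even price to a third, which is the whole winner-take-all pressure in one division.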

To a degree, this inhibits innovation since a new application will have a much smaller install base. This is different from how other software was commonly sold where prices were more or less fixed somewhere between $30 and $50 with no real “discount” for popular apps.

Still, it’s not clear if this inhibition is stronger or weaker than the generally decreased barrier to entry which the app store model provides to new applications, both by breaking up functionality into smaller pieces and by providing a clear mechanism to discover competitors.

It will be interesting to see how it all plays out and also to see how software pricing evolves in this model.

James Hamilton on cloud computing

One of the hazards of my career as a systems researcher is that I sit through a lot of talks. Many of them are great—some are not great, but fewer than you might think—and taking notes during the talk helps me remember what I’ve heard. Actually putting those notes into some sane state helps even more, so I’ve resolved to try to post a summary and/or notes from at least some of the talks I see.

These notes are long overdue from a talk James Hamilton gave about cloud computing at Amazon. You can find the original talk (with links to video) on the UW CSE Colloquia archives.

Unfortunately the talk jumped back and forth between topics and was packed so densely with information that I really didn’t know how to summarize it other than as a giant nest of bulleted lists. (It’s also kind of appropriate since it’s often how he summarizes talks on his blog as well.)

  • Massive Innovation in Storage Today (akin to 20-25 years ago)
    • more relational db than ever before
    • more nosql db than ever before
    • distributed storage big
    • all happening in the cloud
  • DB/Storage world
    • one-size does not fit all
    • 30-year-old assumptions are no longer valid
    • computing becoming absurdly cheap
      • as the cost goes down, the number of things it makes sense to do with computers goes up
    • cloud computing is different (and real and going to change things)
  • Scale!
    • AWS is now bringing in the same amount of compute that ran Amazon.com in 2000, every day!
    • Scale is great! means that small gains still matter when multiplied by big numbers
  • Where does money go?
    • Chart updated every few years because
      • (1) you want to work on the right problems and
      • (2) most data out there is garbage
    • People aren’t here
      • You *must* automate, so it’s not relevant
      • People get things wrong too
        • smart people doing boring things => mistakes
        • smart people doing cool, hard things => good stuff
    • servers are 57%, power distribution and cooling 18%, space is 4%
      • look up more details
      • the share that goes to networking gear is going up (he’s spending half his time on networking)
  • Limits to Computation
    • what limits apps on infinite cores?
      • parallelism (we’ll ignore this)
      • getting data to/from cores (which really becomes power)
      • power: cost rising and will dominate
    • storage & memory b/w is lagging cpu
      • memory and disk walls
    • memory wall
      • short term: more lanes to memory (uses too much power in long-term)
      • long term: chip stacking with through silicon vias (much less power, but more heat)
        • used in cell phones today, looking good in lab at high performance
      • will likely make the memory wall less scary than you might think
    • disk wall
      • density will keep scaling and grow for 10-15 years
      • rotational speed looking less good => bandwidth and latency won’t improve
        • >15k rpm is not economically viable, too much power at higher rpm
        • both increases power to rotate, and number of chips to pull data out
        • predicts death of 15k rather than rise of 20k
        • amazon does not buy 15k rpm disks, where you need speed buy nand flash
      • disk becomes tape
        • sequential I/O is a problem, but not nearly as bad as random I/O perf
        • 3TB disk takes 8 hours to read sequentially, takes a month to read randomly
        • “you cannot afford to randomly seek ever” is almost true
      • where this matters, it’s all memory and nand flash
  • Sea change in networking
    • networking gear doesn’t seem to be obeying Moore’s law of cost
    • this is broken! why? monolithic, non-commodity => expensive, not innovative
    • changing, you can now buy raw ASICs
      • Marvell, Intel, Broadcom, etc.
    • can now make open source, open platform
    • software defined networking
      • centralized control plane, commodity software
      • centralized is better if everyone is in the same administrative domain
    • OpenFlow is uninteresting except that it has real force behind it and it seems to be unifying
    • client side
      • virtualized NICs in h/w (SR-IOV)
      • InfiniBand world to get from apps straight to h/w
  • MapReduce
    • reaction to “RDBMSs don’t scale”: this does
    • one of two things that normal people can write and that parallelize (SQL & MapReduce)
  • NoSQL
    • another reaction to “RDBMSs don’t scale”
    • key-value at scale is something we’re willing to give up some things for
    • aside: CAP-theorem should be taken literally, but not used as an excuse to not do things
    • he argues that eventual consistency is hard to reason about and unnecessary
      • we can build consistency that’s as fast
    • he argues sliding-knob for consistency.
      • really!? does it have more than 2 settings? –Colin (see below)
  • Client Storage Migrates to Cloud
    • local silicon cache with disk in the cloud
  • Open Source & Cloud Influence
    • dirt cheap, commoditized computing when looked at overall
    • makes a lot of things possible that weren’t possible before
  • Summary
    • cloud + scale => increasing rate of innovation
    • costs down => compute up
    • networking upset
    • difficult problems are all related to persistent state
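The “3TB disk takes 8 hours to read sequentially, takes a month to read randomly” figure in the notes above is easy to sanity-check. The talk did not give the underlying numbers, so the throughput, read size and seek rate below are my own assumptions (100 MB/s sequential, 8 KiB random reads at 125 reads per second), picked as plausible values for a spinning disk.

```java
public class DiskReadTime {
    // Hours to stream the whole disk at a steady sequential rate.
    static double sequentialHours(double capacityBytes, double bytesPerSecond) {
        return capacityBytes / bytesPerSecond / 3600.0;
    }

    // Days to read the whole disk in small random reads, limited by reads per second.
    static double randomDays(double capacityBytes, double bytesPerRead, double readsPerSecond) {
        double reads = capacityBytes / bytesPerRead;
        return reads / readsPerSecond / 86_400.0;
    }

    public static void main(String[] args) {
        double capacity = 3e12;       // 3 TB, the figure from the talk
        double sequentialRate = 100e6; // assumed: 100 MB/s sustained sequential throughput
        double readSize = 8192;        // assumed: 8 KiB per random read
        double readsPerSecond = 125;   // assumed: about 8 ms per random read

        // About 8.3 hours under these assumptions.
        System.out.println(sequentialHours(capacity, sequentialRate));
        // About 34 days under these assumptions.
        System.out.println(randomDays(capacity, readSize, readsPerSecond));
    }
}
```

Under those assumptions the numbers land close to the ones quoted, and the gap is roughly a factor of 100, which is why “disk becomes tape” is a fair description.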


  • Does the consistency knob need more than 2 settings?
    • There are more than 2 settings for consistency (i.e., how to behave in partitions), but 2 cover a lot
  • Is nand flash in front of disk as a cache a good idea?
    • yes, but dedup and compression make more stuff fit in caches
    • still need to put *ice* cold data on disk. most data is write only, perfect use for disks!
  • Reusing heat for useful things
    • challenge, it’s low-grade heat by heating standards
    • a few useful things
      • if data center is downtown, it totally makes sense to heat locally, transport is inefficient
      • if out in the country in a cold area, then you can use heat to grow high-value crops for longer
        • not clear
    • very hard to extract power from this heat

cory doctorow on the war on general computation

I’m not really a huge Cory Doctorow fanboy, but he does tend to say things well. Here, he talks a lot less about copyright/copyleft (though still some) and a lot more about how society is going to deal with, or not deal with, general purpose computation as it (even more than now) becomes part of everything.

Really, he just makes two points:

  1. General purpose computation and general purpose networking are going to be everywhere and there will be huge (seemingly reasonable) demand to restrict it.
  2. All ways to restrict it converge on the same tools: spyware and rootkits to restrict general purpose computation, and surveillance and censorship to restrict general purpose networking.

That being said, what he doesn’t do, and what I wish he would do, is articulate an argument for why somebody who would today want to do SOPA-like things (and there are lots of such people) should think twice about doing it in a SOPA-style way.

Still, it’s worth watching because it’s an entertaining and clean explanation of those two points.

on new science and research

Science and Research; Then and Now

There’s been a lot of talk about the need for disruption in the way that science and research are done. Traditionally, a small group of people (or even just one person) would work for a while on something on their own and then publish it in a conference or journal when it was somehow ‘complete’. This publishing was done mostly on paper.

The modern era has brought a huge amount of technology that could improve this process, but in practice all it seems to have done is make it possible to download a PDF rather than having to find the actual physical paper. In fact, crappy policies by conferences, journals and professional organizations have actually made even this advance inaccessible in many cases. (Matt Welsh has a great blog post about Research Without Walls which calls on researchers to not agree to submit or review work that will not be publicly available online.)

Further pointing to the fact that we’re not taking enough advantage of technology is the success of the polymath projects in leveraging a distributed and open group of people to solve hard math problems by letting them easily collaborate on ideas for making progress. Certainly, this shows we can do better. People are solving hard math problems with global-scale collaboration and we’re still having arguments about whether it’s OK for organizations to be able to hide publicly-funded research behind paywalls.


Michael Nielsen has a great piece in the Wall Street Journal (which I think should be open to anyone) that goes into some details and also describes some of the hurdles that such a transition might face. The core message is that the incentive system we have now is broken and doesn’t encourage people to share their results, their data or generally collaborate very well. Instead, all it incentivizes people to do is to produce publications which in turn help their reputation and likelihood of getting funding.

His point seems right on, but it manages to come across with a tone that makes me want to dig in deeper and ignore his advice. Sentences like “There are other ways in which scientists are still backward in using online tools,” simply make me want to scream. It’s not like scientists are actively trying to hide in a hole. We’re doing the best that we can with limited time to figure things out and responding to the reward structures we have.

Worse still, his tone is easy to pick up on and blogs that focus less on research are able to nearly parrot it back. This GigaOM post is a good example. It winds up pulling the same strings in me as the calls for the general public and congress to review what science is done in this country.

That being said, I agree. We’ve made crap for progress even in my field of computer science. I think that we do better than most with there being viable venues for people to present their work every 6 months or so (more often in areas like databases where VLDB now has monthly submission deadlines). Further, it winds up being something like 6 months between submission and presentation. That means even if it’s accepted right off the bat, it can take a year from ‘finishing’ something until it’s presented.

At the same time we still struggle with actually ‘publishing’ results in the good sense with data and open access to the papers. USENIX is amazing and lets you freely distribute your work, posts it online for free and even posts the videos of presentations online for free. Other organizations—ahem, ACM and IEEE, ahem—have been less forward-thinking.

That’s just dealing with bringing the old model of publication into the new era where faster publication schedules and wider dissemination are possible. It doesn’t do anything to address new forms of collaboration.


Sadly, while Nielsen does go on to explain some easy fixes that will at least aim to provide open access to data and papers, namely mandating it as part of grant approval, he offers very little to address getting to the real-time, cross-group collaboration which he starts talking about with the polymath project.

Really, all he has to offer is this:

Grant agencies also should do more to encourage scientists to submit new kinds of evidence of their impact in their fields—not just papers!—as part of their applications for funding.

The scientific community itself needs to have an energetic, ongoing conversation about the value of these new tools. We have to overthrow the idea that it’s a diversion from “real” work when scientists conduct high-quality research in the open. Publicly funded science should be open science.

I think it misses the broader issue, which is that it’s hard to be a scientist or researcher today and the result is that we instinctively cling to any edge we have. The consistent cutting of higher education’s budgets and similar cuts in industry (Intel radically cut its collection of research labs in the last 2 years) have left far fewer true research positions available. When resources are scarce, it’s hard to convince people to share.

Sometimes we don’t share data because we think we can benefit from it if we hoard it. Other times we don’t collaborate because we worry about how credit will be divvied up. It’s entirely possible that these are actually non-issues. In fact, I think this is likely the case. Nonetheless, right now it’s dangerous to step out into this world since we don’t know the answers.

However, I think most of the time we don’t collaborate or share, it’s not out of malice or selfishness, but rather that it’s extra effort to share. I think that Nielsen underestimates the difficulty in releasing curated code or data sets. It’s not just a matter of posting a file to a web server. Even more so, researchers see little or no immediate benefit from such sharing.

Perhaps a first step would be creating a way for researchers to share some details of what they’re doing in exchange for people providing feedback and suggestions publicly. This model is already used to some degree when grad students and faculty give talks about their work in progress to closed audiences. Even so, I think it’s done too little and too late.

My Conclusions

In the end, I think that the goal of open access seems like something that’s relatively easy to attain and will at least modernize the traditional publishing model. The more ambitious goal of broad collaboration to make progress more quickly is tantalizing, but I think merely yelling at scientists to believe in it and do it is the wrong way to get there.