System resiliency at Facebook

In Fail at Scale, Ben Maurer (tech lead on Facebook’s Web Foundation team and formerly of reCAPTCHA fame) relates lessons about dealing with software failures at Facebook. If you are building distributed systems, particularly very large ones, you also have to think about failure and how to stay resilient in the face of it.

Apart from that, Facebook’s engineering environment requires that many people be able to commit and release changes often – and be fearless about it.

From what I can tell, this really comes down to two major ingredients:

  • Smart engineering that achieves high levels of resiliency by proactively handling potentially disruptive situations.
  • A culture that prioritizes and embraces continuous learning and productive problem solving.

The presentation below covers the paper as well as additional relevant bits.


Considering emotions

How do you feel after spending 30 minutes idly browsing social media or mainstream news websites? Frankly, I would venture to say that my happiest and most productive days tend to be those when I avoid that activity altogether. Mostly, that is about the cognitive noise. This article is not about that.

In a fascinating (and controversial) study, researchers at Facebook and Cornell University showed that the content of users’ newsfeeds may affect their expressed emotions – and possibly indirectly their felt emotions as well.

Discussing this in When technologies manipulate our emotions, the authors pose a thought-provoking question:

Can design ever be emotionally neutral and if not, on what criteria should technologists base design decisions?

It seems inevitable: to the extent that we let our mental world be affected by interactions and observations in the outer world, we will often find it hard not to let the products we use or the processes we go through affect our emotions.

When creating a product, we are rightfully concerned with making it effective and usable. People using our solution should not only be able to carry out their tasks as they intend, they should be able to do so without confusion or, worse yet, fear of somehow getting it wrong. Beyond just ensuring that something works, and ideally works very effectively, I think it is also generally instructive to pay attention to how a user’s emotions are affected by the product and the process of using it. Does the experience seem positive, enriching – or does it have a detrimental effect? How does it feel?

My two reasons to look at Comcast’s website

There are only two reasons why I visit Comcast’s website:

  1. Pay my monthly Internet bill.
  2. Check whether there is a service outage in my area.

So, if things are going really well, then I go there once a month to perform a single task. Here is a recent snapshot of their homepage, which illustrates an interesting disconnect.


Given hundreds of links, guessing is fair game. Hint: their documentation reveals information about outages, though you may be just as successful simply searching Twitter.