First off, we’re sorry about this weekend’s SmartZone outage. We know that email is more and more important to people these days (I have been known to check my email on four different devices simultaneously), and we understand the frustration you felt. We’re truly sorry.

Now, the details: SmartZone suffered an outage on Saturday, April 4th, starting at 7:30am EDT and lasting until 5pm EDT. This outage was caused by a power issue in one of our datacenters that impacted some of our email servers. The outage was compounded by the fact that several databases needed to be restored from backup (that process was successful and no data was lost, which is a good thing).

Over the course of the outage, mail that was sent to SmartZone customers was stored in a number of queues across our email system. Once the entire system stabilized, we were able to start delivering that queued mail. Our engineers worked around the clock (literally) to identify email that was delayed and to get it to the proper recipients. This process was completed earlier this morning, and all email messages queued during the outage have now been delivered.

One important thing to note, however, is that these email messages will show up in your inbox stamped with the date they were delivered to your inbox, not the date they were sent. For example, someone might have sent you an email on Saturday at noon, and because it was queued over the weekend it shows up in your inbox as a new email today. If you look at your email headers, though, you’ll see the actual sent and received dates.

During the outage there was a window of time (from 11am to 2:50pm EDT on Saturday) when our email directory was in a state of flux because the various databases were being rebuilt. In a small number of cases (we think a fraction of the emails sent), someone may have sent you an email during these hours and our system was unable to recognize you as a user (because servers were coming back online).
In each instance, Comcast sent the sender of the email a ‘user not found’ message to alert them that their email message should be resent.

I know what you’re thinking: what are you doing to make sure this doesn’t happen again? Our engineers are examining every piece of our email system to make sure that we are doing everything in our power to prevent an issue like this from happening again. We are reviewing all of our systems (hardware, software and processes) to mitigate the possibility of this occurring again. Email is a critical means of communication for us, as well as for our customers, so we know that it just needs to work.
Update 4/15/09 3:24pm: Lots of folks in the comments are wondering whether our datacenters have redundant power, so I thought it would make sense to add this update. Our datacenters do have fully redundant systems, including power. That said, while we didn't lose power in our datacenter, an element of our email platform failed (some more details can be found here), resulting in the need to restore data from backup. I hope that answers some people's questions.