First off, we’re sorry about this weekend’s SmartZone outage. We know

that email is more and more important to people these days (I have been

known to check my email on four different devices simultaneously) and we

understand the frustration you felt. We’re truly sorry.

Now, the details: SmartZone suffered an outage on Saturday, April 4th,

starting at 7:30am EDT and lasting until 5pm EDT. This outage was caused by a

power issue in one of our datacenters that impacted some of our email

servers. The outage was compounded by the fact that several databases

needed to be restored from backup (that process was successful, and no

data was lost, which is a good thing).

Over the course of the outage, mail that was sent to SmartZone customers

was stored in a number of queues across our email system. Once the

entire system stabilized, we were able to start delivering that queued

mail. Our engineers worked around the clock (literally) to identify

email that was delayed, and to get it to the proper recipients. This

process was completed earlier this morning, and now all email messages

queued during the outage have been delivered. One important thing to

note, however, is that the email messages will show up in your inbox

stamped with the date they were delivered to your inbox, not when they

were sent. For example, someone might have sent you an email on Saturday

at noon, and it was queued over the weekend, so it shows up in your inbox

as a new email today. If you look at your email headers, though, you’ll

see the actual sent and received dates.
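
For the curious, here is a minimal sketch of how you could read those two dates out of a message's raw headers. It uses Python's standard email module. The sample message, the addresses and the function name sent_and_received are made up for illustration, and it assumes the topmost Received line is the most recent one with its timestamp after the final semicolon, which is the usual convention but not something every mail server guarantees.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw message, invented for this example.
RAW_MESSAGE = """\
Received: from mail.example.net by mx.example.com; Mon, 6 Apr 2009 09:15:00 -0400
Date: Sat, 4 Apr 2009 12:00:00 -0400
From: friend@example.net
To: you@example.com
Subject: Lunch?

See you at noon.
"""


def sent_and_received(raw):
    """Return (sent, received) datetimes taken from a raw message's headers."""
    msg = message_from_string(raw)

    # The Date header is set by the sender's mail program.
    sent = parsedate_to_datetime(msg["Date"])

    # Each server that handles the message adds a Received header on top.
    # Assumption: the first one is the latest hop, and its timestamp
    # follows the last semicolon.
    latest_hop = msg.get_all("Received")[0]
    received = parsedate_to_datetime(latest_hop.rsplit(";", 1)[1].strip())

    return sent, received


sent, received = sent_and_received(RAW_MESSAGE)
print("Sent:    ", sent.isoformat())
print("Received:", received.isoformat())
print("Delay:   ", received - sent)
```

In this made-up case the message was sent Saturday at noon and received Monday morning, so the inbox would show Monday even though the Date header still says Saturday.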

During the outage there was a window of time (from 11am to 2:50pm EDT on

Saturday) when our email directory was in a state of flux due to the

various databases being rebuilt. There were a small number of cases (we think a

small fraction of emails sent) where someone may have sent you an email during

these hours and our system was unable to recognize you as a user

(because servers were coming back online). In each instance Comcast sent

the sender of the email a ‘user not found’ message to alert them that

their email message should be resent.

I know what you’re thinking: what are you doing to make sure this doesn’t

happen again? Our engineers are examining every piece of our email

system to make sure that we are doing everything in our power to keep

an issue like this from happening again. We are reviewing all of

our systems (hardware, software and processes) to make sure that we

mitigate the possibility of this occurring again. Email is a critical

means of communication for us, as well as our customers, so we know that

it just needs to work.

Update 4/15/09 3:24pm: Lots of folks in the comments are wondering whether our datacenters have redundant power, so I thought it would make sense to add this update. Our datacenters do have fully redundant systems, including power. That said, while we didn't lose power in our datacenter, an element of our email platform failed (some more details can be found here), resulting in the need to restore data from backup. I hope that answers some people's questions.