COVID-19 Network Report: How A Smart Network Delivered Speed and Stability When it Mattered
(This is the third in a series of stories exploring how our network thrived during the COVID-19 surge. We’re sharing the key factors that allowed us to deliver fast, reliable service to millions of Americans during history’s largest sustained surge in residential Internet traffic, and how we’re making sure we’re ready for what comes next. You can read more here.)
To manage an unprecedented event like the COVID-19 Internet traffic surge, it wasn’t enough for our network to have sufficient capacity and reliability…it also had to be smart.
When you think of the Internet, you might think of massive fiber optic cables crisscrossing the globe, but the Internet is really a constantly evolving platform incorporating the latest technological innovations. Over the past decade, we’ve built a suite of software, machine learning and artificial intelligence tools that optimize network performance, allow us to dynamically add capacity, and automatically mitigate problems that may emerge, even before customers experience them.
During the COVID-19 traffic surge, these technologies played critical roles in ensuring that we were able to deliver fast, continuous service from coast to coast, including in the hardest-hit markets, where Internet traffic spiked by as much as 60 percent.
These technologies were born out of need and a recognition that networks could not efficiently meet the complex and dynamic demands facing them by simply getting bigger. While it’s important that we have built an expansive, fiber-dense network to carry the volume of traffic that we handle, it’s also essential that we continually invest in advanced software-based tools to manage the complexity of modern traffic demand.
These technologies work behind the scenes and are at their best when our customers don’t notice them, and they played a valuable role over the past several months. Here are some examples of how they work every day, and how they helped our network thrive during COVID-19:
Comcast Octave is an AI-based platform, developed by Comcast engineers in Philadelphia, that checks 4,000+ telemetry data points (such as external network “noise”, power levels, and other technical issues that can add up to a big impact on performance) on tens of millions of modems across our network every 20 minutes.
Octave is programmed to detect when modems aren’t using all the bandwidth available to them as efficiently as possible and automatically adjust them, delivering substantial increases in speed and capacity. Octave is a new technology, so when COVID-19 hit, we had only rolled it out to part of our network. Knowing that it could be key to delivering additional capacity when our customers needed it most, a team of about 25 engineers worked seven-day weeks to reduce the deployment process from months to weeks. As a result, we were able to deliver a 36 percent increase in capacity just at the time that customers needed more bandwidth than ever before for working, streaming and videoconferencing.
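To make the idea of "detect, then adjust" concrete, here is a minimal Python sketch of the detection half. It is purely illustrative and not how Octave is implemented: the `ModemReading` fields, the `modems_to_adjust` function and the 80 percent threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ModemReading:
    """One hypothetical telemetry sample for a single modem."""
    modem_id: str
    available_mbps: float  # assumed field: bandwidth the modem could use
    achieved_mbps: float   # assumed field: bandwidth it is actually getting


def modems_to_adjust(readings, min_efficiency=0.8):
    """Return IDs of modems achieving less than `min_efficiency`
    of the bandwidth available to them (threshold is an assumption)."""
    flagged = []
    for r in readings:
        if r.available_mbps <= 0:
            continue  # no usable capacity figure, so nothing to compare
        if r.achieved_mbps / r.available_mbps < min_efficiency:
            flagged.append(r.modem_id)
    return flagged


sample = [
    ModemReading("modem-a", available_mbps=100.0, achieved_mbps=95.0),
    ModemReading("modem-b", available_mbps=100.0, achieved_mbps=50.0),
]
flagged_ids = modems_to_adjust(sample)
```

In this toy sample only "modem-b" falls below the threshold. A real system would weigh thousands of data points per modem rather than a single ratio.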
Before we introduced Octave, we deployed our Smart Network Platform. Developed by Comcast engineers, the Smart Network Platform is a suite of software tools that automates many of our core network functions. As a result of this investment, we’ve been able to dramatically cut down the number and duration of outages our customers experience.
One of these tools – called NetIQ – uses machine learning to scan our core network continuously, making thousands of measurements every hour. Before NetIQ, we would often find out about a service-impacting issue like a fiber cut only when we started seeing service degradation or getting customer calls. With NetIQ in place, we can see an outage almost immediately. We’ve reduced the average amount of time it takes to detect a potentially service-impacting issue on the core network from 90 minutes down to less than five minutes. During COVID-19, NetIQ and the other elements of our Smart Network Platform played a key role in keeping us up and running under tremendous demand.
We are still early in the process, but the investment we’ve made in creating a more virtualized, cloud-based network architecture has also played an important role in allowing us to address accelerating demand and at the same time deliver a faster, more reliable service. Virtualization means taking functions that were once performed by large, purpose-built pieces of hardware – hardware that required manual upgrades to deliver innovation – and moving them into the cloud.
By doing this, we reduce the innovation cycles on those functions from years down to months. One example of this is our virtual CMTS initiative. A CMTS is a large piece of physical hardware that serves an entire neighborhood, delivering traffic between our core network and the homes we serve. Increasingly, we’ve been making those devices “virtual” by transitioning their functions into software that runs in our data centers. This not only allows us to innovate at a much faster pace, it also provides two key benefits for customers. First, it allows us to introduce much smaller “failure points” into the system, dividing customers into smaller groups so that if one part of the network environment experiences an issue, it affects far fewer people. Second, and most importantly, the virtual architecture lets us leverage our other AI tools to have far greater visibility into the health of the network, and even to “self-heal” issues without human intervention.
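The first benefit, smaller failure points, comes down to simple arithmetic. As a rough illustration in Python, the sketch below computes the worst-case number of customers affected when one group fails. The neighborhood size and group counts are invented for the example and are not Comcast figures, and `customers_affected` is a hypothetical helper.

```python
def customers_affected(total_customers: int, failure_groups: int) -> int:
    """Worst-case number of customers hit when a single group fails,
    assuming customers are spread as evenly as possible across groups."""
    if failure_groups < 1:
        raise ValueError("failure_groups must be at least 1")
    # Ceiling division: the largest group carries any remainder.
    return -(-total_customers // failure_groups)


# Hypothetical neighborhood of 10,000 homes.
one_big_device = customers_affected(10_000, 1)   # everyone shares one failure point
eight_groups = customers_affected(10_000, 8)     # the same homes split eight ways
```

Under these assumed numbers, a single failure touches 10,000 homes in the first case and 1,250 in the second.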
In a network environment that grows more complex and challenging with each passing day, these smart technologies are our secret weapon to ensure that we can continue to give customers what they want, even when we are hit with the unexpected.
Elad Nafshi is Senior Vice President, Next Generation Access Networks.