US ISP CenturyLink Misconfiguration Creates Havoc On Global Web Traffic

CenturyLink incident takes down Cloudflare, Reddit, Hulu, AWS, Blizzard, Steam, Xbox Live, Discord, and dozens more.

US internet service provider CenturyLink suffered a major technical outage on Sunday after a misconfiguration in one of its data centers created havoc all over the internet.

Because the outage involved both firewall rules and BGP routing, the error spread outward from CenturyLink’s network and affected other internet service providers, ultimately causing connectivity problems for many other companies.

The list of tech giants who had services go down today because of the CenturyLink outage includes big names like Amazon, Twitter, Microsoft (Xbox Live), EA, Blizzard, Steam, Discord, Reddit, Hulu, Duo Security, Imperva, NameCheap, OpenDNS, and many more.

Cloudflare, which was also severely impacted today, said CenturyLink’s outward-propagating issue led to a 3.5% drop in global internet traffic, which would make this one of the biggest internet outages ever recorded.

Root cause: Misconfigured Flowspec rule

According to a CenturyLink status page, the issue originated from CenturyLink’s data center in Mississauga, a city in Ontario, Canada.

The telco says the root cause of the incident was an incorrect Flowspec announcement.

Flowspec is an extension to the BGP protocol that allows companies to use BGP routes to distribute firewall rules across their networks. Flowspec announcements are usually used when dealing with security incidents, such as BGP hijacks or DDoS attacks, as they allow companies to reconfigure their entire network and mitigate attacks within seconds.
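To make the idea concrete, a Flowspec rule can be thought of as a set of match fields plus an action that every router applies once the rule is distributed. The Python sketch below is only an illustration under that assumption: the `FlowRule` class, the `decide` function, and the example prefix are hypothetical and do not reflect CenturyLink's configuration or any router's actual rule format.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical stand-in for a Flowspec rule: match fields plus an action."""
    dst_prefix: str           # destination prefix the rule applies to
    protocol: str             # "tcp" or "udp"
    dst_port: Optional[int]   # None means "any port"
    action: str               # "drop" or "accept"

    def matches(self, dst_ip: str, protocol: str, dst_port: int) -> bool:
        return (
            ip_address(dst_ip) in ip_network(self.dst_prefix)
            and protocol == self.protocol
            and (self.dst_port is None or dst_port == self.dst_port)
        )


def decide(rules: List[FlowRule], dst_ip: str, protocol: str, dst_port: int) -> str:
    """Return the action of the first matching rule, or "accept" if none match."""
    for rule in rules:
        if rule.matches(dst_ip, protocol, dst_port):
            return rule.action
    return "accept"


# Illustrative mitigation: drop UDP port 53 traffic aimed at a victim prefix
# (203.0.113.0/24 is a documentation-only address range).
rules = [FlowRule("203.0.113.0/24", "udp", 53, "drop")]
```

Because the same rule is pushed to every router at once, a correct rule blunts an attack network-wide within seconds, and an incorrect one spreads just as quickly.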

However, today, CenturyLink said that its Mississauga data center sent out an incorrect Flowspec announcement that effectively prevented the company’s BGP routes from taking root.

Cloudflare, which observed the incident from afar, believes CenturyLink effectively put its entire network into a loop by announcing a brand new set of BGP routes and then accidentally dropping all routes via the misconfigured Flowspec rule.
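The loop Cloudflare describes can be sketched as a small state machine. The version below is a simplified guess at the mechanics, not a reconstruction of the incident: it assumes the bad rule arrives over the same BGP session it ends up blocking, and that losing the session withdraws both the learned routes and the rule. The function name and event strings are made up for illustration.

```python
from typing import List


def simulate_flap(cycles: int) -> List[str]:
    """Trace a router that keeps relearning a rule that kills its own session.

    Assumption (not confirmed by CenturyLink): the rule is learned over the
    BGP session, and session loss removes everything learned through it.
    """
    events: List[str] = []
    for _ in range(cycles):
        # 1. The BGP session comes up and routes are learned from the peer.
        events.append("session up, routes learned")
        # 2. The misconfigured rule arrives over that same session.
        events.append("bad rule installed")
        # 3. The rule blocks the session, so routes and the rule are withdrawn,
        #    which clears the way for the session to come back up again.
        events.append("session down, routes and rule withdrawn")
    return events


for event in simulate_flap(2):
    print(event)
```

Under these assumptions the router never settles: each recovery re-delivers the rule that caused the failure, which is why a manual intervention was needed to break the cycle.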

BGP routes are the glue that keeps the internet up. They are a type of message that internet companies relay between each other. BGP routes tell each internet provider which chunk of IP addresses is available on its network.
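As a rough illustration, the routes a network has heard can be modeled as a table mapping IP prefixes to the network that announced them, with traffic following the most specific matching prefix. The prefixes and AS numbers below come from ranges reserved for documentation, and `best_route` is a hypothetical helper, not part of any real BGP implementation, which would also weigh path length and policy.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, Optional

# Illustrative announcements: prefix -> announcing network (documentation values).
announcements: Dict[str, str] = {
    "198.51.100.0/24": "AS64500",
    "198.51.100.128/25": "AS64501",
}


def best_route(dst_ip: str, table: Dict[str, str]) -> Optional[str]:
    """Pick the announcing network with the longest prefix covering dst_ip."""
    addr = ip_address(dst_ip)
    candidates = [ip_network(prefix) for prefix in table if addr in ip_network(prefix)]
    if not candidates:
        return None  # no announcement covers this address: it is unreachable
    most_specific = max(candidates, key=lambda net: net.prefixlen)
    return table[str(most_specific)]
```

If the announcements disappear or are replaced by wrong ones, `best_route` returns nothing or the wrong network, which is the table-level view of sites becoming unreachable.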

However, as CenturyLink’s incorrect Flowspec command brought down some of the routers inside its network, some of those routers also began to announce incorrect BGP routes to neighboring “Tier 1” internet service providers.

This, in turn, brought down other networks in a domino-like effect.

Outage took seven hours to fix

CenturyLink fixed the issue by taking the rare step of telling all other Tier 1 internet providers to de-peer and ignore any traffic coming from its network. Companies rarely make this kind of decision, as it results in full connectivity loss for all of their customers.

All in all, CenturyLink had to reset all equipment and start with clean BGP routing tables, a process that took almost seven hours to complete, from around 12:13 UTC to 18:58 UTC, the company said.

“This was a significant global Internet outage,” said Matthew Prince, co-founder & CEO of Cloudflare, in his analysis of the outage.

This article was originally published at ZDNet.