Fastly broke the Internet for an hour this morning
SPOF FTL
Every redundant system still has single points of failure—usually human.
For roughly an hour this morning, from about 6 am to 7 am EDT, enormous swathes of the Internet were down or interestingly broken. Sites taken down included CNN, The Guardian, The New York Times, PayPal, and Spotify, among many more, including The Verge, which resorted to reporting via Google Docs for the duration of the outage.
Vast chunks of the internet are offline, including The Verge. Until we’re back, we’re reporting to you live out of Google Docs. Here’s what we know so far about the outage: https://t.co/4b1p2qhYif
— The Verge (@verge) June 8, 2021
The underlying problem was an outage at Fastly, one of the world’s largest content delivery network (CDN) providers. The entire service went down due to a misconfiguration that it had deployed to all of its points of presence (POPs) globally. As a result, sites using Fastly for content delivery returned various errors depending on the local site configuration. Some sites delivered relatively uninformative plain HTTP 503 (Service Unavailable) pages, while others returned errors such as “Fastly error: unknown domain.”
The “unknown domain” error gives us some tantalizing hints about the nature of the problem, which is more than Fastly’s own status updates have offered so far. It tells us that Fastly’s network was up and its Varnish cache servers were answering requests, but its cache configuration (the Varnish Configuration Language files that point the cache server to the back-end servers supplying the original content) was almost certainly either missing or garbled.
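To make that inference concrete, here is a minimal Python sketch of an edge node that is up and answering requests but has lost its mapping from customer domains to back-end servers. It is a toy model only, not Fastly's or Varnish's actual code; the `EdgeCache` class, its `handle` method, and the example hostnames are all hypothetical.

```python
from typing import Dict


class EdgeCache:
    """Toy model of a CDN edge node (hypothetical, not Fastly's implementation)."""

    def __init__(self, origins: Dict[str, str]) -> None:
        # Maps a customer domain to the back-end (origin) server holding its content.
        self.origins = dict(origins)

    def handle(self, host: str) -> str:
        """Return a response body for a request carrying the given Host header."""
        origin = self.origins.get(host)
        if origin is None:
            # The node itself is reachable, but it has no configuration
            # telling it where this domain's content lives.
            return "Fastly error: unknown domain"
        return f"content for {host} fetched from {origin}"


# A node with its configuration intact, and one whose configuration is missing.
healthy = EdgeCache({"www.example.com": "origin.example.net"})
broken = EdgeCache({})

print(healthy.handle("www.example.com"))
print(broken.handle("www.example.com"))
```

The point of the sketch is that the second node still responds. An error message coming back at all implies the servers and network were working, which is why the message points at configuration rather than connectivity.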
Fastly’s own status page acknowledged the issue at 5:58 am EDT, then declared it identified at 6:44 am and fixed (with “increased origin load,” which in many cases may effectively have meant sites were still unavailable for a while) at 6:57 am. The status page classified the issue as a “global CDN disruption” but gave no technical details. A tweet from the Fastly engineering team gave slightly more detail:
We identified a service configuration that triggered disruptions across our POPs globally and have disabled that configuration. Our global network is coming back online. Continued status is available at https://t.co/RIQWX0LWwl
— Fastly (@fastly) June 8, 2021
Although the outage was mercifully brief (Director of Internet Analysis Doug Madory told CNN that his firm Kentik saw Fastly traffic disappear at 5:49 am and begin reappearing at 6:39 am), the financial impact of such an outage can be enormous. Media measurement firm Kantar guesstimated that $29 million in ad revenue was lost worldwide during the hour-long outage.
Fastly’s brief snafu doesn’t appear to have bothered investors too much. FSLY dipped $0.81 in pre-market trading on the New York Stock Exchange, likely due to the outage itself, but bounced up $3.24 by 10 am. At closing, it was up to $56.20, a 10.8 percent increase from yesterday’s $50.70 closing price.
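For readers who want to check the closing figure, the percentage follows directly from the two closing prices quoted above. The short Python sketch below uses a hypothetical helper, `percent_change`, written only for this illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0


# Closing prices quoted in the article: $50.70 yesterday, $56.20 today.
change = percent_change(50.70, 56.20)
print(f"{change:.1f} percent")  # prints "10.8 percent"
```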