Facebook has said an error during routine maintenance of its network of data centers caused a cascade of problems that took down its platforms for more than six hours on Monday.

In a blogpost published on Tuesday, Santosh Janardhan, vice-president of engineering, said the global outage that saw Facebook, Instagram and WhatsApp go dark for billions of users had begun when the company’s engineers issued a command that unintentionally disconnected Facebook data centers from the rest of the world.

Janardhan described the error as originating within the company’s “global backbone” of fiber-optic cables and data centers.

“This outage was triggered by the system that manages our global backbone network capacity,” Janardhan wrote. “The backbone is the network Facebook has built to connect all our computing facilities together, which consists of tens of thousands of miles of fiber-optic cables crossing the globe and linking all our data centers.”

“During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally,” Janardhan said.

[Image: diagram explaining the Facebook outage. Photograph: Reuters]

The company said its systems were designed to audit commands to prevent mistakes, but the audit tool had encountered a bug and had failed to stop the command that caused the outage. The outage had knocked out tools that engineers would normally use to investigate and repair such outages, making the task even more difficult.

The outage was the largest that Downdetector, a web monitoring firm, said it had ever seen.

Facebook said the outage had not been caused by malicious activity.

While users lost access to one of the world’s most popular messaging apps – WhatsApp has more than 2 billion users – employees were also blocked from internal tools.

The company said it had sent a team of engineers to its data centers to try to debug and restart the systems.

However, it took the company extra time to get engineers inside to work on the servers due to the physical and system security in place.

Even after network connectivity was restored to the data centers, Facebook said it worried a surge in traffic would cause its websites and apps to crash.

But because the company had run drills to prepare for such situations, access to its services returned relatively quickly.

“Every failure like this is an opportunity to learn and get better,” Janardhan wrote. “From here on out, our job is to … make sure events like this happen as rarely as possible.”

The outage came during a difficult week for Facebook, as the US Senate held a hearing with a former employee turned whistleblower who accused the social network of putting profits before people’s safety, a claim that Facebook disputes.

Source: Facebook explains error that caused global outage
