March 8th, 2017 AT&T VoLTE 911 Outage
Report and Recommendations
Public Safety Docket No. 17-68
A Report of the Public Safety and Homeland Security Bureau
Federal Communications Commission
I. EXECUTIVE SUMMARY
1. On the afternoon of March 8th, 2017, nearly all AT&T Mobility (AT&T) Voice over LTE (VoLTE) customers across the nation lost 911 service for five hours. Federal Communications Commission (Commission) Chairman Ajit Pai immediately directed the Public Safety and Homeland Security Bureau (Bureau) to investigate the causes, effects, and implications of the outage. In response, the Bureau reviewed and analyzed outage reports filed in its Network Outage Reporting System (NORS), sought and reviewed public comments and related documents, and held meetings with relevant stakeholders, including service providers and public safety entities. The Bureau also examined the record to identify ways to prevent such outages in the future. This report presents the Bureau's findings.
2. As described in greater detail below, the outage was caused by an error that likely could have been avoided had AT&T implemented additional checks (e.g., followed certain network reliability best practices) with respect to its critical 911 network assets. Approximately 12,600 unique users attempted to call 911 but were unable to reach emergency services through the traditional 911 network. Measured by the number of unique users affected, this was one of the largest 911 outages ever reported in NORS.
3. Among the lessons learned from the March 8th outage is that when 911 service fails for any reason, Public Safety Answering Points (PSAPs) play a critical role in advising their jurisdictions of alternative ways to reach help. While AT&T and its subcontractors, Comtech and West, made efforts to notify thousands of PSAPs, the notifications were often unclear or missing important information, and generally took a few hours to arrive. The outage also serves as a case study illustrating actions that stakeholders can take to promote network reliability and continued access to 911 service. For example, it underscores the importance of auditing all network assets critical to the provision of 911 service and ensuring that such assets are safeguarded and designed to avoid single points of failure. The outage also demonstrates the need for closer coordination between industry and PSAPs to improve overall situational awareness and ensure that consumers understand how best to reach emergency services.
4. One of the Commission’s primary objectives is to “make available, so far as possible, to all people of the United States . . . a . . . wire and radio communication service . . . for the purpose of promoting safety of life and property.” In furtherance of this objective, the Commission has taken measures to promote the reliable and continued availability of 911 telecommunications service. In 1997, the Commission adopted rules requiring Commercial Mobile Radio Service (CMRS) providers to implement 911 and Enhanced 911 services, and to “transmit all wireless 911 calls without respect to their call validation process to a Public Safety Answering Point.”
5. The Commission has adopted PSAP outage notification requirements where service outages could affect the delivery of 911 calls. In the 2004 Part 4 Report and Order, the Commission required "originating service providers" to notify PSAPs "as soon as possible" when they have experienced an outage that "potentially affects" a 911 special facility, and to convey "all available information that may be useful to the management of the affected facility in mitigating the effects of the outage on callers to that facility." Originating service providers include cable communications providers, satellite operators, wireless service providers, and wireline communications providers, that is, entities that offer the ability "to originate 911 calls." In the 2013 911 Reliability Order, the Commission adopted PSAP outage notification requirements for service providers that offer core 911 capabilities or deliver 911 calls and associated number or location information to the appropriate PSAP, defining them as "covered 911 service providers." The Commission required covered 911 service providers to notify 911 special facilities of outages that potentially affect them within 30 minutes of discovering an outage. The Commission further required covered 911 service providers to update PSAPs within two hours of their initial contact to communicate available information about the nature of the outage, its best-known cause, its geographic scope, and the estimated time for repairs. In its comments in the 2013 proceeding, the Association of Public-Safety Communications Officials-International (APCO) urged the Commission to extend these more specific PSAP notification rules to originating service providers as well. The Commission declined to do so because covered 911 service providers "are the entities most likely to experience outages affecting 911 service," and it deferred the issue for future consideration.
6. In addition to adopting PSAP outage notification requirements, the 911 Reliability Order also adopted 911 network reliability requirements for covered 911 service providers. These requirements were based on best practices developed and recommended by the Commission's federal advisory committee, the Communications Security, Reliability, and Interoperability Council (CSRIC), and were intended to address the network reliability problems brought to light by the 2012 "derecho" storm outages. The Commission's 911 reliability rules require covered 911 service providers to "certify annually whether they have, within the past year, audited the physical diversity of critical 911 circuits or equivalent data paths to each PSAP they serve, tagged those circuits to minimize the risk that they will be reconfigured at some future date, and eliminated all single points of failure." In the alternative, the Commission permitted covered 911 service providers to describe "reasonably sufficient alternative measures they have taken to mitigate the risks associated with the lack of physical diversity." In 2014, the Commission proposed to revise these 911 reliability requirements to address the failures that led to the 2014 multi-state outages. It also proposed additional mechanisms designed to ensure that the Commission's 911 governance structure kept pace with evolving technologies and new reliability challenges.