On Thursday 14th April at 5.45pm, a major service outage occurred on the core network. This failure resulted in the complete loss of services for a number of schools on the network in Derbyshire, Leicestershire, Leicester City, Nottinghamshire, Nottingham City and Northamptonshire.
The loss of services was due to the failure of the equipment connecting the Derby and Nottingham Data Centres. The connectivity equipment was replaced by the carrier, BT, and tested as operable, but service did not return. A number of further lines of investigation were pursued during the night, but the fault remained. During early-morning calls, repeat checks indicated a configuration error on the replacement equipment; once this configuration was corrected, service returned.
In total, the failure lasted for 16 hours and 47 minutes and was resolved on Friday 15th April at 10.32am.
During this incident there was a lack of escalation to senior management within KCOM, so the incident was not managed as expected. Communication of the service failure was poor: information was not published on the portal as required. However, emPSN was able to provide service updates via Twitter (@emPSN).
emPSN is working with KCOM to investigate why the service failure occurred and to improve communications with schools should a device failure happen again in future. The loss of our core service is exceptionally rare, with service availability over the last 3 years exceeding the SLA of 99.95%.