Due to an unexpected reload of a broadband gateway on our network this afternoon, we are seeing a traffic imbalance on part of our broadband network.
We will be taking steps to disconnect sessions gracefully and rebalance the affected gateways.
End users will see a PPP reconnect taking approximately 5-20 seconds. In the rare event your connection does not restore, you will need to power your router OFF and ON.
UPDATE 01 – 22:10
This work is now complete.
We have seen a number of leased lines and peering sessions go down overnight. This has been caused by a possible fire (fire alarms are going off) in at least one London data center, affecting a number of providers. We are working to obtain further information.
UPDATE 01 – 08:00
We have been advised the London Harbour building (Equinix LD8) remains evacuated.
UPDATE 02 – 09:15
Equinix have advised that a fire alarm was triggered by the failure of an output static switch on their Galaxy UPS system. This has resulted in a loss of power for multiple customers, and Equinix IBX Engineers are working to resolve the issue and restore power. At this moment in time we do not believe there to have been a fire.
UPDATE 03 – 10:15
Equinix IBX Site Staff report that the root cause of the fire evacuation was the failure of a Galaxy UPS that triggered the fire alarm. The fire system has been reinstated and IBX Staff have been allowed back into the building. We are now awaiting updates on restoring services.
UPDATE 04 – 11:15
Equinix Engineers have advised that their IBX team have begun restoring power to affected devices. Unfortunately, at present there remains no estimated resolution time.
UPDATE 05 – 12:15
Equinix have advised that services are starting to be restored, with equipment being migrated over to newly installed infrastructure. We have yet to see any of our affected connections restore but will keep checking for updates.
UPDATE 06 – 13:15
Equinix IBX Site Staff report that services have been further restored to more customers, and IBX Engineers continue to work towards restoring services to all customers by migrating to the newly installed and commissioned infrastructure. Equinix advised that access to the IBX will be granted and prioritized should any customers need it to work on their equipment.
UPDATE 07 – 14:20
IBX Site Staff report that services have been further restored, and increasing numbers of affected customers are now operational, along with the majority of Equinix Network Services. IBX Engineers continue to work towards restoring services to all customers by migrating to the newly installed and commissioned infrastructure.
UPDATE 08 – 15:15
We are pleased to advise that we have just seen all affected services restore. Circuits remain at risk due to the ongoing power issues on site; however, we do not expect them to go down again.
We are aware a number of broadband services have been dropping PPP sessions over the past several hours. Initial diagnostics show nothing wrong on our side, and we have raised this with our suppliers for further investigation.
UPDATE 01 – 14:27
We have received an update to advise there is an issue further upstream and emergency maintenance work is required. Due to the nature of the work, we have been told this will start at 14:30 today. The impact of this will be further session drops while core devices are potentially reloaded on the carrier side.
We are sorry for the short notice and the impact this will have, and have already requested an RFO for the incident.
UPDATE 02 – 15:54
We have been advised the work is complete. We are awaiting full confirmation of this.
We are aware our unauthenticated SMTP relay cluster has been subject to relay abuse by a compromised client. Currently SMTP services are suspended on the cluster.
UPDATE 01 – 22:30
SMTP services on the cluster remain suspended while we review. Further updates will be provided on 29/07/2020.
UPDATE 02 – 09:15 – 29/07/2020
After a full review, and given the age of the platform, the end-of-life status of its OS, extremely low usage levels (less than 0.1%), and the lack of support for enhanced security measures such as DKIM and DMARC, we have decided to withdraw the platform from service.
For customers who were using the service, we would advise migrating to authenticated SMTP provided via your web hosting provider or signing up with a free relay such as https://www.smtp2go.com/
We understand change is unwelcome, but after review we feel this is in the interest of all who still use the platform, to protect your domain and others.
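For customers scripting their own mail submission, the migration to an authenticated relay is a small change. A minimal sketch using Python's standard smtplib is below; the hostnames, addresses, and credentials are placeholders, not details of any specific provider — substitute the values your hosting provider or relay service supplies.

```python
# Minimal sketch of submitting mail via an authenticated SMTP relay.
# All hostnames, addresses, and credentials below are placeholders.
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble a simple message with the standard headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_authenticated(msg, host, port, user, password):
    """Submit over STARTTLS with SMTP AUTH (message submission, port 587)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()            # upgrade the connection to TLS first
        smtp.login(user, password) # authenticate before sending
        smtp.send_message(msg)

msg = build_message("alice@example.com", "bob@example.com",
                    "Test message", "Hello from the new relay.")
# Uncomment with real relay details to actually send:
# send_authenticated(msg, "mail.example.com", 587, "alice", "secret")
```

Unlike the withdrawn open relay, submission is authenticated and encrypted, which is what allows providers to apply DKIM signing and DMARC alignment to your domain.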
We are aware one of our broadband gateways has reloaded and dropped a number of broadband sessions. Traffic was rerouted to other gateways; however, the network will need to be rebalanced in the early hours of the morning.
We are sorry for the impact this will have had on you.
Our network monitoring has alerted us to a number of BTW based circuits going offline and prefix withdrawals from suppliers. We are currently investigating.
UPDATE 01 – 14:49
We are seeing reports from other providers that they have experienced similar issues. Initial investigations appear to show this as a problem within the “Williams House” Equinix data center in Manchester.
UPDATE 02 – 15:51
Connections are starting to restore. Services affected appear to have been routed via Manchester.
We will be making some changes to our broadband network tonight in order to isolate 2 upstream gateways we suspect of causing additional latency to circuits routed via them.
This will cause existing connections via these gateways to drop and reconnect. Due to the nature of the change, this can take up to 20 minutes.
UPDATE 01 – 23:03
This work is about to start.
UPDATE 02 – 23:06
Tunnels have been terminated and traffic is starting to move across to other gateways.
UPDATE 03 – 23:28
We have seen an issue with L2TP control messages not being accepted by the upstream gateways, which prevented circuits from being released to other gateways. We have therefore had to revert part of the configuration. Further works will be required at a later date.
We are aware of an issue affecting inbound calls with one of our upstream voice carriers. We have re-routed outbound calls around the affected network and calls should be connecting as expected.
We have raised a priority case with the carrier, who have confirmed there is an issue and that it is being dealt with urgently.
We apologize for the disruption and will update this NOC post once further details become available.
UPDATE 01 – 17:10
We have started to see inbound calls on the affected carrier restore and traffic flowing. We have not had official closure yet, so services should still be considered at risk.
UPDATE 02 – 17:33
The affected upstream carrier has confirmed services have been restored and that this was the result of a data center issue. We have asked for an RFO, and this will be provided as requested.
Re-routing has been removed and all services are normal.
Once again, we apologize for the disruption.
FINAL – 04/03/2020 – 14:45
We have been advised the root cause of this incident was a failed network interface on a primary database server within the carrier network. We have been advised the database is redundant, but this has highlighted the need for additional redundancy, which is already being deployed.
At 08:15 GMT this morning, we were alerted to a number of DSL broadband sessions disconnecting. Initial diagnostics showed there was no fault within our network and this was escalated to our wholesale supplier.
Our wholesale supplier responded to advise that a DSL gateway, “cr2.th-lon”, had dropped a number of sessions at 08:15 GMT but had started to recover at 08:23 GMT. At this time the root cause of the outage is unknown, but investigations are continuing. Services should be considered at risk until we ascertain the cause.
UPDATE 01 – 10:50
We have seen a further drop in sessions, where sessions have had to re-authenticate. We have requested an update from our supplier to enquire if this is related to the issues seen this morning.
We are aware of issues with customers routed via sip03.easyipt.co.uk for VoIP calls. We are currently looking at this as a matter of urgency.
UPDATE 01 – 13:39
We have discovered an issue with the database running on the media gateway and will be performing a reboot. Any active calls will drop.
UPDATE 02 – 13:45
The reboot has completed; however, we have lost a number of critical services and are working to restore them.
UPDATE 03 – 13:58
We have been able to recover the services; however, we are concerned about stability and why these services did not automatically start as expected. We are undertaking further reviews, and the platform should still be considered “at risk” until further notice.
UPDATE 04 – 14:47
We have been able to automatically recover a number of services; however, we are still seeing some services fail to load on boot. This is something we need to look into, and we believe it to be part of a race condition on the server. The media gateway has remained stable and is processing calls as expected.
A full review of the media gateway will take place next week to ensure all startup services recover as expected.
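This kind of boot-time race is commonly addressed by declaring explicit start ordering between services. As an illustrative sketch only — we do not state here which init system or service names the media gateway uses — on a systemd-based server the dependency could be expressed like this (the unit and service names below are hypothetical):

```ini
# Hypothetical unit for a media-gateway service; names are illustrative.
# Explicit ordering prevents the service racing its dependencies at boot.
[Unit]
Description=Media gateway call processing (example)
# Wait until the network is actually up and the local database
# service has finished starting before launching.
After=network-online.target mariadb.service
Wants=network-online.target
Requires=mariadb.service

[Service]
ExecStart=/usr/local/bin/media-gateway
# If startup still fails, retry rather than staying down.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

`After=` controls only ordering, while `Requires=` adds the hard dependency; combining them is what removes the race between the gateway process and the database it needs.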