Uncategorized – Structured Communications NOC

SIP / Voice – OVH – 09:15

We are aware of a network issue affecting OVH and a small number of VPS images we still have hosted with them. This is having an impact on calls for these clients. Engineers are investigating.

UPDATE01 – 09:30

We have attempted to re-route our traffic around the issue; however, the issue is within the core OVH network.

UPDATE02 – 09:40

We are starting to see traffic to OVH return to normal. We are monitoring and can only apologise for the issues.

Wholesale DSL maintenance 26/01/2023

Following on from the broadband issues that occurred on 20/01/2023 (report still pending), we have been advised by our wholesale supplier that they are replacing the faulty equipment on their side tonight (26/01/2023) between 23:00 and 06:00 under an emergency maintenance window, which would drop DSL connections for a few minutes.

As a pre-emptive measure, we have already moved our traffic away from the device they are working on in Telehouse North over to Telehouse East to avoid the disruption. This has been confirmed by both network teams.

Once the work is complete, we will move traffic back.

We appreciate the hesitation this notice will bring given the recent issues. However, this should not cause disruption, as the circumstances are very different.

UPDATE01 – 21:30

We have been advised this work will likely be suspended due to a pre-existing network issue within our wholesale provider’s network. The Telehouse North link we removed traffic from this afternoon has gone out of service, which we have been indirectly advised is down to the pre-existing network issue. Telehouse East remains online but should be considered “at risk”.

UPDATE02 – 22:46

We have been indirectly advised that network issues within our wholesale provider are starting to settle down. Our primary Telehouse North link remains offline but is actively trying to reconnect and restore redundancy. Services remain “at risk”.

UPDATE03 – 23:29

We have seen our primary Telehouse North link restore and remain stable for over 25 minutes. We will leave Telehouse East (Backup) as “preferred” until after the advised wholesale maintenance window and review Friday morning. Until this point the “at risk” notice will remain.

UPDATE04 – 27/01/2023 – 10:25

We are continuing to monitor.

Broadband Outage 20/01/2023

We are currently aware of a network issue affecting broadband customers. Engineers are already on site in preparation for the works at Telehouse and are currently working on the issue. More updates to follow.

Update 20/01/2023 22:02

Engineers are still working to find the root cause of the issue. We will post more updates as they become available.

Update 20/01/2023 23:04

We can see connections have now come back online. Engineers are still working on the issue and will provide an update shortly.

Update 20/01/2023 23:22

If you are still without service, please power down your router for at least 30 minutes. This should restore your service.

Update 21/01/2023 10:10

If you are still without service, please reboot or power down your router for 20 minutes. We are sorry for the issues caused; a further update with details of the cause will be posted shortly.

Summary 16/02/2023:

A full RFO has been sent to partners and wholesale customers.

On the night of 20/01/2023 (during the second planned Telehouse power works), engineers were on site preparing for the power works and to be on hand should issues arise. We took the opportunity to proactively replace a PDU bar that was showing signs of a failing management interface. This PDU was on “FEED A” (the side Telehouse were working on), so no additional risk was expected.

Power feed A was isolated and taken down shortly before the power works were due to start, and the PDU was replaced but not re-powered due to the pending works by Telehouse.

All platforms were operating as expected on a single power feed.

Additionally, a planned line card replacement was due to take place, which involved moving DSL subscribers across the network in a controlled manner. The affected LNS01 and LNS03 were isolated and subscribers were moved across. The isolated LNSs were brought back into service shortly after.

At this point we noticed that new inbound DSL connections were only being routed to LNS02 and LNS04. The migrated configuration was checked and confirmed to be as expected.
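For illustration only, the kind of check that exposes this symptom can be sketched as below. This is a minimal sketch and not the tooling used on the night: the function name `idle_gateways` and the idea of feeding it a list of gateway names taken from recent session-start records are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List


def idle_gateways(new_session_gateways: Iterable[str],
                  in_service: List[str]) -> List[str]:
    """Return in-service gateways that received no new sessions.

    new_session_gateways: one gateway name per newly established session
                          (hypothetical input, e.g. parsed from accounting logs).
    in_service:           gateways that are expected to be taking sessions.
    """
    counts = Counter(new_session_gateways)
    # A Counter returns 0 for names it has never seen, so a gateway that
    # took no new sessions at all is reported here.
    return [gw for gw in in_service if counts[gw] == 0]


if __name__ == "__main__":
    recent = ["LNS02", "LNS04", "LNS02", "LNS04", "LNS04"]
    print(idle_gateways(recent, ["LNS01", "LNS02", "LNS03", "LNS04"]))
```

A gateway appearing in the result while it is marked as in service is a prompt to look upstream as well as at local configuration.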

At this point LNS02 started to reboot uncontrollably, which dropped all connected DSL subscribers in an uncontrolled manner. LNS02 was manually rebooted and returned to service but quickly started to reboot again. LNS02 was then taken out of service and powered down.

Services from LNS02 did not reconnect, so the line card migration changes were rolled back; however, this did not make any difference.

Diagnostics on our side did not show the incoming RADIUS proxy requests from our layer 2 provider, so we placed a call to their NOC, who failed to confirm anything was wrong despite several calls. (A fault on their side has now been confirmed and was the root cause of the extended outage.)

LNS02 was powered back up and diagnostics showed the 12V power rail on the remaining power supply was low, causing the device to reload. Because of the quick reload times on these devices, this was not being flagged on SNMP, and because the voltage is combined when both PSUs are energised, it did not show as low prior to the event. Power was then swapped over to the other, working power supply, which had been offline due to the power works. This resulted in a stable device.
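A reload that completes between two polls can still be caught by comparing the device's reported uptime from one poll to the next, rather than relying on the device being unreachable at poll time. The sketch below illustrates that idea only; it is not our monitoring configuration, and the `UptimeSample` type and `detect_reboots` function are hypothetical names. It assumes the uptime values have already been collected (for example from SNMP sysUpTime) and converted to seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UptimeSample:
    polled_at: float  # time of the poll, in seconds (any monotonic reference)
    uptime: float     # uptime reported by the device at that poll, in seconds


def detect_reboots(samples: List[UptimeSample]) -> List[float]:
    """Return the poll times at which a reboot since the previous poll is evident."""
    reboots = []
    for prev, cur in zip(samples, samples[1:]):
        # Uptime should only grow between polls. If it has gone down, the
        # counter was reset, i.e. the device reloaded in between, even if it
        # answered every poll. (Caveat: a counter wrap looks the same.)
        if cur.uptime < prev.uptime:
            reboots.append(cur.polled_at)
    return reboots


if __name__ == "__main__":
    history = [
        UptimeSample(0, 1000),
        UptimeSample(300, 1300),
        UptimeSample(600, 40),   # uptime dropped: device reloaded before this poll
        UptimeSample(900, 340),
    ]
    print(detect_reboots(history))
```

The same comparison does not help with the combined-voltage reading; that would need each PSU's rail to be measured separately, where the hardware exposes it.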

LNS02 was then brought back into service; however, no DSL circuits were being routed to us.

Further investigations were taking place when a large volume of inbound DSL connections started to be seen authenticating.

Since the events took place, our wholesale DSL provider has confirmed they experienced a major outage on one of the access routers we are connected to, but failed to advise us until many hours after the events took place. A formal complaint has been raised and an RFO has since been provided, confirming that a number of devices on their side suffered issues and have since been replaced.

While there was a failure of one of our gateways, these are in redundant pairs and the failure would not have caused a complete outage by itself. The events that took place further upstream with our wholesale provider were the root cause of the extended outage.

This was unfortunate timing and had we been advised of the issues, we would have been able to address the outage in another way. We do apologise for the issues seen.

Telehouse – UPS Replacement At Risk 20/01/2023 – 21/01/2023

We have been advised by Telehouse that they are undertaking power works to replace both UPS systems feeding the colocation suite where one of our racks is located, as part of their hardware upgrade program.

During these enabling works, Telehouse will be isolating one UPS extension switchboard at a time, covering two separate dates. This will ensure that one UPS system is supporting our rack power load on each of these dates, to avoid total loss of power.

All of our hardware at this location is diversely fed by dual redundant power supplies and we don’t expect any interruption to power or services, but this should be classed as *At Risk*. Telehouse have provided a detailed scope of works that we have been asked not to share, but it is very comprehensive and details that power should not be disrupted for any great length of time.

Due to the unforeseen issues that arose last Friday, Structured engineers will be on site for the duration of the works.

We have also decided to replace the PSUs in our core devices prior to the works taking place and to manually transfer power away from the power feed being worked on, to better manage any unforeseen outages this time.

Structured works will start at 19:30 to replace power supplies

Telehouse works will commence from 20:00

UPDATE01 – 18:00

Structured engineers are on site and prepping / reviewing works.

UPDATE02 – 18:20

Engineers have identified the need to proactively move DSL subscribers away from LNS01 and LNS03. This is being done by gracefully dropping PPP connections.

Network Outage 13/01/2023

We are aware of downtime due to an issue at Telehouse London, where parts of our network are based. Engineers are already en route and are speaking directly with the data centre to get everyone online again ASAP. Further updates to follow shortly.

Update 14/01/23 00:47

Services have been restored but remain at risk while engineers continue to work on the issue. Further updates to follow.

Update 14/01/23 6:53am

Engineers are still at the Telehouse data center replacing failed hardware. Services are still currently at risk but online. More updates to follow.

Update 14/01/23 8:42am

We are starting to see the majority of connections come back online. If you are still having issues, please power down your router for at least 20 minutes, then power it back on. This should get your connection working again.

Update 17/01/2023 @ 12:33pm FINAL

SUMMARY

This outage was caused by a number of unforeseen cascading events arising from the power works undertaken by Telehouse and their effects on our power supplies and PDUs. Service was restored once Structured engineers attended site and replaced a large amount of hardware.

Further works are planned by Telehouse for the 20th; however, we will be on site for the duration.

We are also reviewing the events that led up to the issue and putting measures in place to ensure they do not happen again.

Ticket System – 24/01/2022 – 09:30

We are aware of an issue with our ticket system and are currently investigating the platform. Tickets sent in won’t be lost, but for urgent issues we would advise customers to call 0203 301 4000.

Broadband – 08:35 – 03/11/2021

We are aware of a large drop on broadband services across the UK. We are currently investigating as a matter of urgency.

14/10/2020 12:43 – Leased Lines

We are aware of an issue affecting leased line services delivered via Telehouse North. We are investigating as a matter of urgency. We are sorry for any inconvenience being caused.

Services with backup DSL will have been automatically re-routed.

UPDATE 01 – 12:50

Further investigations have shown this is affecting services delivered from our redundant fibre routes as well. We are continuing to work with our suppliers as the root cause has been identified outside of our network.

UPDATE 02 – 13:02

We are starting to see services recover but services should still be considered at risk.

UPDATE 03 – 14:00

We have had further feedback from our suppliers advising that this has been resolved and that they believe it was down to a configuration issue on their side. We have raised this as a concern, as it took down multiple failover links provided for redundancy.

FINAL – 10:39

We have been advised this issue was caused by human error within our wholesale provider and a failure to follow strict guidelines when undertaking work. Due to the impact this had on us and the loss of our redundant links with this provider, we are undertaking an internal review to ensure we mitigate against this in the future.

London Data Center Fire – 07:25 – 18/08/2020

We have seen a number of leased lines and peering sessions go down overnight. This has been caused by a possible fire (fire alarms are going off) in at least one London data center, affecting a number of providers. We are working to obtain further information.

UPDATE 01 – 8:00

We have been advised the London harbor building (Equinix LD8) remains evacuated.

UPDATE 02 – 09:15

Equinix have advised that a fire alarm was triggered by the failure of the output static switch from their Galaxy UPS system. This has resulted in a loss of power for multiple customers, and Equinix IBX Engineers are working to resolve the issue and restore power. At this moment in time we do not believe there to have been a fire.

UPDATE 03 – 10:15

Equinix IBX Site Staff report that the fire evacuation was caused by the failure of a Galaxy UPS, which triggered the fire alarm. The fire system has been reinstated and IBX Staff have been allowed back into the building. We are now awaiting updates on restoring services.

UPDATE 04 – 11:15

Equinix Engineers have advised that their IBX team have begun restoring power to affected devices. Unfortunately, at present there remains no estimated resolution time.

UPDATE 05 – 12:15

Equinix have advised that services are starting to be restored, with equipment being migrated over to other newly installed infrastructure. We have yet to see any of our affected connections restore, but please keep checking for updates.

UPDATE 06 – 13:15

Equinix IBX Site Staff report that services have been restored to more customers and IBX Engineers continue to work towards restoring services to all customers by migrating to the newly installed and commissioned infrastructure. Equinix have advised that access to the IBX will be granted and prioritised should any customers need to work on their equipment.

UPDATE 07 – 14:20

IBX Site Staff report that services have been restored to more customers and increasing numbers of those affected are now operational, along with the majority of Equinix Network Services. IBX Engineers continue to work towards restoring services to all customers by migrating to the newly installed and commissioned infrastructure.

UPDATE 08 – 15:15

We are pleased to advise we have just seen all affected services restore. Circuits remain at risk due to the ongoing power issues on site; however, we do not expect them to go down again.

27/06/2020 21:15 – Broadband Disruption

We are aware one of our broadband gateways has reloaded and dropped a number of broadband sessions. Traffic was rerouted to other gateways; however, the network will need to be rebalanced in the early hours of the morning.

We are sorry for the impact this will have had on you.