Broadband Outage 20/01/2023

We are currently aware of a network issue affecting broadband customers. Engineers are already on site in preparation for the works at Telehouse and are currently working on the issue. More updates to follow.

Update 20/01/2023 22:02

Engineers are still working to find the root cause of the issue. We will post more updates as they become available.

Update 20/01/2023 23:04

We can see that connections have now come back online. Engineers are still working on the issue and will provide an update shortly.

Update 20/01/2023 23:22

If you are still without service, please power down your router for at least 30 minutes; this should restore your service.

Update 21/01/2023 10:10

Anyone still without service should reboot or power down their router for 20 minutes. We are sorry for the issues caused; a further update with details of the cause will be posted shortly.

Summary 16/02/2023:

A full RFO (Reason for Outage) has been sent to partners and wholesale customers.

On the night of 20/01/2023 (during the second planned Telehouse power works), engineers were on site to prepare for the power works and to be present should any issues arise. We took the opportunity to proactively replace a PDU bar whose management interface was showing signs of failing. This PDU was on "FEED A" (the side Telehouse were working on), so no additional risk was expected.

Power feed A was isolated and taken down shortly before the power works were due to start, and the PDU was replaced but not re-powered because of the pending works by Telehouse.

All platforms were operating as expected on a single power feed.

Additionally, a planned line card replacement was due to take place, which involved moving DSL subscribers across the network in a controlled manner. The affected LNS01 and LNS03 were isolated and their subscribers were moved across. The isolated LNSs were brought back into service shortly after.

At this point we noticed that new inbound DSL connections were only being routed to LNS02 and LNS04. The migrated configuration was checked and confirmed to be as expected.

LNS02 then started to reboot uncontrollably, which dropped all connected DSL subscribers in an uncontrolled manner. LNS02 was manually rebooted and returned to service but quickly started to reboot again. LNS02 was taken out of service and powered down.

Services from LNS02 did not reconnect, so the line card migration changes were rolled back; however, this made no difference.

Diagnostics on our side did not show any incoming RADIUS proxy requests from our layer 2 provider, so we called their NOC, who did not confirm that anything was wrong despite several calls. (A fault on their side has since been confirmed and was the root cause of the extended outage.)

LNS02 was powered back up, and diagnostics showed that the 12V power rail on the remaining power supply was low and was causing the device to reload. However, because these devices reload very quickly, the reloads were not being flagged via SNMP, and because of the combined voltage while both PSUs were energised, the rail had not shown as low before the event. Power was then swapped over to the other, working power supply, which had been offline because of the power works. This resulted in a stable device.
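One general way to catch reloads that are too quick for reachability polling is to compare successive sysUpTime readings: a device that reloads between two polls answers both, but its uptime counter resets. The sketch below is illustrative only. It assumes uptime values have already been collected by some SNMP poller (not shown), and the names UptimeSample, detect_reloads and the 5-second slack are hypothetical; it does not describe the monitoring actually used on this network.

```python
from dataclasses import dataclass
from typing import List

# sysUpTime (SNMPv2-MIB) is reported in TimeTicks: hundredths of a second.
TICKS_PER_SECOND = 100
# Hypothetical tolerance for polling jitter, in seconds.
SLACK_SECONDS = 5


@dataclass
class UptimeSample:
    polled_at: float   # wall-clock time of the poll, in seconds
    uptime_ticks: int  # sysUpTime value returned by the device


def detect_reloads(samples: List[UptimeSample]) -> List[float]:
    """Return the poll times at which the device must have reloaded since the previous poll.

    A reload between polls shows up as an uptime well below the previous
    uptime plus the time elapsed between the two polls.

    Note: this sketch ignores the 32-bit TimeTicks wrap (roughly every 497 days),
    which would be reported as a false reload.
    """
    reloads = []
    for prev, cur in zip(samples, samples[1:]):
        elapsed_ticks = (cur.polled_at - prev.polled_at) * TICKS_PER_SECOND
        expected_ticks = prev.uptime_ticks + elapsed_ticks
        if cur.uptime_ticks + SLACK_SECONDS * TICKS_PER_SECOND < expected_ticks:
            reloads.append(cur.polled_at)
    return reloads


# Polled every 60 s; the device reloaded between the second and third polls
# yet answered every poll, so it never appeared "down".
samples = [
    UptimeSample(0, 10_000),
    UptimeSample(60, 16_000),
    UptimeSample(120, 1_500),
    UptimeSample(180, 7_500),
]
print(detect_reloads(samples))  # prints [120]
```

Comparing against the expected uptime, rather than only checking for a decrease, also tolerates small timing differences between the poller's clock and the device's counter.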

LNS02 was then brought back into service; however, no DSL circuits were being routed to us.

Further investigations were under way when we started to see a large volume of inbound DSL connections authenticating.

Since the events took place, our wholesale DSL provider has confirmed that they experienced a major outage on one of the access routers we are connected to, but they did not advise us until many hours after the events took place. A formal complaint has been raised, and an RFO has since been provided confirming that a number of devices on their side suffered issues and have since been replaced.

While there was a failure of one of our gateways, these are deployed in redundant pairs and would not have caused a complete outage by themselves. The events that took place further upstream with our wholesale provider were the root cause of the extended outage.

This was unfortunate timing; had we been advised of the issues, we would have been able to address the outage in another way. We apologise for the issues seen.

Telehouse – UPS Replacement *At Risk* 20/01/2023 – 21/01/2023

We have been advised by Telehouse that they are undertaking power works to replace both UPS systems feeding the colocation suite where one of our racks is located, as part of their hardware upgrade program.

During these enabling works, Telehouse will be isolating one UPS extension switchboard at a time, covering two separate dates. This will ensure that one UPS system supports our rack power load on each of these dates, avoiding a total loss of power.

All of our hardware at this location is diversely fed by dual redundant power supplies, and we don't expect any interruption to power or services, but this should be classed as *At Risk*. Telehouse have provided a detailed scope of works that we have been asked not to share, but it is very comprehensive and states that power should not be disrupted for any great length of time.

Due to the unforeseen issues that arose last Friday, Structured engineers will be on site for the duration of the works.

We have also decided to replace the PSUs in our core devices before the works take place, and to manually transfer power away from the feed being worked on, to better manage any unforeseen outages this time.

Structured works will start at 19:30 to replace power supplies

Telehouse works will commence from 20:00

UPDATE01 – 18:00

Structured engineers are on site, preparing for and reviewing the works.

UPDATE02 – 18:20

Engineers have identified the need to proactively move DSL subscribers away from LNS01 and LNS03. This is being done by gracefully dropping PPP connections.

Network Outage 13/01/2023

We are aware of downtime due to an issue at Telehouse London, where parts of our network are based. Engineers are already en route and are speaking directly with the data centre to get everyone online again ASAP. Further updates to follow shortly.

Update 14/01/23 00:47am

Services have been restored but remain at risk while engineers continue to work on the issue. Further updates to follow.

Update 14/01/23 6:53am

Engineers are still at the Telehouse data centre replacing failed hardware. Services are currently online but still at risk. More updates to follow.

Update 14/01/23 8:42am

We are starting to see the majority of connections come back online. If you are still having issues please power down your router for at least 20 minutes, then power it back on. This should get the connection working again for you.

Update 17/01/2023 @ 12:33pm FINAL

SUMMARY

This outage was caused by a number of unforeseen cascading events resulting from the power works undertaken by Telehouse and their effects on our power supplies and PDUs. Service was restored once Structured engineers attended site and replaced a large amount of hardware.

Further works are planned by Telehouse for the 20th; we will be on site for their duration.

We are also reviewing the events that led up to the issue and putting measures in place to ensure they do not happen again.

Telehouse – UPS Replacement *At Risk* 13/01/2023 – 14/01/2023

We have been advised by Telehouse that they are undertaking power works to replace both UPS systems feeding the colocation suite where one of our racks is located, as part of their hardware upgrade program.

During these enabling works, Telehouse will be isolating one UPS extension switchboard at a time, covering two separate dates. This will ensure that one UPS system supports our rack power load on each of these dates, avoiding a total loss of power.

All of our hardware at this location is diversely fed by dual redundant power supplies, and we don't expect any interruption to power or services, but this should be classed as *At Risk*. Telehouse have provided a detailed scope of works that we have been asked not to share, but it is very comprehensive and states that power should not be disrupted for any great length of time.

Telehouse staff will be on hand in the event of any issues, and we will be monitoring off-site, attending in person if required.

Broadband – 27/10/2022 – IPv6

We are aware of an IPv6 issue on the network following a firmware upgrade within our core last night.

We have been working with the hardware vendor to resolve the issue, but while this is ongoing, we have been reverting to a previous firmware version and moving connections between gateways.

Some users will experience a graceful PPP drop of around 5 seconds while their connection re-authenticates.

We do apologise for any inconvenience.

UPDATE01

LNS02, LNS03, LNS04 have been reverted and IPv6 connectivity has been restored.

LNS01 has been isolated for testing. Anyone experiencing slow DNS lookups or applications not loading is advised to reboot their router, which will then be routed to one of the other gateways with the fix applied.

UPDATE02

Further issues were identified with IPv6 within our core network (broadband facing). Work to resolve this has now been completed, and IPv6 should now be fully operational again across all gateways.

EasyHTTP – 09/09/2022 – 12:30

We have made some changes to our hosted email platform with respect to DNSBL validation for spam and to the listings used by the spam services running on the server.

This is in response to some Office 365 delivery issues that took place last month.

We don't expect any mail delivery issues as a result of this, and the changes have been tested with known problematic mailboxes, but any NDRs (non-delivery reports) should be reported to us immediately via support.
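For context, a DNSBL check works by turning the connecting server's IPv4 address into a DNS query against the blocklist's zone: the octets are reversed and the zone is appended, and an answer (conventionally in 127.0.0.0/8) means the address is listed. The minimal sketch below shows only the name construction and the interpretation of the answer. The DNS lookup itself is left out, dnsbl.example.org is a placeholder zone, and nothing here describes the actual configuration of the hosted email platform.

```python
import ipaddress
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name queried to check an IPv4 address against a DNSBL zone.

    The octets are reversed and the list's zone is appended, so 192.0.2.1
    checked against dnsbl.example.org becomes 1.2.0.192.dnsbl.example.org.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the input
    return ".".join(reversed(octets)) + "." + zone


def is_listed(answer: Optional[str]) -> bool:
    """Interpret the A-record answer for a DNSBL query.

    None stands for NXDOMAIN (no record), meaning the address is not listed.
    By common convention a listing is returned as an address in 127.0.0.0/8;
    the meaning of specific return codes varies between lists.
    """
    if answer is None:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))  # 1.2.0.192.dnsbl.example.org
print(is_listed("127.0.0.2"))  # True
print(is_listed(None))  # False
```

A real check would resolve the name returned by dnsbl_query_name and pass the result (or None on NXDOMAIN) to is_listed; how a listing is then scored or acted on is a policy decision for the mail platform.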

Ethernet EAD – 22/07/2022 – 11:30

We are aware that a small number of layer 2 Ethernet services are currently down. Internal investigations show this to be an upstream supplier issue, and we are currently engaging with them to locate the fault.

UPDATE01: 11:40

We are seeing services down from BT Wholesale, Openreach EAD Direct and TTB, so we suspect this may be a common POP failure between layer 2 providers landing circuits in London. For customers with backup, it will have kicked in automatically and re-routed traffic.

UPDATE02: 12:00

We are still awaiting an official update from our layer 2 provider as to the root cause of this issue, but they have advised that they are seeing circuits down, with a large number of calls on hold to their service desks.

We have already escalated to our management contacts to push for information so we can provide detailed updates.

UPDATE03: 12:25

Our layer 2 provider has now declared a "Major Incident".

They have advised that this appears to be related to a "DNS issue", but we have disputed that. We are continuing to chase for updates. We apologise to affected customers for the inconvenience this is causing.

UPDATE04: 12:41

Our layer2 provider has now advised of a major internal core network problem affecting more than just layer2 services. This is currently affecting less than 10% of our overall EAD circuits with this layer2 provider and services delivered via other layer2 partners are unaffected.

We have been advised that internal teams are working to identify the root cause and issue a fix. We again apologise to affected customers for the inconvenience this is causing.

UPDATE05: 12:55

Following further pressure on our account manager, we have been advised that this is affecting layer 2 services delivered to us via a core device located in their network at Interxion London, with multiple service tunnels flapping.

We have been advised there will be a further update by 13:30

UPDATE06: 13:40

A further update has been provided advising that they are still working on why network tunnels on this device are "flapping". They have promised a further update by 15:30, but we will keep pushing for information and an ETA for service resolution.

UPDATE07: 15:05

Our network monitoring shows that our end points on the affected circuits are reachable again. We have not yet been provided with an official clear, and services should still be classed as at risk.

UPDATE08: 17:00

Our layer2 provider has advised they have re-routed around the affected device and are currently working with the hardware vendor to establish why the device has failed in the way it has. We suspect there may be a short outage at a later date once services are re-routed back via this device but we will advise at the time.

HOR-DC – *at risk*

Our network monitoring has alerted us to multiple circuit failures within our Horsham facility. Initial diagnostics point to fibre breaks, and we suspect these may be the result of civil contractors. Traffic is flowing across redundant paths into the building with no loss of primary peering or transit, but service should be considered "at risk" while it relies on redundant links.

Ethernet services that terminate into our Horsham facility will have automatically failed over to backup, where purchased.

Faults have been logged with Openreach and we will keep updating this page as we know more.

UPDATE 01 – 12:01

We have seen all our “primary” fibre links recover and service has been restored, however no official update has been provided. We are still awaiting recovery of the other affected fibre links.

UPDATE 02 – 12:10

Openreach engineering teams are en route to our facility.

UPDATE 03 – 14:50

Openreach are on site

UPDATE 04 – 15:00 *FINAL*

All fibre links have been restored. Contractors working on the Openreach network had trapped one of our fibre tubes on that route, bending the affected groups of fibre to the point that light was unable to pass.

Tubing and fibres have been re-run in the AG Node by Openreach, and service has been restored.