At 17:11 our network monitoring alerted us to a failure on our network, which was localised to a single rack within LON01.
Initial diagnostics started and it was assumed there was a power issue within Rack03, as all devices on Feed A were showing as “unreachable”.
Further diagnostics showed this not to be the case; instead, a line card within LON01-CORE had been reloaded by the primary supervisor due to a series of automated test failures.
UPDATE01 – 17:18
The module has come back online and all traffic has resumed. Engineers are currently investigating why this occurred.
We are currently aware of an incident on Virgin Media’s network which is causing a loss of service to several Virgin Media-provided leased lines.
Virgin Media are currently investigating, and updates will be provided as and when they become available.
UPDATE01 – 10:52
Virgin Media have identified a problem with one of their core routers in Telehouse and are continuing their investigation, although no ETR (estimated time to restore) is available as yet.
UPDATE02 – 11:00
We are seeing service restore to Virgin Media circuits; however, this has not yet been confirmed by Virgin Media.
UPDATE03 – 12:00
We have requested an RFO (Reason for Outage) from Virgin Media, which we will make available via the NOC.
There will be Planned Maintenance on ESXi02.R01 & ESXi03.R01 to resolve the ongoing remote management issue with these systems, for which a resolution has now been identified.
Work will involve a reload of these hosts. During this reload, EasyBOND services will be affected. Connections will be manually transferred to the backup aggregation service prior to the work, but users will see a short 10-30 second outage.
We apologise for any inconvenience this may cause.
UPDATE 01 – 20:28
This work is complete and all services have been transferred back to their original primary servers. We will continue to monitor the ESXi hosts.
At 10:52am our network monitoring alerted us to an issue with our Frontier voice interconnect. At this time we lost our BGP session to them and traffic started routing over diverse paths. This would have caused all active calls to drop while BGP reconverged.
UPDATE01 – 11:10
At 11:02 our network monitoring showed that the interconnect had been restored and the BGP session had resumed. We have logged a case with our carrier to determine whether the fault originated from their side, as no other session losses were seen from our side.
UPDATE02 – 11:30
We have been advised that this is a carrier issue with C4L.
During the emergency reboot of ESXI02.R01, carried out after our network monitoring advised that the management interface was showing as offline, we discovered an issue with the ARP timeout settings on LON01-CORE that could have caused extended downtime in the event that the EasyBond service needed to flip between aggregators.
We have corrected this configuration issue and completed a fail-over test, which passed as expected.
Users on EasyBond would have seen 2 x 30-second outages. We apologise for any inconvenience caused.
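For context on why the ARP timeout setting matters here, a simplified sketch (hypothetical figures, not our actual configuration): when the bond flips to the standby aggregator, the core router can keep forwarding traffic to the old aggregator's MAC address until its ARP entry for the service IP is refreshed, so the worst-case blackhole approaches the ARP timeout unless the new aggregator announces itself (e.g. via gratuitous ARP).

```python
def worst_case_blackhole_seconds(arp_timeout_s: int, gratuitous_arp: bool = False) -> int:
    """Worst-case time (seconds) traffic is sent to the old aggregator's
    MAC after a failover, if the router only relearns the mapping when
    its ARP entry expires. Illustrative model only."""
    # If the new aggregator sends a gratuitous ARP, the router updates
    # its cache immediately; otherwise it waits out the ARP timeout.
    return 0 if gratuitous_arp else arp_timeout_s

# Hypothetical example: a 4-hour (14400 s) ARP timeout, a common router default.
print(worst_case_blackhole_seconds(14400))        # relearn only on expiry
print(worst_case_blackhole_seconds(14400, True))  # gratuitous ARP on failover
```

This is why a long ARP timeout, harmless in steady state, can translate into extended downtime during an aggregator flip.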