Emergency Work – 23/11/2018 – 21:30 – LNS01 Broadband Gateway

Our network monitoring has flagged high CPU usage on LNS01 that is starting to affect its operation. We are undertaking an emergency reboot to prevent it from crashing completely. This will drop active sessions and force them onto other gateways.

UPDATE01 – 21:35
The reboot is complete. Services transferred to the redundant gateways as expected, and LNS01 is now back in operation. If you do not have service, please power down your router for 20 minutes.

We apologise for any inconvenience caused.

LON01 – VoIP – 21/11/2017

We are aware that one of our media gateways is not releasing channels once a call has cleared down. This is causing busy tones or "limit exceeded" messages.

We are currently working to resolve this ASAP.

UPDATE 01 – 19:00
Emergency Works have started. Any active calls on SIP02 have been dropped. We are sorry for any inconvenience caused.

UPDATE 02 – 19:03
SIP02 has reloaded and all services have been restored. We will now look at SIP01.

UPDATE 03 – 19:05
Emergency Works have started. Any active calls on SIP01 have been dropped. We are sorry for any inconvenience caused.

UPDATE 04 – 19:03
SIP01 has reloaded and all services have been restored.

LON01 – EasyXDSL – 21/02/2017 – 20:30 *Firmware Upgrade*

At the request of our vendor, we will be upgrading to a firmware release that addresses a bug which has been causing delayed RADIUS accounting data. Each firmware upgrade will take no longer than 10 seconds; however, a reload of each LNS is required. The reloads will be done one at a time and will result in a PPP drop for each DSL circuit. Sessions should automatically re-establish within 60 seconds.
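A one-at-a-time reload of this kind can be sketched as below. This is purely illustrative: `reload_lns` and `sessions_restored` are hypothetical stand-ins for whatever vendor API and monitoring check are actually used, and the 60-second figure comes from the notice above.

```python
import time

def rolling_reload(lns_list, reload_lns, sessions_restored,
                   timeout=60, poll=5):
    """Reload each LNS in turn, waiting for PPP sessions to
    re-establish before moving on to the next device."""
    completed = []
    for lns in lns_list:
        reload_lns(lns)                      # drops PPP sessions on this LNS
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if sessions_restored(lns):       # e.g. a RADIUS accounting check
                break
            time.sleep(poll)
        completed.append(lns)
    return completed
```

Reloading serially like this keeps the blast radius to one gateway's worth of circuits at any moment, at the cost of a longer overall window.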

UPDATE01
This work is complete. Any users without a working internet connection are advised to power off their hardware for 20 minutes.

LON01 – EasyXDSL Fibre Hostlink – 06/09/2016 – 23:00

We have been advised by one of our fibre wave providers that they will be conducting some emergency maintenance on some fibre interconnects, which include a circuit used by us.

To avoid any unexpected issues we will manually re-route traffic and flag that link as unavailable. Some PPP sessions may drop and re-establish for DSL circuits using this link.
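The drain-and-reroute step described above can be illustrated with a toy model: raising the metric on the affected link well above the backup path so that nothing prefers it. All node names and costs below are invented for illustration only.

```python
import heapq

def shortest_path(graph, src, dst):
    """Plain Dijkstra over {node: {neighbour: cost}}."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

graph = {
    "lon01":  {"wave": 10, "backup": 30},
    "wave":   {"lon01": 10, "l2tp": 10},
    "backup": {"lon01": 30, "l2tp": 30},
    "l2tp":   {"wave": 10, "backup": 30},
}
assert shortest_path(graph, "lon01", "l2tp") == ["lon01", "wave", "l2tp"]

# Drain the wave link ahead of the provider's works: raise its cost
# well above the backup path so no traffic prefers it.
graph["lon01"]["wave"] = graph["wave"]["lon01"] = 10_000
graph["wave"]["l2tp"] = graph["l2tp"]["wave"] = 10_000
assert shortest_path(graph, "lon01", "l2tp") == ["lon01", "backup", "l2tp"]
```

Draining the link in advance, rather than letting the IGP react to a sudden loss, is what keeps most PPP sessions from dropping.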

UPDATE 01 – 22:23

Traffic has been re-routed. We were able to maintain reachability to our L2TP endpoints, so no PPP session drops were seen. The link will be brought back into service once we have received the all-clear from the wave provider.

LON01 – CORE03 – 24/08/2016 – 20:00 till 23:59 *Emergency Work*

Following on from recent repeated hardware failures on core03.structuredcommunications.co.uk as detailed HERE the decision has been taken to fully replace the device. We will also be taking the opportunity to upgrade the IOS image to bring this device in line with the current images across the rest of our network.

This work will involve powering down and physically moving the current device along with all installed line cards. Because of this, all directly connected services (listed below) will be unavailable for the duration of the works.

> Bonded DSL on AG1 & AG2

> Unmanaged SIP trunking provided via sipwise.easyipt.co.uk

> Managed VoIP services provided via primary-sw.r03.core03 & primary-sw.r04.core03

> Webhosting via server01.easyhttp.co.uk

> VPS sessions on esxi10.r02.structuredcommunications.co.uk

Other services will remain unaffected. Redundant services provided via other parts of the network (such as DNS & SMTP) will take over. Please ensure your configuration is up to date.

UPDATE01 – 20:10 – 24/08/2016 Engineers are on site and these works have started.

UPDATE02 – 22:06 – 24/08/2016 Engineers have completed the above works ahead of schedule and we can confirm all services have returned to normal. We apologise for the inconvenience caused.

We will continue to monitor the new device to ensure continued operation.

LON01 – CORE03 – 21/08/2016 – 08:33 *At Risk*

Our network monitoring has alerted us to a fault on CORE03 within our Goswell Road network. This fault is a recurrence of an issue identified yesterday that was resolved without impact. Additional logging was added at the time to further assist should it be required.

The issue has been tracked down to the “Ethernet Out of Band Channel” (EOBC) control channel on the device's backplane.

Due to the number of line cards automatically taken out of service by the device, we are currently investigating whether this is part of a common hardware fault, such as in the currently active supervisor module.

We diversely route our internal backhaul fibre up-links across each core to ensure that a single line card failure does not result in an outage. This is currently in operation; however, we have lost a number of links due to the fault, and the device is classed as at risk, along with any directly connected equipment.

We are currently reviewing the logs and will update with further information and an action plan as soon as possible.

UPDATE01 – 09:49
After reviewing the logs we have concluded the next action step is to swap between active supervisors within that device. This will cause a brief outage to all services connected to that device. We will monitor the device closely after the change to see if the same issue occurs. This reload has been scheduled for 10:00 today.

UPDATE02 – 10:08
The swap completed as expected; however, despite this supervisor showing OK and passing diagnostics, it failed to fully take the system load and was reverted. We now suspect a possible backplane issue on this device. Further updates to follow.

UPDATE03 – 14:08
Further observations have been made and the log files reviewed in depth. At this stage we can advise that the backup supervisor within CORE03 has been reporting errors; however, Cisco IOS listed these as “non-fatal” and as such they were not flagged within our monitoring platform.

We suspect a fault had occurred on the standby supervisor which had not been picked up by the device's internal diagnostics until we brought the card fully into operation. We suspect this fault was affecting EOBC reporting and thus causing line cards to be disabled. As the previous fault took 24 hours to resurface, we are continuing to monitor. An emergency maintenance window is also going to be scheduled for CORE03 to replace the suspected failed card, along with an IOS update.

UPDATE04 – 14:30 – 22/08/2016
Despite seeing the device operate for over 24 hours without further errors, we have just observed the fault conditions triggering line cards to be disabled. We therefore now suspect this is a problem with the chassis itself and its backplane. We will now be replacing the entire device as a matter of course to prevent this escalating to an outage. Further works will be scheduled and notified via the NOC, as we don't have a pre-built device on site.

LON01 – Access Switches – 27/06/2016 – 21:12 *COMPLETE*

We have identified a security bug within the core firmware installed and running on the following access switches within our network:

primary-sw.r02
backup-sw.r02
primary-sw.r03
backup-sw.r03
primary-sw.r04
backup-sw.r04

Due to the nature of this security bug we have had little option but to immediately update these devices. This update required the above switches to be reloaded so the new IOS could be loaded. Services with redundant network links to other parts of our network will have seen no disruption.

Other switches are unaffected and this work is now complete.

We apologise for any inconvenience caused.

LON01 – EasyIPT – 16/05/2016 – 21:40 till 22:45 *Emergency Maintenance* *Complete*

We have taken action to reload our primary soft switch to clear down some stuck SIP sessions. We are working with our software vendor to try and automate this process without the need for a complete reload of the signalling platform in future.

We have been advised that an update will help limit the need to do this, with a further one planned to resolve this completely.

Calls are now routing correctly.

LON01 – POP-C002.017 – 06/04/2016 – 10:00 – 12:00 – *Emergency Work* *COMPLETE*

Further to our NOC notice posted on 31/03/2016 in respect of power at one of our POPs, “C002.017 – Goswell Road”, our network monitoring has alerted us to another loss of power on the secondary feed at this cab.

Structured engineers are attending site tomorrow morning within the above maintenance window to install equipment that will allow us to isolate the faulty hardware without further risk to services provided via this cabinet going forward. The work will involve the replacement of various power distribution hardware. At this time the POP is operating on its redundant power feed and all services have been re-routed where possible. Ethernet services provided by this cab are considered “at risk” until power has been fully restored. Transit services will automatically re-route in the event of a failure.

Due to the nature of the works, engineers will be working within a live cabinet. No issues are expected and extreme care will be taken while the works are undertaken.

Further updates will be provided in the morning.

UPDATE01 – 10:10
Engineers have started work.

UPDATE02 – 10:55
The new power distribution hardware has been installed and engineers will begin to power up the affected hardware one device at a time. Level3 are on site with us in the event of another problem.

UPDATE03 – 11:10
All hardware has been powered up and the faulty device has been found (albeit with a bang and a fire). Unfortunately, the failed hardware is the redundant power supply on the network core within this rack. Redundant hardware of this size is not kept on site and we are currently in the process of sourcing another unit. Further updates to follow.

UPDATE04 – 12:02
Further tests have been done, confirming the PSU has failed. Engineers have removed a power supply from another unit in our 4th floor suite and installed it within the 2nd floor POP to confirm it is undamaged by the recent events. Further updates to follow.

UPDATE05 – 12:45
Engineers have ordered a same-day replacement from one of our suppliers. Engineers are going to remain on site to fit and commission the new hardware on arrival.

UPDATE06 – 15:25
Engineers remain on site awaiting hardware. ETA was 15:30, however this has been pushed back due to a crash on the A3.

UPDATE07 – 15:25
Engineers remain on site, having little fun. We have been advised the part is now in London and will be with us by 17:30.

UPDATE08 – 17:35
The replacement part has arrived on site.

UPDATE09 – 17:50
Despite our best efforts, our supplier has shipped the wrong part! Discussions with them have concluded that there are no further delivery options today. New hardware has been sourced and is being made available to site for a timed delivery. Engineers are attending again in the morning to swap out the (hopefully) correct new part. We do apologise for the delay in getting this resolved, but want to remind customers who route via this device that it is still operating as expected on its redundant supply.

UPDATE10 – 09:30 – 07/04/2016
Engineers have returned to site and are awaiting delivery of the new PSU.

UPDATE11 – 09:46 – 07/04/2016
Delivery update to advise the hardware will be on site before 11am.

UPDATE12 – 10:33 – 07/04/2016
Hardware has arrived on site and engineers have confirmed it is the correct unit this time.

UPDATE13 – 10:47 – 07/04/2016
Engineers have installed the new power supply and confirmed its operation within the core. A series of load tests have been conducted with normal operation observed.

UPDATE14 – 11:00 – 07/04/2016
We are happy the new power supply is operating as expected, however will continue to monitor its operation for the next few hours. The site is no longer classed as “at risk” and this issue will now be closed off.
We apologise once again for the time this has taken to resolve and will be reviewing our internal procedures on hardware spares of this nature at Goswell Road.

LON01 – EasyXDSL – 24/03/2016 – 14:30 – *Emergency Maintenance*

We have just been made aware that our carrier in conjunction with BT will be conducting some emergency maintenance on several FTTC gateways within the South East area. This will cause connections to drop, however should reconnect almost instantly.

We apologise for the short notice given for these works and will advise once complete.