Monthly Archives: August 2016

LON01 – CORE03 – 24/08/2016 – 20:00 Till 23:59 *Emergency Work *

Following on from recent repeated hardware failures on core03.structuredcommunications.co.uk as detailed HERE the decision has been taken to fully replace the device. We will also be taking the opportunity to upgrade the IOS image to bring this device in line with the current images across the rest of our network.

This work will involve powering down and physically moving the current device along with all installed line cards. Due to this all directly connected services (listed below) will be unavailable for the duration of the works.

> Bonded DSL on AG1 & AG2

> un-managed SIP trunking provided via sipwise.easyipt.co.uk

> Managed VoIP services provided via primary-sw.r03.core03 & primary-sw.r04.core03

> Webhosting via server01.easyhttp.co.uk

> VPS sessions on esxi10.r02.structuredcommunications.co.uk

Other services will remain unaffected. Redundant services provided via other parts of the network (Such as DNS & SMTP) will take over. Please ensure you configuration is up to date.

UPDATE01 – 20:10 – 24/08/2016 Engineers are on site and these works have started.

UPDATE02 – 22:06 – 24/08/2016 Engineers have completed the above works ahead of schedule and we can confirm all services have returned to normal. We apologize for the inconvenience caused.

We will continue to monitor the new device to ensure continued operation.

LON01 – CORE03 – 21/08/2016 – 08:33 *At Risk*

Our network monitoring has alerted us to a fault on CORE03 within our Goswell Road network. This fault is a re-occurrence of an issue identified yesterday that was resolved without impact. Additional logging was added at the time to further assist should it be required.

The issue has been tracked down to the “Ethernet Out of Band Channel” (EOBC) control channel on the devices back plane.

Due to the number of line cards automatically taken out of service by device, we are currently investigating to see if this is part of a common hardware fault such as the current active supervisor module.

We diversely route our internal backhull fibre up-links across each core to insure that a single line card failure does not result in an outage. This is currently in operation however we have lost a number of links due to the fault and the device is classed as at risk along with any directly connected equipment.

We are currently reviewing the logs and will update with further information / action plan asap.

UPDATE01 – 09:49
After reviewing the logs we have concluded the next action step is to swap between active supervisors within that device. This will cause a brief outage to all services connected to that device. We will monitor the device closely after the change to see if the same issue occurs. This reload has been scheduled for 10:00 today.

UPDATE02 – 10:08
The swap completed as expected, however despite this supervisor showing OK and passing diagnostics, it failed to fully take the system load and was reverted back. We suspect this is now a possible backplane issue on this device. Further updates to follow.

UPDATE03 – 14:08
Further observations have been made and the log files reviewed at depth. At this stage we can advise the backup supervisor within CORE03 has been reporting errors however the Cisco IOS listed these as “Non-fatal” and as such have not been flagged up within our monitoring platform.

We suspect a fault had occurred on the standby supervisor which had not been picked up on by the devices internal diagnostics until we bought the card fully in to operation. This fault we suspect was having an impact on the EOBC reporting and thus causing line cards to be disabled. As the previous fault took 24 hours to resurface we are continuing to monitor. An emergency maintenance window is also going to be scheduled for CORE03 to replace the suspected failed card, along with an IOS update.

UPDATE04 – 14:30 – 22/08/2016
Despite seeing the device operate for over 24 hours without further errors, we have just observed the fault conditions triggering line cards to be disabled. We therefore now suspect this a problem with the chassis its-self and the backplane. We will now be replacing the entire device as a matter of course to prevent this escalating to an outage. Further works will be scheduled and notified via the NOC as we dont have a pre-built device on site.

EasyIPT – 0207 Numbers – 08/08/2016 – 16:56

We are aware of some intermittent inbound call issues on 0207 numbers from a single carrier.

We have raised a support query with this carrier and are awaiting an update from them. At the moment we are not aware of this affecting outbound calls, nor are we aware of any other affected inbound area codes. If you are having issues we would advise you to contact us so your examples can be logged to further assist with a quicker resolution.