Following on from the previous NOC advisory, it has become necessary to migrate the existing “Active” supervisor to “Hot Standby” so that a reload can take place on that slot. This may affect backplane switching on CORE-01, and packet loss may be seen for several seconds while the Standby supervisor becomes the Active unit.
During the emergency reboot of ESXI02.R01, carried out after our network monitoring reported the management interface as offline, we discovered an issue with the ARP timeout settings on LON01-CORE that could have caused extended downtime in the event that the EasyBond service needed to flip between aggregators.
We have corrected this configuration issue and run a failover test, which completed as expected.
Users on EasyBond would have seen two 30-second outages. We apologise for any inconvenience caused.
There is a known bug in Asterisk whereby a PBX occasionally sends an “unauthorized registration” message to our softswitch following a SIP OPTIONS request. This results in an account ban if the message is received more than 3 times during the life of the registration.
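The ban condition described above can be illustrated with a minimal sketch. This is not the softswitch's actual code: the class name, method names and account identifiers are hypothetical, and the only detail taken from the notice is that a ban follows when the message is received more than 3 times during the life of one registration.

```python
from collections import defaultdict

# From the notice: a ban follows when the message is seen "more than 3 times".
BAN_THRESHOLD = 3


class RegistrationBanTracker:
    """Hypothetical model: counts rejected messages per account for one registration."""

    def __init__(self, threshold: int = BAN_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts = defaultdict(int)
        self.banned = set()

    def record_unauthorized(self, account: str) -> bool:
        """Record one "unauthorized registration" message; return True if the account is banned."""
        self.counts[account] += 1
        if self.counts[account] > self.threshold:
            self.banned.add(account)
        return account in self.banned

    def registration_renewed(self, account: str) -> None:
        """Assumption: a fresh registration restarts the count (existing bans are kept)."""
        self.counts[account] = 0
```

Under this model the first three messages are tolerated and the fourth triggers the ban, which is why an occasional spurious message from a PBX can eventually lock out an otherwise healthy account.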
We have made some changes to our core softswitch to eliminate this problem. These changes require a reload of the active configuration files on the server and will drop any active call. This reload should take no more than 30 seconds.
We apologise for any inconvenience this may cause.
UPDATE01 – 21:01
This work is complete.
Following today’s failure, we will be conducting a controlled failover test to our backup aggregation server to ensure this transfer process is working correctly. A few short outages will be seen during this window by bonded customers.
UPDATE01 – 20:54
Testing has completed and the service failed over as expected on 3 simulated failures.
Following on from today’s CPU issues, we have been advised there is a major upgrade for MailEnable. This will be installed tonight to ensure we are running the latest release.
UPDATE 01 – 20:15
This work is now complete. Any users having problems with account syncing are advised to remove and re-add their account in their mail client. (Remember to back up your messages first.)
We are aware CPU usage on EasyHTTP is starting to climb to 100% across all 16 cores. We are monitoring the process that is causing the high usage, with a view to restarting the server should usage not drop. A restart of the service has not resulted in a fix.
UPDATE 01 – 08:53
The IMAP service has continued to consume CPU and usage is now above 90%. Looking at the service threads, we are unable to locate any sub-service or string that would be causing the high CPU usage. We have therefore opted to restart EasyHTTP, which will take around 10 minutes due to the size and configuration of the RAID array.
UPDATE 02 – 09:07
The server has requested a disk check because of its uptime, so we expect a further 15-minute delay.
UPDATE 03 – 09:30
The server has now rebooted and all services have been restored; however, the IMAP service is still using large amounts of CPU. We have moved the service to a single core so we can continue to fault find. Users may experience a slower email service due to the limits enforced.
UPDATE 04 – 10:15
We have discovered a user account with over 700,000 emails in their deleted items, which are suspected to be causing the high CPU load.
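A check of this kind can be sketched as a script that counts stored messages per mailbox folder and reports any folder above a limit. This is a sketch only, assuming the mail store keeps one file per message inside a directory per folder; the function names, the root path and the limit are hypothetical and are not taken from the mail server's own tooling.

```python
import os
from typing import Dict


def count_messages_per_folder(mail_root: str) -> Dict[str, int]:
    """Return {folder path: number of files directly inside that folder}."""
    counts = {}
    for folder, _subdirs, files in os.walk(mail_root):
        counts[folder] = len(files)
    return counts


def oversized_folders(mail_root: str, limit: int) -> Dict[str, int]:
    """Return folders holding more than `limit` files, largest first."""
    counts = count_messages_per_folder(mail_root)
    big = {path: n for path, n in counts.items() if n > limit}
    return dict(sorted(big.items(), key=lambda item: item[1], reverse=True))
```

For example, `oversized_folders(mail_root, limit=100000)` would list every folder under the hypothetical `mail_root` holding more than 100,000 files, so an unusually large deleted items folder or INBOX stands out immediately.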
UPDATE 05 – 11:03
These files have been removed and the IMAP service restarted with all cores enabled. CPU usage is at normal levels. We will continue to monitor over the next few hours.
UPDATE 06 – 12:05
Our network monitor has alerted us that CPU levels have started to climb again. Further review of the IMAP process has highlighted another large account; however, its items are stored in the user’s INBOX, which we are unable to delete. We have therefore disabled the account and levels have returned to normal.
We have had reports of performance issues on EasyHTTP.
Engineers are looking at the issue. Services may be restarted without warning after 20:00.
UPDATE01 – 18:30
The performance issues seem isolated to MySQL driven websites.
UPDATE02 – 23:15
Engineers have installed Microsoft WinCache 22.214.171.124.5 and enabled the extension for PHP 5.4.
Various updates to the my.ini files have been made and a server reload has been completed.
Website loading times have seen a 100% improvement and are now around 2.5 seconds, which is in line with various Windows-based platforms.
We are aware ns02.r01.lon.adapt.easybond.co.uk is not processing DNS lookup requests. We are currently looking into the issue and will provide an update shortly. NS01 (ns01.r01.lon.adapt.easybond.co.uk) is operating normally, so this should not be service affecting.
UPDATE 01 – 21:45
This issue has been resolved by reloading the iptables rules.
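A basic check of whether a nameserver is answering lookups can be sketched as below. This is a minimal illustration using only the Python standard library, not the monitoring system referred to in this notice; the function names, the query ID and the 2-second timeout are assumptions made for the example.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def is_good_response(query: bytes, response: bytes) -> bool:
    """True if `response` matches the query ID, has the QR bit set and RCODE 0."""
    if len(response) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", response[:4])
    query_id = struct.unpack("!H", query[:2])[0]
    return resp_id == query_id and bool(flags & 0x8000) and (flags & 0x000F) == 0


def nameserver_answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """Send one UDP query to `server` on port 53; False on timeout or socket error."""
    query = build_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            response, _addr = sock.recvfrom(512)
    except OSError:
        return False
    return is_good_response(query, response)
```

Calling `nameserver_answers` once per nameserver with a name from a zone that server hosts would show whether each one is responding. An authoritative server may refuse names it does not host, so the choice of test name matters.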
Following on from our planned maintenance works on EasyHTTP we removed the SSL certificate securing the control panel.
We have been unable to reinstall this. We have a backup for both IIS and Plesk, and we have logged an issue with our software vendor to see if these can be re-installed. If not, we will simply install a new certificate.
UPDATE01 – 15:15 – 24/12/2013
A new SSL certificate has been ordered and will be installed shortly.
UPDATE02 – 16:00 – 24/12/2013
The new SSL certificate has just been issued by AlphaSSL and installed on the system. This notice is closed.
Our software vendor for MailEnable (who provide our SMTP server software) has alerted us to a vulnerability in several releases of their software. We will take immediate action and install the relevant patches.
This will stop email flow for around 5 minutes while we take the service offline to perform the upgrade.
This is for your own protection and we will advise once complete. We apologise for any inconvenience this may cause.
This work is now complete and the platform version has been updated to 7.07.