Device Registration & Service Interruption
Incident Report for MeloTel Network Operations (NOC)
Postmortem

Between 11:27am and 5:39pm on Thursday, September 30, 2021, MeloTel users encountered issues with certain Polycom device models failing to register after being rebooted. The issue escalated into a major service interruption for many customers. It was ultimately resolved by installing a new TLS certificate from a different certificate provider. Total major impact was approximately 2.5 hours.

Here is the timeline of events.

  • 11:27am – The event was detected by a customer who had moved a telephone to another workstation and whose extensions then failed to register. The problem was reported to MeloTel Support, and our team started working on the customer’s issue.
  • 11:32am – We received a second complaint from a customer reporting a similar problem.
  • 12:10pm – MeloTel Support reproduced the problem internally and confirmed that devices were failing to re-register.
  • 12:18pm – MeloTel announced on status.melotel.com that we were investigating trouble with devices re-registering. (At this time, the issue was only known to affect two devices, but we knew it would affect more as customers restarted their phones.)
  • 12:51pm – Our engineers believed they had narrowed the issue down to our provisioning platform and focused their efforts there.
  • 2:22pm – After exhaustive troubleshooting of our provisioning platform, we brought in our switch vendor’s engineers to investigate a possible certificate issue on our PBX platform. (At this time, the issue had affected only approximately 15 customer phones.)
  • 3:06pm – While attempting to re-issue our TLS encryption certificate, our switch engineers restarted a service in our cloud, which caused many customers’ Polycom devices to become unregistered. Calling on the network itself still worked, but without a registered device, many customers could not make or receive calls on their Polycom phones.
  • 3:09pm – Because the incident was affecting more than 1,000 customers, its status was raised to Critical. Workarounds were offered to customers, including re-routing their services to cell phones and cloud softphone apps.
  • 3:44pm – Our engineers identified that the issue was related to the Kamailio TLS registration service, whose Let’s Encrypt certificate chained to the IdenTrust DST Root CA X3 root certificate, which had expired earlier that day (10:01am EDT).
  • 3:58pm – MeloTel engineers, working in cooperation with Polycom Engineering Support, our switch vendor, and a security expert, determined that the only solution was to issue a new security certificate from DigiCert, a certificate authority that Polycom confirmed its devices trust.
  • 4:46pm – We generated a new CSR and placed the order with DigiCert. Unfortunately, issuance is not instant: for security, DigiCert had to independently verify MeloTel before issuing the certificate.
  • 5:28pm – The certificate was issued to MeloTel by DigiCert.
  • 5:38pm – The certificate was installed by our engineers and devices immediately began re-registering.
  • 5:38pm – The all-clear was given to customers, and the service status was closed.
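To illustrate the failure mode identified at 3:44pm: a device that validates the registration server’s certificate chain against DST Root CA X3 must reject that chain once the root’s notAfter date has passed. The Python sketch below checks a notAfter string against a point in time using the standard library’s `ssl.cert_time_to_seconds`. The notAfter value is that of the DST Root CA X3 certificate; the helper `is_expired` is hypothetical, and it assumes, for illustration only, that the affected VVX firmware relied on that root alone.

```python
import ssl
from datetime import datetime, timedelta, timezone

# notAfter date of the IdenTrust DST Root CA X3 certificate, in the
# "%b %d %H:%M:%S %Y GMT" format that ssl.cert_time_to_seconds() expects.
DST_ROOT_CA_X3_NOT_AFTER = "Sep 30 14:01:15 2021 GMT"

# US Eastern Daylight Time (UTC-4), the time zone used in this report.
EDT = timezone(timedelta(hours=-4))


def is_expired(not_after: str, at: datetime) -> bool:
    """Hypothetical helper: True if a certificate with this notAfter
    value is no longer valid at the timezone-aware time `at`."""
    expiry_epoch = ssl.cert_time_to_seconds(not_after)  # interpreted as UTC
    return at.timestamp() > expiry_epoch


if __name__ == "__main__":
    # Assumption: the device compares the root's notAfter to its current clock.
    earlier = datetime(2021, 9, 30, 10, 0, tzinfo=EDT)
    first_report = datetime(2021, 9, 30, 11, 27, tzinfo=EDT)
    for label, moment in (("10:00am EDT", earlier), ("11:27am EDT", first_report)):
        print(label, "expired:", is_expired(DST_ROOT_CA_X3_NOT_AFTER, moment))
```

Running the script prints `False` for 10:00am EDT and `True` for 11:27am EDT, consistent with the first customer report arriving shortly after the root expired at 10:01am EDT.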

IN CLOSING: MeloTel had known about the Let’s Encrypt DST Root CA X3 expiration since May 7, 2021, and we had been assured that Polycom firmware versions would handle it without failing. While many Polycom devices were indeed not affected, some Polycom VVX series devices were affected and ultimately put out of service.

We have learned from this. MeloTel is committed to no longer using Let’s Encrypt SSL certificates for TLS device registration. Our new certificate issuer is DigiCert.

We sincerely apologize for the inconvenience you experienced due to this incident. We thank you for your patience and your support.

Posted Oct 01, 2021 - 10:11 EDT

Resolved
This incident has been resolved. All phones should have re-registered automatically. If your phone has not registered, please reboot it so that it re-registers. We are sincerely sorry for this major inconvenience and understand how much of an impact an event like this has on your business. Please rest assured that we do not take events like these lightly, and we will do better by learning from this. An RFO (Reason for Outage) will be provided after our complete assessment.
Posted Sep 30, 2021 - 17:38 EDT
Update
The root cause has been identified. This issue is related to an unexpected certificate incompatibility between Let’s Encrypt and Polycom devices. More information can be found here: https://techcrunch.com/2021/09/21/lets-encrypt-root-expiry/ - We are currently working as quickly as possible to update our root certificate and restore services. ETR: at least 2 hours.
Posted Sep 30, 2021 - 16:32 EDT
Identified
Our engineers have confirmed the issue is related to our Kamailio service, which is responsible for device registrations. They are actively working on a resolution now, and we hope to provide positive updates shortly. We can confirm that incoming calls are working. If you already have the cloud softphone app, your calls should continue to flow. If you would like calls to your extension routed to your cell phone number, please call 1-888-635-6835 option 3 and our agents will assist you.
Posted Sep 30, 2021 - 15:44 EDT
Update
This incident has been updated to Critical. It is now affecting most device registrations, preventing calls from completing. Updates will become more frequent.
Posted Sep 30, 2021 - 15:09 EDT
Update
Our engineers sent a command to renew a certificate in our system, which caused devices to fail to register. We are working on this now and will update shortly.
Posted Sep 30, 2021 - 15:06 EDT
Investigating
Regretfully, we have not yet isolated the root cause of devices failing to re-register after a reboot. We have involved our switch vendor’s engineers and are working on this with urgency. We sincerely apologize for any inconvenience.
Posted Sep 30, 2021 - 14:21 EDT
Identified
We have narrowed the extension registration trouble down to our provisioning platform. Our engineers are working diligently to restore services as soon as possible.
Posted Sep 30, 2021 - 12:50 EDT
Investigating
We are investigating an issue where phones that are rebooted cannot register again. We are working on this now and will provide updates as soon as we have them. In the meantime, please avoid unnecessary reboots of your Polycom phones.
Posted Sep 30, 2021 - 12:17 EDT
This incident affected: VoIP Services (Outgoing Calls, SIP Registration) and Click2Call.