Author: Kelsey Cummings

T-Mobile Filtering SMS Messages from Support

It’s come to our attention that T-Mobile is filtering the SMS messages we use for support chat, notifications, one-time passcodes (OTP), and password recovery.  We’re working with our upstream SMS provider and hope to have this resolved soon.

Update: We’ve migrated OTP and password recovery to a different number to work around the block for these services.

-Sonic Operations

New Recursive DNS Server IPs

208.201.224.11 and 208.201.224.33, carved out of one of our first IP allocations, have been Sonic’s official recursive DNS servers for about 20 years.  Unfortunately, for various reasons, we must deprecate this legacy IP assignment and have added 50.0.1.1 and 50.0.2.2 to our recursive DNS clusters to take their place.  If you have statically configured DNS servers, feel free to update them at any time.  We will migrate server-assigned DNS to the new IPs at our convenience.  The legacy IPs will continue to work indefinitely.   – System Operations
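
For anyone with statically configured resolvers, here is a minimal sketch of how one might check a resolv.conf-style file for the legacy addresses.  The function name and the sample file contents are illustrative assumptions, not an official Sonic tool; only the four IP addresses come from the notice above.

```python
# Sketch only: find_legacy_nameservers is a hypothetical helper, not a Sonic tool.
LEGACY_RESOLVERS = {"208.201.224.11", "208.201.224.33"}
NEW_RESOLVERS = ["50.0.1.1", "50.0.2.2"]


def find_legacy_nameservers(resolv_conf_text):
    """Return legacy resolver IPs found on 'nameserver' lines, in file order."""
    found = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (resolv.conf uses '#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            if fields[1] in LEGACY_RESOLVERS:
                found.append(fields[1])
    return found


if __name__ == "__main__":
    # Assumed sample contents; on many Unix-like systems the real file
    # is /etc/resolv.conf.
    sample = "search example.net\nnameserver 208.201.224.11\nnameserver 8.8.8.8\n"
    legacy = find_legacy_nameservers(sample)
    if legacy:
        print("Legacy resolvers in use:", ", ".join(legacy))
        print("Consider switching to:", ", ".join(NEW_RESOLVERS))
    else:
        print("No legacy resolvers found.")
```

The sketch only reports what it finds rather than rewriting the file, since the legacy IPs keep working and the right place to change DNS settings varies by operating system and router.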

Cisco VPN Services Shutdown 8/31

We’re shutting down our legacy Cisco VPN services at the end of August 2020.  They have been replaced by our OpenVPN server, which is also free for all Sonic customers to use.  OpenVPN is a modern VPN service with widespread support across all current operating systems, mobile platforms, and even many routers and firewalls.

To use the new VPN, simply log in to https://ovpn.sonic.net using your Sonic username and password, then download and install the appropriate client.  Once that is complete, you can either download and import the connection profile manually or log in directly to https://ovpn.sonic.net.

The few users still connecting to the old service have been notified directly.

-Kelsey

Inbound Mail Delays

We’re experiencing significantly above-average inbound email volume at this time, which has exceeded our spam filtering servers’ capacity.  These messages are queued locally pending available resources, which may add a couple of minutes of delay where delivery usually takes less than a second.  This is also resulting in a brief delay when sending outbound messages.  We’re bringing up additional resources now and expect to have the issue resolved soon.

Update: Additional resources have been deployed and this is no longer an issue.

-System Operations

IMAP Server Issues post Update/Rollback

Update 22:56:37 PDT: Everything looks good and all services appear to be functioning.

Update 21:37:03 PDT: The final cohort has been enabled.  We’ll continue to watch the cluster for the next several hours to ensure that everything is working as intended.  It will still take some time before we’ve regained expected performance.

Update 19:17:35 PDT: Access to mail spools has been restored for approximately 5/6 of users.  We’re still on target for having services fully restored in a few hours.

Update 17:39:57 PDT: Access to mail spools has been restored for approximately 2/3 of users.  Users can expect slow performance, but the cluster and storage pools appear to be stable and chugging along.  At this point, we expect to have services fully restored later tonight.  Please note that no email was lost during this outage and our inbound servers were able to deliver mail without issue.

Update 16:19:00 PDT: The status of our email system is continuing to improve – we will post an update as our systems return to normal operation.

Update 13:00:56 PDT: In order to allow the cluster to recover, we’ve taken steps to reduce the total number of users who can log in at once.  This appears to be having positive results and is allowing the storage pools to catch up with the load.  We still don’t have an ETA but the situation is improving.

Update: We believe the underlying issues have been resolved.  However, it will take some time for IMAP and POP services to return to normal.  The fix has resulted in substantially increased IO load on our mail spool storage, as all mailbox indexes have to be rebuilt.  This rebuild can take a long time, especially for large mailboxes, and it will be some time before it is complete for all active mailboxes.

Update: We’re continuing to restore services for affected users.  Please note this may also affect some Fusion DSL users’ voicemail services.

Update: Mailboxes for many users have been fixed and are working properly.  We’re continuing to repair the remaining affected users’ mailboxes.

Unfortunately, the mail server upgrade and rollback has caused issues for our IMAP cluster and it looks like most IMAP sessions are failing.  We’re working to restore services ASAP and will have more information forthcoming.  -Kelsey and Grant.

 

Santa Rosa Datacenter UPS Preventative Maintenance

Our Santa Rosa Datacenter UPSes will be undergoing their annual preventative maintenance on Thursday, May 10th, starting at 10:00 AM.  This is a fully scripted event with our vendor and should not result in any service interruptions.

Update: This maintenance was postponed and rescheduled to June 6th starting at 10:00 AM.

– Sonic Facilities and Systems

POP/IMAP SSL Cert Expiration!

UPDATE: Certificates deployed.

Due to unexpected issues with the IMAP cluster upgrades, we neglected to deploy the new certificates to the old servers.  Unfortunately, the certs on the old servers just expired.  The new certs are being deployed now.  We are sorry for any inconvenience or concern this may have caused you.

-Sonic Operations

Fusion Voicemail and Membertools Outage

Early this morning a library used by many of our backend RPC services was updated as part of our nightly automatic upgrade process.  After this particular update, services that depend on the library had to be restarted in order to function correctly under some specific circumstances.  Unfortunately, most of those circumstances are not tested for by our monitoring suite, so we didn’t discover the issue until this morning.

During this period we were unable to process some payments and the Fusion Voicemail system wasn’t able to process inbound calls correctly.  All services have been restored now and we’re continuing to dig into the underlying cause of the bug and what can be done to prevent it from happening again.

-Kelsey and the rest of System Operations

Santa Rosa Data Center UPS Maintenance

Update: Our vendor has had to reschedule yet again, this time for May 16th at 10 AM.

Update: Maintenance has been rescheduled again, this time for May 1st.

Update: Maintenance has been rescheduled for April 27th.

Update: This maintenance has been cancelled and will be rescheduled at a later date.

Tomorrow, starting at 10:00, the UPSes that service our Santa Rosa data center will be undergoing scheduled preventative maintenance.  This includes overhauling one of them with a complete capacitor replacement.  The maintenance is fully scripted with our vendor and no interruption of power services to the datacenter is expected.  -Sonic Operations and Facilities