Author: Kelsey Cummings

DKIM Signing Outbound Email

We began signing outbound mail flows for sonic.net and list.sonic.net several weeks ago, and this morning we started signing mail for all customer-owned domains. This should be seamless for most customers; however, in some cases you may receive special instructions from us to enable it for your domains. Having aligned DKIM and SPF has become increasingly important for reliable email delivery, especially to large providers like Gmail, Yahoo and Microsoft. We have chosen not to add DMARC records to customer domains at this time.

Please note that if you use a third-party service provider such as Mailchimp, Constant Contact, Salesforce or Shopify to send email from either your sonic.net or hosted domain, you will most likely need to take action to ensure this email continues to be delivered as expected. Importantly, you will not be able to send mail through a third-party service from a “@sonic.net” address. (These messages can, of course, use your “@sonic.net” address as the reply-to.) If you do use a third-party service, you will need to work with their customer service to ensure your mail flows are not affected.

Please direct any questions or comments to https://forums.sonic.net

Santa Rosa Data Center and Colo UPS Replacement

We are scheduled and on track to replace one of the UPSes that supply power to our Santa Rosa Data Center and Colo facility at 9:00 AM on 5/26. The UPS has reached its end of life and is being replaced with a new, highly efficient modular UPS. The cutover from the old UPS to the new one is fully scripted and coordinated with our vendors, and no interruption of power to the facility is expected. This process will be repeated to replace the other end-of-life UPS with a similar unit in the coming weeks.

Update: The first UPS replacement went smoothly. Replacement of the second UPS will begin on June 6th and should be completed on June 7th.
Update: The second UPS replacement has been delayed; load testing and commissioning will now begin on June 8th.

-Sonic System Operations and Facilities

T-Mobile Filtering SMS Messages from Support

It’s come to our attention that T-Mobile is filtering the SMS messages we use for support chat, notifications, one-time passwords (OTP) and password recovery. We’re working with our upstream SMS provider and hope to have this resolved soon.

Update: We’ve migrated OTP and password recovery to a different number to work around the block for these services.

-Sonic Operations

New Recursive DNS Server IPs

208.201.224.11 and 208.201.224.33, carved out of one of our first IP allocations, have been Sonic’s official recursive DNS servers for as long as 20 years. Unfortunately, for various reasons, we must deprecate this legacy IP assignment, and we have added 50.0.1.1 and 50.0.2.2 to our recursive DNS clusters to take their place. If you have statically configured DNS servers, feel free to update them at any time. We will migrate server-assigned DNS settings to the new IPs at our convenience. The legacy IPs will continue to work indefinitely.

-System Operations

Cisco VPN Services Shutdown 8/31

We’re shutting down our legacy Cisco VPN service at the end of August 2020. It has been replaced by our OpenVPN server, which is also free for all Sonic customers to use. OpenVPN is a modern VPN service with widespread support in all current operating systems, mobile platforms and even many routers and firewalls.

To use the new VPN, simply log in to https://ovpn.sonic.net using your Sonic username and password, then download and install the appropriate client. Once the client is installed, you can either download and import the connection profile manually or log in directly to https://ovpn.sonic.net from the client.

The few users still connecting to the old service have been notified directly.

-Kelsey

Inbound Mail Delays

We’re currently experiencing significantly above-average inbound email volume, which has exceeded the capacity of our spam-filtering servers. These messages are queued locally until resources become available, which may add a delay of a couple of minutes where delivery usually takes less than a second. This is also causing a brief delay when sending outbound messages. We’re bringing up additional resources now and expect to have the issue resolved soon.

Update: Additional resources have been deployed and this is no longer an issue.

-System Operations

IMAP Server Issues post Update/Rollback

Update 22:56:37 PDT: Everything looks good and all services appear to be functioning.

Update 21:37:03 PDT: The final cohort has been enabled. We’ll continue to watch the cluster for the next several hours to ensure that everything is working as intended. It will still take some time before performance returns to expected levels.

Update 19:17:35 PDT: Access to mail spools has been restored for approximately 5/6 of users. We’re still on target to have services fully restored in a few hours.

Update 17:39:57 PDT: Access to mail spools has been restored for approximately 2/3 of users, and while users can expect slow performance, the cluster and storage pools appear to be stable and chugging along. At this point, we expect to have services fully restored later tonight. Please note that no email was lost during this outage and our inbound servers were able to deliver mail without issue.

Update 16:19:00 PDT: The status of our email system is continuing to improve – we will post an update as our systems return to normal operation.

Update 13:00:56 PDT: To allow the cluster to recover, we’ve taken steps to reduce the total number of users who can log in at once. This appears to be having positive results and is allowing the storage pools to catch up with the load. We still don’t have an ETA, but the situation is improving.

Update: We believe the underlying issues have been resolved. However, it will take some time for IMAP and POP services to return to normal. The fix has substantially increased I/O load on our mail spool storage because all mailbox indexes have to be rebuilt. This rebuild can take a long time, especially for large mailboxes, and it will be some time before it is complete for all active mailboxes.

Update: We’re continuing to restore services for affected users. Please note this may also affect voicemail services for some Fusion DSL users.

Update: Mailboxes for many users have been fixed and are working properly. We’re continuing to repair the remaining affected users.

Unfortunately, the mail server upgrade and rollback have caused issues for our IMAP cluster, and it looks like most IMAP sessions are failing. We’re working to restore services ASAP and will share more information soon.

-Kelsey and Grant

Santa Rosa Datacenter UPS Preventative Maintenance

Our Santa Rosa Datacenter UPSes will undergo their annual preventative maintenance on Thursday, May 10th, starting at 10:00 AM. This is a fully scripted event with our vendor and should not result in any service interruptions.

Update: This maintenance was postponed and rescheduled to June 6th, starting at 10:00 AM.

– Sonic Facilities and Systems

POP/IMAP SSL Cert Expiration!

UPDATE: Certificates deployed.

Due to unexpected issues with the IMAP cluster upgrades, we neglected to deploy the new certificates to the old servers. Unfortunately, the certificates on the old servers have just expired. The new certificates are being deployed now. We are sorry for any inconvenience or concern this may have caused you.

-Sonic Operations

Fusion Voicemail and Membertools Outage

Early this morning, a library used by many of our backend RPC services was updated as part of our nightly automatic upgrade process. This update caused an issue that required the services depending on it to be restarted before they would function correctly under some specific circumstances. Unfortunately, most of those circumstances are not tested by our monitoring suite, so we didn’t discover the issue until later this morning.

During this period, we were unable to process some payments, and the Fusion Voicemail system wasn’t able to process inbound calls correctly. All services have now been restored, and we’re continuing to investigate the underlying cause of the bug and what can be done to prevent it from happening again.

-Kelsey and the rest of System Operations