DSL DHCP Server Issues

Early this morning, four of our DHCP servers began responding slowly to DHCP requests.  These simultaneous failures overwhelmed our ability to migrate load to our hot-standby servers, but we were able to put several workarounds in place to mitigate the issues.  We initially believed the failures were consistent with a disk pre-failure scenario, in which a single disk’s performance is degraded but the RAID system has not yet failed the disk.  However, further investigation revealed that the failures were triggered by scheduled SMART tests.  Ironically, the SMART tests were recently enabled to help us detect and replace failing disks before their failure triggered a service-impacting event.

At this time, all DHCP services have been returned to normal.

-Nathan, Don, and Kelsey

2 comments for “DSL DHCP Server Issues”

  1. I saw that irregularity this morning (up too d**n early), rebooted everything (twice), it didn’t go away, then said to myself, “Self, you’re in the capable hands of Sonic dot Net. They’ll have this working in no time!” And once again, like always, y’all did.

    Thanks, guys (and gals).

  2. It would be nice to include a timeline in the outage reports (i.e. service impact began at AA:BB and ended at YY:ZZ).
