In the world of microcontrollers (MCUs), sometimes things go wrong. If a program goes haywire or gets stuck in an infinite loop, the system needs a way to detect that the program is no longer running properly. In “the old days,” the Windows operating system would occasionally crash (experience a fatal error) and put up what was called the Blue Screen of Death (BSoD), after which it would reboot to prevent damage to the computer, such as overwriting vital boot code or similarly dangerous events. (The BSoD happens much less often these days.) Embedded systems are different from desktop computers, however, in that there is rarely a human around who knows how to reboot the failing device.
Watchdog timers (WDTs), or watchdogs, are circuits separate from the processor core that can detect a stalled program and trigger a processor reset (and/or another event) if necessary. The MCU checks in with the watchdog timer at a set interval to show that it’s still on the job. Like a bomb, the watchdog timer is set to count down, and if it times out, it resets the MCU, abandoning whatever the program was doing and rebooting the MCU and probably other parts of the system that work in tandem with it. But as long as the MCU is running properly, it will continue to ping the watchdog, which restarts the countdown. It’s best to keep a watchdog external and unreachable by MCU code. A watchdog can be an external component in a separate package from the integrated circuit (IC) that houses the MCU (best), or it can sit inside the same IC but on a different circuit from the MCU. A WDT that depends on the same resources as the MCU, such as the same clock source, is a poor choice, because a fault that takes down the MCU can take down the watchdog with it.
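The countdown-and-ping behavior described above can be modeled in a few lines of C. This is only a software sketch for illustration: a real watchdog counts in hardware, independently of the code it supervises, and every name here (`watchdog_t`, `wdt_init`, `wdt_kick`, `wdt_tick`, `simulate`) is hypothetical rather than taken from any vendor's API. The model assumes the firmware pings the watchdog once per tick until it hangs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog timer (not a real driver). */
typedef struct {
    uint32_t timeout_ticks; /* value the counter is reloaded with on each ping */
    uint32_t remaining;     /* ticks left before the watchdog fires */
    bool     reset_fired;   /* set once the countdown reaches zero */
} watchdog_t;

/* Arm the watchdog with a countdown of timeout_ticks. */
void wdt_init(watchdog_t *w, uint32_t timeout_ticks)
{
    assert(timeout_ticks > 0);
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->reset_fired = false;
}

/* The MCU "checks in": restart the countdown from the full timeout. */
void wdt_kick(watchdog_t *w)
{
    w->remaining = w->timeout_ticks;
}

/* One tick of the watchdog's own clock; fires the reset at zero. */
void wdt_tick(watchdog_t *w)
{
    if (w->reset_fired) {
        return;
    }
    if (w->remaining > 0) {
        w->remaining--;
    }
    if (w->remaining == 0) {
        w->reset_fired = true;
    }
}

/*
 * Simulated firmware that pings the watchdog once per tick and then hangs
 * at tick hang_at (pass a negative hang_at for firmware that never hangs).
 * Returns the tick on which the watchdog fired, or -1 if it never fired
 * within total_ticks.
 */
int simulate(uint32_t timeout_ticks, int hang_at, int total_ticks)
{
    watchdog_t w;
    wdt_init(&w, timeout_ticks);

    for (int t = 0; t < total_ticks; t++) {
        bool firmware_alive = (hang_at < 0) || (t < hang_at);
        if (firmware_alive) {
            wdt_kick(&w);
        }
        wdt_tick(&w);
        if (w.reset_fired) {
            return t;
        }
    }
    return -1;
}
```

In this model, calling `simulate(5, -1, 100)` never triggers a reset because the countdown is restarted on every tick, while `simulate(5, 10, 100)` stops pinging at tick 10 and lets the countdown run out a few ticks later. On real hardware the reset would restart the MCU; here it is just a flag.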
Smartphones can be rebooted by cycling the power off and on (usually when you need to make an urgent call), but how do you power down the MCU that runs the cruise control in your car? Turning off the car would definitely reboot the cruise control loop, but this option is obviously not safe at high speed. What if the cruise control program gets stuck in acceleration mode? The MCU running the cruise control program may need a watchdog timer, which is a kind of external check. You can’t expect something at risk of losing its mind to mind its own safety. MCUs don’t lose their minds that often, or at least they aren’t supposed to, but in critical systems you cannot trust that nothing will go wrong. It’s possible for a single cosmic ray to flip a single bit in memory or in a register, which can lead to a stack overflow or an infinite loop. Once the MCU has gone off the rails, software checks are likely to be ignored, bypassed, overwritten, or just plain forgotten, as a rogue event can wreak exquisite havoc.
“Watchdog” may seem a curious choice of words, but there’s a metaphor in there. (Maybe “time bomb” would have been a more accurate description, but it wouldn’t sound quite right to talk about bombs in your system, either.) The act of resetting the WDT has been referred to as “kicking the dog.” According to Niall Murphy (by way of Jack Ganssle), “If the man stops kicking the dog, the dog will take advantage of the hesitation and bite the man.” If you take too long to kick the dog, the dog bites you by resetting the MCU. Whatever you call it, the watchdog timer has been around for decades and is still a reliable way to provide an electronic fail-safe. WDTs are not necessarily the last line of defense, however. If a watchdog were to fail along with the MCU (perhaps the entire PCB burns out), mechanical hardware fail-safes might be the next step for ensuring safety.