
Network Hang on Big Dorm Fridge Machine

What Was Happening

The Intel I219-V NIC (e1000e driver) suffered a Hardware Unit Hang: its transmit queue froze and the driver could not reset it. It has happened twice:

  • First incident (Mar 04, ~08:39): ran for 16+ days, then hung
  • Today (Mar 07, ~07:00): ran for 3 days, then hung again

Once hung, the driver logs the error every 2 seconds but never recovers on its own. Running systemctl restart networking didn't help: it operates at the ifupdown/NetworkManager layer and cannot reset the PCI hardware.
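To confirm that a network stall is this failure and not something else, the kernel log can be checked for the driver's hang report. This is a minimal sketch assuming the standard e1000e wording "Detected Hardware Unit Hang"; the helper name count_unit_hangs is hypothetical.

```shell
# Count e1000e "Detected Hardware Unit Hang" reports in kernel log text read from stdin.
# count_unit_hangs is a hypothetical helper name, not an existing tool.
count_unit_hangs() {
    # grep -c prints 0 (and exits 1) when nothing matches; mask that exit status.
    grep -c 'Detected Hardware Unit Hang' || true
}

# Example: current boot's kernel messages (journalctl -k implies the current boot).
# journalctl -k | count_unit_hangs
# Previous boot, e.g. after rebooting out of a hang:
# journalctl -k -b -1 | count_unit_hangs
```

Because the hung driver repeats the message every 2 seconds, a count that keeps growing between two runs suggests the NIC is still stuck rather than having hung once and recovered.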

What Was Installed

  1. /etc/modprobe.d/e1000e.conf: Sets SmartPowerDownEnable=0 to keep PHY smart power down off, a commonly suggested mitigation for the I219-V hang. Takes effect the next time the driver is loaded with this parameter, normally at the next boot.
  2. /etc/udev/rules.d/81-e1000e-power.rules: Keeps the NIC's PCI power management in full-on mode so the OS never tries to power-save the device.
  3. /usr/local/sbin/e1000e-recover.sh + e1000e-recover.service: A recovery tool. If the hang occurs again before the fix is proven, you can run sudo systemctl start e1000e-recover instead of rebooting; it unloads and reloads the driver and brings br0 back up.
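The recovery procedure can be sketched as a shell function. This is a hypothetical reconstruction, not the contents of the installed e1000e-recover.sh: it assumes br0 is an ifupdown-managed bridge (defined in /etc/network/interfaces) whose port is the I219-V NIC, and the names e1000e_recover, run, and DRY_RUN are illustrative.

```shell
# Hypothetical sketch, NOT the installed /usr/local/sbin/e1000e-recover.sh.
# Assumes br0 is an ifupdown-managed bridge whose port is the I219-V NIC.

# Execute a command, or only print it when DRY_RUN=1 (illustrative switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unload and reload e1000e, then bring the bridge back up.
e1000e_recover() {
    bridge=${1:-br0}
    # Take the bridge down first; ignore failure if it is already half-down.
    run ifdown "$bridge" || true
    # Removing the module detaches the driver from the hung device;
    # loading it again re-probes and reinitialises the adapter.
    run modprobe -r e1000e || return 1
    run modprobe e1000e || return 1
    # Wait for udev to finish processing the re-created network interface.
    run udevadm settle
    run ifup "$bridge"
}
```

For example, DRY_RUN=1 e1000e_recover br0 prints the command sequence without touching the NIC. A oneshot systemd unit whose ExecStart runs a script calling this function would provide the systemctl start e1000e-recover entry point. If br0 is managed by NetworkManager instead, the ifdown/ifup calls would need replacing (for example with nmcli connection up).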

The SmartPowerDownEnable=0 change only takes effect the next time the e1000e module is loaded (it is already loaded now). The next reboot you do for any reason will activate it; running e1000e-recover, which reloads the driver, should apply it as well.
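After that reboot, two outcomes are easy to observe from userspace: whether the NIC's PCI power/control attribute reads "on" (presumably the effect the udev rule is meant to have) and whether any hang reports appear in the current boot's kernel log. This sketch checks outcomes only; it does not read SmartPowerDownEnable back from the driver. It assumes the NIC's sysfs device directory is reachable as /sys/class/net/<iface>/device, and the function name post_reboot_check is hypothetical.

```shell
# Hypothetical post-reboot check; the function name is illustrative.
#   $1     sysfs device directory of the NIC, e.g. /sys/class/net/<iface>/device
#   stdin  kernel log text for the current boot
# Returns 0 only if power/control is "on" and no hang reports are present.
post_reboot_check() {
    devdir=$1
    status=0

    # Runtime PM setting of the PCI device: "on" = always powered, "auto" = may runtime-suspend.
    control=$(cat "$devdir/power/control" 2>/dev/null) || control=missing
    if [ "$control" = "on" ]; then
        echo "power/control: on"
    else
        echo "power/control: $control (expected on)"
        status=1
    fi

    # Any e1000e hang report in this boot's log means the problem is not fixed.
    hangs=$(grep -c 'Detected Hardware Unit Hang' || true)
    echo "unit hang reports: $hangs"
    [ "$hangs" -eq 0 ] || status=1

    return "$status"
}

# Usage (enp0s31f6 is a hypothetical interface name):
# journalctl -k | post_reboot_check /sys/class/net/enp0s31f6/device
```

A clean result right after boot says little on its own, since the previous hangs took 3 to 16 days to appear; re-running the check periodically is what builds confidence in the fix.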