We’ve all been there. You’re halfway through a high-stakes ranked match, or your AI model is finally converging after three hours of training, when suddenly—chaos. The screen freezes, artifacts flash across the monitor, or your frame rate drops to a slideshow.
Before you reinstall drivers, before you blame the game developer, and certainly before you buy a new graphics card, try restarting the GPU.

First, try terminating the frozen processes:

    sudo killall nvidia-smi

If that fails, reload the kernel module:

    sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
    sudo modprobe nvidia

Warning: This will kill any active training runs. Save your checkpoints first.

Restarting the GPU is a fix for transient errors (driver timeouts, thermal throttling recovery, blank screens after sleep). If you have to do this daily, you are treating a symptom, not the disease.
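If you find yourself typing the reload commands often enough to script them, a minimal sketch along these lines may help. It only tries to unload the modules that are actually loaded, and it stops if the unload fails instead of pressing on. The module names are the four from the commands above; the function names `nvidia_unload_order` and `reload_nvidia` and the `DRY_RUN` variable are hypothetical, made up for this example, and not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch only: reload the NVIDIA kernel modules that are currently loaded.
# Function names and DRY_RUN are illustrative assumptions, not an official interface.

# Read `lsmod` output on stdin and print the loaded NVIDIA modules,
# space-separated, in the order they should be removed (dependents first).
nvidia_unload_order() {
    loaded=$(awk 'NR > 1 { print $1 }')   # module names, skipping the header row
    out=""
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        # -x matches the whole line, so "nvidia" does not match "nvidia_drm"
        if printf '%s\n' "$loaded" | grep -qx "$mod"; then
            out="${out:+$out }$mod"
        fi
    done
    printf '%s\n' "$out"
}

# Unload whatever is loaded, then load the main module again.
# With DRY_RUN set, the commands are printed instead of executed.
reload_nvidia() {
    mods=$(lsmod | nvidia_unload_order)
    if [ -n "$mods" ]; then
        # $mods is deliberately unquoted: one argument per module.
        ${DRY_RUN:+echo} sudo rmmod $mods || return 1
    fi
    ${DRY_RUN:+echo} sudo modprobe nvidia
}
```

Sourcing the file and running `DRY_RUN=1 reload_nvidia` prints the commands it would run, which is a safer first step than running it for real. The same warning applies: anything using the GPU dies when the modules are removed, and `rmmod` will refuse if a module is still in use.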
Here is everything you need to know about why this works and how to do it properly.

Unlike your CPU, which is constantly juggling thousands of background tasks, your GPU runs on a FIFO (first in, first out) system: it works through queued commands in the order they arrive. Over time, though, a poorly coded game, an over-aggressive overclock, or a memory leak can leave the GPU in a "zombie" state.