Fixing graphics in Pop!_OS with Kernelstub
Pop!_OS is great. I’ve been using it for a couple years now and the overall stability is really impressive. I think I first started using it around 18.04 with a System76 Gazelle laptop. The combination of Linux-friendly hardware and an OS made for that hardware was brilliant. My previous laptop was from HP and I had endless graphics issues. I actually had to grab an upstream kernel and build it myself to work around the constant graphics lockups. That was not a great experience.
Unfortunately the System76 Gazelle enclosure was not made to last and all the hinges eventually gave up. Every time I closed the laptop I felt like I was defusing a bomb. One false move and 💥.
Fast forward to my current laptop - Lenovo ThinkPad L15 Gen 1. It’s a really decent laptop, I would say that overall I am very impressed.
However, there are a few major quirks. There seems to be some graphics instability that I have not been able to get to the bottom of. I would experience a full graphics lockup at least once a week, usually more. At first I thought MS Teams was the culprit (can you blame me for that?) but it just seems to be a general instability. What makes it worse is that the issue appears or disappears with kernel updates. After one kernel update I went for weeks with no lockups, and then the next release broke things completely.
What is interesting about the issue, though, is that the lockups are not necessarily related to intense graphics rendering or GPU usage. After one of the lockups I performed a forced shutdown (power key held down for 10 seconds), then powered up and went to grab a coffee. Since the main drive is encrypted, Pop!_OS stopped at the decrypt drive screen, and it actually froze at that screen. I thought that was a bit weird because at that point hardly anything is happening: the kernel has loaded but the system has not fully booted, so only minimal drivers and services should be running.
So how did I fix it…
kernelstub
If you’ve switched from Ubuntu to Pop!_OS you may have noticed that there are a lot of changes. With 18.04 it was pretty similar but by 21.10 and now 22.04, it’s very different. The major difference here is that there is no GRUB bootloader. System76 switched to systemd-boot managed by kernelstub and in some ways I think it is a lot easier to use when compared to GRUB.
Fixing my issue was a lot of trial and error. After A LOT of googling I eventually came across an Arch Linux page with some boot flags that should fix a "Crash/freeze on low power Intel CPUs". You can read all the details on that page but the general idea is that when some Intel CPUs go into a low power state they can crash or freeze. It appears that a few Lenovo laptops have this issue.
That seemed odd, since my issue does not occur when idle. It usually happens in the middle of something important, so it did not look like an issue with the CPU powering down. Nevertheless, I tried out the suggested changes and it fixed my issues.
My kernelstub configuration is as follows:
❯ sudo kernelstub -p
kernelstub.Config : INFO Looking for configuration...
kernelstub : INFO System information:
OS:..................Pop!_OS 22.04
Root partition:....../dev/dm-1
Root FS UUID:........daf480c2-3108-4db3-a349-c4a8f23c9fc2
ESP Path:............/boot/efi
ESP Partition:......./dev/nvme0n1p1
ESP Partition #:.....1
NVRAM entry #:.......-1
Boot Variable #:.....0000
Kernel Boot Options:.splash intel_idle.max_cstate=2 systemd.show_status=false loglevel=0 quiet ahci.mobile_lpm_policy=1 pcie_aspm=off
Kernel Image Path:.../boot/vmlinuz-6.0.6-76060006-generic
Initrd Image Path:.../boot/initrd.img-6.0.6-76060006-generic
Force-overwrite:.....False
kernelstub : INFO Configuration details:
ESP Location:................../boot/efi
Management Mode:...............True
Install Loader configuration:..True
Configuration version:.........3
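Options get into that Kernel Boot Options line through kernelstub's --add-options (-a) flag. The sketch below is not taken from my actual setup, so treat it as an illustration: it compares the running kernel's command line with the three options from the output above and only prints the kernelstub command for anything that is missing. Nothing is changed until you run those commands yourself. The names WANTED and missing_params are made up for the example.

```shell
#!/bin/sh
# Options taken from the "Kernel Boot Options" line above.
WANTED="intel_idle.max_cstate=2 ahci.mobile_lpm_policy=1 pcie_aspm=off"

# Print each wanted option that is absent from a kernel command line.
# $1 = kernel command line, $2 = space-separated wanted options
# (missing_params is a hypothetical helper, not part of kernelstub)
missing_params() {
    for p in $2; do
        case " $1 " in
            *" $p "*) ;;                 # already present, nothing to do
            *) printf '%s\n' "$p" ;;
        esac
    done
}

# Compare against the running kernel's command line, if it is readable.
if [ -r /proc/cmdline ]; then
    for param in $(missing_params "$(cat /proc/cmdline)" "$WANTED"); do
        # Only print the command; this script does not modify anything.
        printf 'sudo kernelstub --add-options "%s"\n' "$param"
    done
fi
```

Removing an option again should be the mirror image with --delete-options (-d), and sudo kernelstub -p shows the result. The new options only take effect after a reboot.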
I have set:
intel_idle.max_cstate=2 - setting cstate to 1 just did not work
ahci.mobile_lpm_policy=1 - I can’t tell for sure that this is required but it has not broken anything
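After a reboot you can check what the kernel actually picked up for these two settings. This is a small sketch, assuming your kernel exposes both parameters under /sys/module (it prints "unavailable" if it does not). The module_param function and its optional third argument, an alternative sysfs root that is only there to make it easy to try against a fake directory, are made up for the example.

```shell
#!/bin/sh
# Print a module parameter as the running kernel sees it.
# $1 = module name, $2 = parameter name
# $3 = sysfs root (hypothetical override, defaults to /sys)
module_param() {
    file="${3:-/sys}/module/$1/parameters/$2"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
    fi
}

echo "intel_idle.max_cstate:  $(module_param intel_idle max_cstate)"
echo "ahci.mobile_lpm_policy: $(module_param ahci mobile_lpm_policy)"
```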
I also set an additional parameter, pcie_aspm=off. When I first started trying to figure out the root cause there were a lot of SATA errors in the dmesg logs. They seemed to coincide with the lockups. Setting this cleaned up the logs, but the SATA errors turned out to be a red herring. Having a cleaner dmesg is never a bad thing though. Tons of warnings about SATA corruption just look scary.
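If you want to look for the same kind of noise in your own logs, something like the following can help. The exact messages I saw are not reproduced here, so the pattern is an assumption: the kernel's libata driver prefixes its messages with the port name, such as "ata1:" or "ata2.00:". The sata_lines name is made up for the example.

```shell
#!/bin/sh
# Keep only log lines that look like libata (SATA) messages.
# Assumption: messages are prefixed with the port, e.g. "ata1: " or "ata2.00: ".
# (sata_lines is a hypothetical helper name)
sata_lines() {
    # "|| true" so that "no matches" is not treated as a failure
    grep -E '(^|[^[:alnum:]_])ata[0-9]+(\.[0-9]+)?: ' || true
}

# Reading the kernel log may need root on some systems; errors are ignored here.
dmesg 2>/dev/null | sata_lines
```

An empty result either means there are no SATA messages or that dmesg needs sudo on your system.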
The downside to all of this is that battery life takes a big knock. Fortunately I hardly use my laptop as a laptop so I have not noticed. I’ve been running with these parameters set for months now and have not experienced a single graphics lockup. At some point I will disable them to see if they are still required but for now… if it works, don’t touch it.