Finally [SOLVED] no reboot or shutdown (a 6 month tale of woe with kernel 5)

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Sat Aug 17, 2019 9:02 pm

I'm writing this for two reasons. I want to tell you a story (for a reason), and forums.linuxmint.com pages rank highly in Google for generic Linux searches, so hopefully this may also assist someone in the same boat.

Since February this year I have been battling to get kernel 5.0.x to run on my bleeding edge kit, all to no avail. My machine would hang on both reboot and shutdown, requiring a hard, cold boot.

I tried everything under the sun, including every power management kernel option, e.g. reboot=acpi, reboot=efi, acpi=force, and more. I installed in CSM mode and reinstalled in UEFI mode, ripped out hardware, updated firmware, started with BIOS defaults, booted with and without fast boot, tried every nVidia driver I could find, blacklisted hardware all over the place, trawled through kernel release notes until I was almost blind, and regularly covered myself in sackcloth and ashes and gnashed my teeth to stubs. Fortunately I'm quite bald so I don't have any head hair to rip out, though my once hairy chest is now a patchwork quilt of almost total deforestation.
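
For anyone retracing these steps, it helps to confirm which options the running kernel actually booted with before concluding that an option made no difference. A minimal Python sketch, assuming the standard /proc/cmdline file; the helper name parse_cmdline is made up for illustration and is not from any library:

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as 'quiet' map to None. Quoted values containing
    spaces are not handled; this is only a rough sketch.
    """
    options = {}
    for token in text.split():
        # Split on the first '=' only, so 'root=UUID=abc' keeps its value.
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


if __name__ == "__main__":
    # /proc/cmdline holds the options the running kernel was booted with.
    path = Path("/proc/cmdline")
    if path.exists():
        opts = parse_cmdline(path.read_text())
        for key in ("reboot", "acpi"):
            print(key, "=", opts.get(key, "(not set)"))
```

If an option such as reboot=acpi does not show up here, it never reached the kernel, which usually points at the bootloader configuration rather than at the option itself.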

I distro-hopped left, right and centre. Only Manjaro would run on kernel 5. Not a single Ubuntu-based distro would allow me to reboot or shut down. I contemplated staying with Manjaro but it has some very rough edges that turned me off it, so I kept searching for a solution. The frustration was quite intense. I caught myself at least twice in the last six months contemplating using Windwoes because I absolutely must have working gear.

Early last week, my backup server chucked a wobbly. Its 12TB hardware RAID 5EE set threw out a drive. I keep brand new spare drives around so I pulled out the dicky drive, checked it thoroughly and concluded it must have been an interstellar cosmic particle that knocked an atom out of orbit in the drive; the drive is only 1 year old and came up perfectly clean. I put the drive back into the RAID set, and the RAID controller manager claimed the set was optimal, no rebuild required. One hour later, the entire RAID set came crashing down. My backup RAID set had died completely.

I replaced the drive with a new one, and added a global hot spare for good measure. For RAID 5EE this is double redundancy because RAID 5EE also incorporates a spare into the RAID set itself. I stared into the face of several days of data transfer backing up my 12TB hardware RAID 5EE workstation over a gigabit connection. At this point I decided I'd got so far that it was time to try and install the latest and greatest KDE neon on the server. I went through my well-documented procedures and installed KDE neon based on Ubuntu 18.04.3, knowing it had kernel 5.x. Lo and behold, it would reboot and shut down without question.

So what gives?

The server has an AMD 2700X CPU, and the workstation has an Intel i9-9900K. I rebooted the workstation and began reading and checking every single setting in its BIOS. There had to be a hardware reason that the AMD would reboot and shut down with kernel 5.0 but the Intel wouldn't. I looked at the screen, gobsmacked... I was disappointed with myself because whenever I update the machine's BIOS I make it a habit to turn off the CPU C-states. I hadn't done that with the last BIOS update. I promptly turned off every CPU C-state, installed KDE neon with kernel 5.0, and rebooted and shut down the Intel workstation with complete and utter impunity.

I was happier than a pig wallowing in wet muck.

Image

Since I disable all C-states by default and they're now all disabled, and the machine is now running kernel 5.0, and rebooting and shutting down exactly as it should, I'm not really interested in determining which one, or which combination, of C-states is causing the problem. It's one or more of them ^^^^^ up there.

So, to cut a short story long, for those of you who provide your free time supporting LM, if you ever get frustrated over a newcomer's lack of knowledge or ability to express what the problem is, spare them a thought. Not everyone is gifted with the necessary turn of mind to do what you do. Keep up the great work, LM support community :)

EDIT: Important

I've seen a few posts referencing this issue but unfortunately getting wires crossed. The problem was with the Intel CPU. The AMD CPU always ran fine. The C-states and hardware prefetcher had to be disabled on the Intel CPU, not the AMD CPU. It was kernel 5.0.x running fine on the AMD that led me to solve the problem on the Intel.

Anyway, thanks to all who have read this. It seems to have provided some vital information :)

Cheers, everyone.
Last edited by catweazel on Mon Aug 19, 2019 10:44 pm, edited 1 time in total.
¡uʍop ǝpısdn sı buıɥʇʎɹǝʌǝ os ɐıןɐɹʇsnɐ ɯoɹɟ ɯ,ı

michael louwe
Level 10
Posts: 3297
Joined: Sun Sep 11, 2016 11:18 pm

Post by michael louwe » Sat Aug 17, 2019 10:02 pm

catweazel wrote:..
.
Thank you for your post.
....... So, you were also MIA on this forum for about 2 months cracking your head over this "PEBKAC" or Linux-specific issue, ie besides winter-vacationing in Indonesia.

It is quite common knowledge that advanced Power Management features are poorly supported in desktop Linux by the proprietary/closed-source-software-minded CPU OEMs (especially for laptops). Those who buy bleeding-edge hardware should not run the LTS versions of desktop Linux, which do not come with the latest stable Linux kernel (= should run rolling-release or fast-moving versions like Arch Linux/Manjaro, Fedora, Debian Unstable/Testing and Ubuntu HWE/19.04).

= Linux Mint users should not buy bleeding-edge or the newest hardware unless they are tech-geeks.

= those who want to have these advanced Power Management features in their CPUs should run Windows, eg longer battery life in laptops. In my case, I run my dual-booted laptop mostly off AC power.

Good day.

trytip
Level 11
Posts: 3588
Joined: Tue Jul 05, 2016 1:20 pm

Post by trytip » Sat Aug 17, 2019 11:08 pm

catweazel wrote:
Sat Aug 17, 2019 9:02 pm
Fortunately I'm quite bald so I don't have any head hair to rip out, though my once hairy chest is now a patchwork quilt of almost total deforestation.
catweazel is a liar. just look at all that hair
Image

i been trying to get my ryzen3/vega8 working properly since march this year. what i found out is not much support except for the default current kernel linux-generic-4.15. been living with the open source drivers and the oibaf ppa which served well, but like most of us edgers i want better and faster. killed my system quite many times trying to mix an hwe kernel stack thinking hardware acceleration kernel would give me better performance
as of now i settled for the new amd drivers 19.20 which seem to work with kernel 4.15 only. they say a few more kernels are capable, but not one that has any real support for new hardware. with these drivers it's odd that now my cursor dims darker with redshift controlling the brightness. i didn't have that with the opensource drivers. Yei, my cursor now obeys redshift formula

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Sun Aug 18, 2019 12:52 am

trytip wrote:
Sat Aug 17, 2019 11:08 pm
catweazel wrote:
Sat Aug 17, 2019 9:02 pm
Fortunately I'm quite bald so I don't have any head hair to rip out, though my once hairy chest is now a patchwork quilt of almost total deforestation.
catweazel is a liar. just look at all that hair
Image
lol
trytip wrote: i been trying to get my ryzen3/vega8 working properly since march this year. what i found out is not much support except for the default current kernel linux-generic-4.15. been living with the open source drivers and the oibaf ppa which served well, but like most of us edgers i want better and faster. killed my system quite many times trying to mix an hwe kernel stack thinking hardware acceleration kernel would give me better performance
as of now i settled for the new amd drivers 19.20 which seem to work with kernel 4.15 only. they say a few more kernels are capable, but not one that has any real support for new hardware. with these drivers it's odd that now my cursor dims darker with redshift controlling the brightness. i didn't have that with the opensource drivers. Yei, my cursor now obeys redshift formula
Well, I guess our experiences are standard fare for living on the hardware edge :)

Still, it was a great learning experience and the joy at finally solving my issue was worth the trouble. I would never have suspected the C-states in my case.

MurphCID
Level 5
Posts: 917
Joined: Fri Sep 25, 2015 10:29 pm

Post by MurphCID » Mon Aug 19, 2019 6:39 am

Please pardon my ignorance, but what are "C" states and why are they important to using Linux on an AMD system? I ran my AMD 1700 on an X370 mobo for a short time with no issues. So now I am worried that if I build a Ryzen system it will bork itself if I run Linux.

Pjotr
Level 21
Posts: 13248
Joined: Mon Mar 07, 2011 10:18 am
Location: The Netherlands (Holland)

Post by Pjotr » Mon Aug 19, 2019 7:07 am

MurphCID wrote:
Mon Aug 19, 2019 6:39 am
Please pardon my ignorance, but what are "C" states and why are they important to using Linux on an AMD system? I was running (albeit a short time) my AMD 1700 on an X370 mobo for a short time with no issues. So now I am worried that if I build a Ryzen system it will bork itself if I run Linux.
It's this:
https://easylinuxtipsproject.blogspot.c ... .html#ID25
(item 25)

First time I've heard of them causing trouble on an AMD CPU.... But then, Catweazle always works magic. :lol:

Note that with an AMD Ryzen CPU you might benefit from using the 5.2.x kernel, which is already available in the Canonical Kernel Team PPA:
https://easylinuxtipsproject.blogspot.c ... t.html#ID8
(item 8 )

JeremyB
Level 20
Posts: 10890
Joined: Fri Feb 21, 2014 8:17 am

Post by JeremyB » Mon Aug 19, 2019 7:37 am

Any chance the problem machine has Ethernet using the r8169 kernel module?

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Mon Aug 19, 2019 7:43 am

MurphCID wrote:
Mon Aug 19, 2019 6:39 am
Please pardon my ignorance, but what are "C" states and why are they important to using Linux on an AMD system? I was running (albeit a short time) my AMD 1700 on an X370 mobo for a short time with no issues. So now I am worried that if I build a Ryzen system it will bork itself if I run Linux.
The Ryzen was and is working flawlessly. It was the Intel that was giving trouble.

Cheers.

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Mon Aug 19, 2019 7:46 am

JeremyB wrote:
Mon Aug 19, 2019 7:37 am
Any chance the problem machine has Ethernet using the r8169 kernel module?
Indeed...

Code: Select all

e1000e                245760  0
r8169                  81920  0
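
The two lines above look like filtered lsmod output (module name, size in bytes, use count). A minimal Python sketch of the same check, assuming the standard /proc/modules file, which uses the same first three columns; the function name loaded_modules is hypothetical:

```python
from pathlib import Path


def loaded_modules(text):
    """Return {module_name: size_in_bytes} from lsmod or /proc/modules text."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the 'Module Size Used by' header of lsmod.
        if len(fields) >= 2 and fields[1].isdigit():
            modules[fields[0]] = int(fields[1])
    return modules


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        mods = loaded_modules(path.read_text())
        for name in ("e1000e", "r8169"):
            print(name, "loaded" if name in mods else "not loaded")
```

A use count of 0 only means no other module depends on the driver; it does not say whether the network interface is up.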

trytip
Level 11
Posts: 3588
Joined: Tue Jul 05, 2016 1:20 pm

Post by trytip » Mon Aug 19, 2019 10:06 am

Pjotr wrote:
Mon Aug 19, 2019 7:07 am
Note that with an AMD Ryzen CPU you might benefit from using the 5.2.x kernel, which is already available in the
(item 8 )
for any new ryzen users seeing this there are two ways to go. the ubuntu way or the mainline way. if you want the proprietary drivers amdgpu-pro which is now called radeon software (have no clue why they do that) it will not work with kernel 5.2 only with the default ubuntu 4.15 which i'm currently on. for the longest time i was convinced by myself and others that with ryzen and most new hardware the way to go is the higher the kernel the better
not in this case. i just removed kernel 5.2 because it was spitting out errors about kernel modules while running sudo update-initramfs -u -k all

of course it needs to be the updated kernel 4.15.0-58, and if you deleted it you can install it with apt install linux-generic. but now i'm in a dilemma: i have amdgpu-pro 19.20 with vulkan-sdk and i'm starting to see small idiosyncrasies. for instance, when i turn on redshift the cursor dims with the redness of the day, but the cursor blinks in and out as if it is trying to fade but can't handle the fading. also google chrome videos do the same while redshift is on.

lot to think about if you use ryzen and which way to go. right now the latest amdgpu-pro does not install with dkms so i used a script helper to install it. 19.10 version of amdgpu-pro worked with amdgpu-dkms. AMD does not make it easy on the user and still not convinced which drivers i should stay with

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Mon Aug 19, 2019 4:56 pm

trytip wrote:
Mon Aug 19, 2019 10:06 am
lot to think about if you use ryzen and which way to go. right now the latest amdgpu-pro does not install with dkms
I think your post may end up confusing someone. You seem to have conflated AMDGPU-PRO issues with Ryzen issues. My Ryzen rigs run fine and have no issues; however, I don't use AMD GPUs so I can't comment on that.

Cheers.

JeremyB
Level 20
Posts: 10890
Joined: Fri Feb 21, 2014 8:17 am

Post by JeremyB » Mon Aug 19, 2019 5:06 pm

catweazel wrote:
Mon Aug 19, 2019 7:46 am
JeremyB wrote:
Mon Aug 19, 2019 7:37 am
Any chance the problem machine has Ethernet using the r8169 kernel module?
Indeed...

Code: Select all

e1000e                245760  0
r8169                  81920  0
Can you try a 4.18 kernel to see if that works? I know they are EOL but they keep changing their minds on how to handle ASPM in r8169 upstream and Ubuntu might have cherry picked some bad commits

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Mon Aug 19, 2019 5:08 pm

JeremyB wrote:
Mon Aug 19, 2019 5:06 pm
catweazel wrote:
Mon Aug 19, 2019 7:46 am
JeremyB wrote:
Mon Aug 19, 2019 7:37 am
Any chance the problem machine has Ethernet using the r8169 kernel module?
Indeed...

Code: Select all

e1000e                245760  0
r8169                  81920  0
Can you try a 4.18 kernel to see if that works? I know they are EOL but they keep changing their minds on how to handle ASPM in r8169 upstream and Ubuntu might have cherry picked some bad commits
Yes, I was running 4.18 all the time. It ran fine. It was a boot to 4.18 to the rescue every time I tried an updated 5 series. I would really appreciate a link to any discussion, if you have time. I find these issues fascinating.

Cheers.

JeremyB
Level 20
Posts: 10890
Joined: Fri Feb 21, 2014 8:17 am

Post by JeremyB » Mon Aug 19, 2019 5:53 pm

The discussion is a bit broken, part is on a bug report, https://bugs.launchpad.net/ubuntu/+sour ... ug/1838644
The rest happened on Freenode IRC #ubuntu-kernel, the final bit was
<tjaalton> Timo Aaltonen lotuspsychje: no need to test anything, since I know what's going on
lotuspsychje: 5.2 has this: b75bb8a5b755d0c r8169: disable ASPM again
that's why it doesn't flicker
also, it explains why my builds with smaller config didn't flicker
It is strange as they originally thought the problem was related to the Intel video and the bug reporter was using Intel wifi not the Ethernet. I had the same Intel UHD620 graphics and I had no issues with the 5.0 kernels even with having the 8168 Realtek Ethernet. That likely just means that the UEFI/BIOS firmware is part of the issue
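
To see how PCIe ASPM is currently configured on a given machine, a minimal Python sketch follows. It assumes the kernel exposes /sys/module/pcie_aspm/parameters/policy, where the active policy is shown in square brackets; the helper name active_aspm_policy is made up for illustration. It reports only the global policy, not what the r8169 driver decides for its own device.

```python
import re
from pathlib import Path


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if absent."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = Path("/sys/module/pcie_aspm/parameters/policy")
    if path.exists():
        print("active ASPM policy:", active_aspm_policy(path.read_text()))
    else:
        print("pcie_aspm policy file not found")
```

Comparing this value between a kernel that misbehaves and one that works is a cheap first check before digging into driver commits.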

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Mon Aug 19, 2019 5:57 pm

JeremyB wrote:
Mon Aug 19, 2019 5:53 pm
The discussion is a bit broken, part is on a bug report, https://bugs.launchpad.net/ubuntu/+sour ... ug/1838644
The rest happened on Freenode IRC #ubuntu-kernel, the final bit was
<tjaalton> Timo Aaltonen lotuspsychje: no need to test anything, since I know what's going on
lotuspsychje: 5.2 has this: b75bb8a5b755d0c r8169: disable ASPM again
that's why it doesn't flicker
also, it explains why my builds with smaller config didn't flicker
It is strange as they originally thought the problem was related to the Intel video and the bug reporter was using Intel wifi not the Ethernet. I had the same Intel UHD620 graphics and I had no issues with the 5.0 kernels even with having the 8168 Realtek Ethernet. That likely just means that the UEFI/BIOS firmware is part of the issue
Yes, I think the firmware is part of the problem. I did a little more testing and it seems to be the hardware prefetch feature that's causing the issue for me, but the testing wasn't comprehensive. I have the onboard Intel video switched off.

Thanks for the link. I'll go and read it.

JeremyB
Level 20
Posts: 10890
Joined: Fri Feb 21, 2014 8:17 am

Post by JeremyB » Mon Aug 19, 2019 6:07 pm

Long and boring, lotus and tjaalton were testing different things for days on IRC

MurphCID
Level 5
Posts: 917
Joined: Fri Sep 25, 2015 10:29 pm

Post by MurphCID » Mon Aug 19, 2019 7:36 pm

catweazel wrote:
Mon Aug 19, 2019 7:43 am
MurphCID wrote:
Mon Aug 19, 2019 6:39 am
Please pardon my ignorance, but what are "C" states and why are they important to using Linux on an AMD system? I was running (albeit a short time) my AMD 1700 on an X370 mobo for a short time with no issues. So now I am worried that if I build a Ryzen system it will bork itself if I run Linux.
The Ryzen was and is working flawlessly. It was the Intel that was giving trouble.

Cheers.
Ah, thank you

catweazel
Level 19
Posts: 9229
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Post by catweazel » Mon Aug 19, 2019 7:49 pm

MurphCID wrote:
Mon Aug 19, 2019 7:36 pm
catweazel wrote:
Mon Aug 19, 2019 7:43 am
MurphCID wrote:
Mon Aug 19, 2019 6:39 am
Please pardon my ignorance, but what are "C" states and why are they important to using Linux on an AMD system? I was running (albeit a short time) my AMD 1700 on an X370 mobo for a short time with no issues. So now I am worried that if I build a Ryzen system it will bork itself if I run Linux.
The Ryzen was and is working flawlessly. It was the Intel that was giving trouble.

Cheers.
Ah., thank you
My pleasure. C-states are CPU power states. The CPU will successively invoke deeper C-states to reduce power consumption when idle. They are safe to turn off if you want your CPU running at full tilt, full time.
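
For the curious, Linux lists the idle states it knows about under sysfs. A minimal Python sketch, assuming the usual cpuidle layout of /sys/devices/system/cpu/cpu0/cpuidle/stateN/ with name and disable files; the function name list_cstates is hypothetical. What appears after switching C-states off in the firmware depends on the idle driver, so treat the output as a hint rather than proof.

```python
from pathlib import Path


def _read(path):
    """Read a small sysfs-style file, returning '' if it is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def list_cstates(cpuidle_dir):
    """Return [(name, disabled)] for each stateN directory, in numeric order."""
    state_dirs = sorted(
        Path(cpuidle_dir).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    return [
        (_read(state / "name"), _read(state / "disable") == "1")
        for state in state_dirs
    ]


if __name__ == "__main__":
    base = Path("/sys/devices/system/cpu/cpu0/cpuidle")
    if base.is_dir():
        for name, disabled in list_cstates(base):
            print(name, "(disabled)" if disabled else "(enabled)")
    else:
        print("no cpuidle directory for cpu0")
```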

trytip
Level 11
Posts: 3588
Joined: Tue Jul 05, 2016 1:20 pm

Post by trytip » Tue Aug 20, 2019 12:14 am

@catweazel
have you upgraded to the latest firmware and booted with kernel 4.15.0-58? you'd be mighty surprised if it worked.
link to linux-firmware-20190815.tar.gz https://git.kernel.org/pub/scm/linux/ke ... mware.git/
extract the folder, open a terminal in it and run sudo make install; it will copy the files to /lib/firmware/. then run sudo update-initramfs -u -k all to rebuild the initramfs for every installed kernel, and reboot. they've done a lot of patches with this kernel 4.15. doesn't hurt to try if you haven't

on the ryzen issue you are correct i am only talking about the integrated graphics, but if you use these graphics they also affect the cpu. linux sure hates nvidia but they always have a driver waiting for them after install. not so with amdgpu proprietary. i get nothing in the driver manager, wish i did

MurphCID
Level 5
Posts: 917
Joined: Fri Sep 25, 2015 10:29 pm

Post by MurphCID » Tue Aug 20, 2019 6:51 am

catweazel wrote:
Mon Aug 19, 2019 7:49 pm
MurphCID wrote:
Mon Aug 19, 2019 7:36 pm
catweazel wrote:
Mon Aug 19, 2019 7:43 am


The Ryzen was and is working flawlessly. It was the Intel that was giving trouble.

Cheers.
Ah., thank you
My pleasure. C-states are CPU power states. The CPU will successively invoke C-states to reduce power consumption. They are safe to turn off if you want your CPU running at full tilt, full time.
Let me ask if I understand this correctly: if I turn off the "C" states, my Ryzen 1700 would run at its full 3.0 GHz all the time rather than throttling down when it is not really doing anything? This in turn would generate more heat, and therefore shorten the life of my processor, or would it?
