Starting the VM hangs the computer

Questions about virtualization software
Suwakoto
Level 2
Posts: 55
Joined: Wed Jun 09, 2021 2:07 pm

Post by Suwakoto

Howdy. I have a VM that I made a few months ago and have been using on and off; according to the logs, the last time I used it was on Christmas. It is a Windows 10 VM using QEMU/libvirt with single GPU passthrough. I configured it largely according to this site: https://wiki.archlinux.org/title/PCI_pa ... h_via_OVMF and this video: https://youtu.be/BUSrdUoedTo

Earlier today I tried to start it and got stuck at a black screen. The VM runs the following hook script when starting (located in /etc/libvirt/hooks/qemu.d/[VM name]/prepare/begin):


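#!/bin/bash

# Print each command before running it
set -x
# Stop the desktop session so nothing on the host is drawing to the GPU
systemctl stop display-manager.service
# Unbind the virtual consoles
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
# Release the EFI framebuffer
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
# Give the host a moment to let go of the GPU
sleep 3
# Detach the GPU (.0) and its second function (.1, usually the HDMI audio) from the host
virsh nodedev-detach pci_0000_2b_00_0
virsh nodedev-detach pci_0000_2b_00_1
# Load the VFIO driver used for passthrough
modprobe vfio-pci
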
If I just try to start the VM, I get the black screen and have to restart the computer, and nothing new is written to the libvirt log file. When the script is removed, the VM just crashes instead of freezing the whole computer, and some output is added to the log. If I also remove the GPU from the VM and re-add a SPICE server, I can log into the VM just fine. So it seems that something in this script is causing the problem.

I tried removing pieces of it one by one and looking for changes, but nothing made a difference. I spent close to an hour trying different combinations and testing whether any of the commands could freeze the system on its own when run from the terminal, but none did. I also verified that the IDs for the GPU (0000_2b_00_0 and _1) are correct, that there are no problems with the IOMMU groups, and that I only have these two virtual consoles.
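For anyone wanting to repeat the IOMMU group and vtconsole checks mentioned above, here is a read-only sketch that reads both straight from sysfs. The function names and the SYSFS override (there only so it can be tried against a fake directory tree) are illustrative additions; nothing in it changes system state.

```shell
#!/bin/bash
# Read-only checks: IOMMU group of every PCI device, and the vtconsoles present.
# SYSFS is an override added for testing (assumption); it defaults to /sys.
shopt -s nullglob   # a glob with no matches expands to nothing
SYSFS="${SYSFS:-/sys}"

list_iommu_groups() {
    local prefix="$SYSFS/kernel/iommu_groups/" dev group
    for dev in "$prefix"*/devices/*; do
        # Layout: .../iommu_groups/<group>/devices/<PCI address>
        group="${dev#"$prefix"}"
        group="${group%%/*}"
        echo "group $group: ${dev##*/}"
    done
}

list_vtconsoles() {
    local vt
    for vt in "$SYSFS"/class/vtconsole/vtcon*; do
        # "name" describes the console driver bound to this vtconsole
        echo "${vt##*/}: $(cat "$vt/name" 2>/dev/null)"
    done
}

echo "== IOMMU groups =="
list_iommu_groups
echo "== vtconsoles =="
list_vtconsoles
```

Ideally the GPU's functions (here 0000:2b:00.0 and 0000:2b:00.1) share a group only with each other. If nothing is listed under the IOMMU groups at all, the IOMMU is probably not enabled.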

I cannot tell whether the VM actually starts and I just can't see anything due to some bug, or whether it gets stuck while booting. However, if I press the power button while in this frozen state, the Mint logo appears after a few seconds, as if the computer were taking a long time to shut down. But even after 10 minutes it does not shut down, and I have to press the reset button.

The weirdest thing about this is that I have not changed the VM's configuration or my hardware since the last time it worked fine. The only thing that might have changed is some of the Mint software. I tried switching to a different kernel (5.15 instead of 6.5), but it made no difference.

Re: Starting the VM hangs the computer

Post by Suwakoto

I managed to solve this after a lot of troubleshooting. By running the start script through SSH, I was able to see that it got stuck while detaching the GPU from the host. After looking it up online, I added several lines to the script that unload various Nvidia-related kernel modules (that is what modprobe -r does), which let it get past that point and detach the GPU successfully. The VM is working again.
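When running the hook by hand over SSH like this, one option is to wrap each step in coreutils' timeout, so a hanging command gets reported instead of silently blocking the session. This is only a sketch: run_step and STEP_TIMEOUT are hypothetical names, the 30-second default is arbitrary, and timeout cannot help if the process is stuck uninterruptibly inside the kernel.

```shell
#!/bin/bash
# Sketch: run one hook step with a time limit and report hangs or failures.
# run_step and STEP_TIMEOUT are hypothetical names, not part of libvirt.
STEP_TIMEOUT="${STEP_TIMEOUT:-30}"   # seconds per step; an arbitrary default

run_step() {
    local rc=0
    echo ">>> $*" >&2
    timeout "$STEP_TIMEOUT" "$@" || rc=$?
    if (( rc == 124 )); then
        # timeout exits with 124 when it had to stop the command
        echo "!!! timed out after ${STEP_TIMEOUT}s: $*" >&2
    elif (( rc != 0 )); then
        echo "!!! exited with status $rc: $*" >&2
    fi
    return "$rc"
}

# Possible use in the begin script (device name taken from this thread):
#   run_step virsh nodedev-detach pci_0000_2b_00_0 || exit 1
```

timeout only wraps external commands, so a line such as echo 0 > /sys/class/vtconsole/vtcon0/bind would have to go through something like run_step sh -c '...' instead.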
powerhouse
Level 6
Posts: 1144
Joined: Thu May 03, 2012 3:54 am
Location: Israel

Re: Starting the VM hangs the computer

Post by powerhouse

Suwakoto wrote: Wed Jan 10, 2024 5:40 pm I managed to solve this after a lot of troubleshooting. By running the start script through SSH, I was able to see that it got stuck during detaching the GPU from the host. Looking it up online, I added several lines to the script that disable various packages (or whatever it is modprobe -r does, exactly) related to Nvidia, which let it progress further and detach the GPU successfully. The VM is back to working.
Can you share the changes you made? They may be helpful for others.
Subjects of interest: Linux, vfio passthrough virtualization, photography
See my blog on virtualization, including tutorials: https://www.heiko-sieger.info/category/ ... alization/

Re: Starting the VM hangs the computer

Post by Suwakoto

powerhouse wrote: Mon Jan 15, 2024 8:23 am Can you share the changes you made? May be helpful for others.
Sure thing. I'll also explain the troubleshooting process a bit more.

I began by logging into the computer remotely over SSH from my laptop. That way I could see any errors and status messages in the terminal regardless of what the graphics were doing on the computer itself. I then started the virtual machine and, watching top over SSH, found that the VM was never actually running. That suggested something was going wrong before launch, most likely in the script above.

From there, I ran the script on its own. Because of the set -x line, Bash printed each command before running it, so I could see exactly where it stopped: it hung on virsh nodedev-detach pci_0000_2b_00_0, the command that detaches the GPU from the host system. From this I concluded that something was keeping the GPU busy and preventing it from being detached. Based on some example scripts from the web, I added the following lines to the script, just below sleep 3:


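# Unload the Nvidia-related kernel modules so nothing on the host holds the GPU
# (modules that depend on others are removed first)
modprobe -r nvidia_drm
modprobe -r nvidia_modeset
modprobe -r drm_kms_helper
modprobe -r nvidia
modprobe -r i2c_nvidia_gpu
modprobe -r drm
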
I'm not sure if I have all these kernel modules, frankly, but regardless of that, it properly "freed" the GPU, allowing it to be detached and the script to finish executing. With this green light, I returned to the main computer, attempted to turn on the virtual machine, and at last I saw output.
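Since it isn't clear which of these modules are actually loaded on a given system, a slightly more defensive variant could check /proc/modules first and only unload what is there, in the same order as above. This is a sketch, not the script from this thread: it only prints a plan unless called with --apply (a made-up flag for this example), and MODULES_FILE is an override added purely for testing.

```shell
#!/bin/bash
# Sketch: unload the modules from this thread only if they are loaded,
# in the same order as above. Prints a plan by default; "--apply" is a
# made-up flag of this sketch that really calls modprobe -r (needs root).
MODULES=(nvidia_drm nvidia_modeset drm_kms_helper nvidia i2c_nvidia_gpu drm)
MODULES_FILE="${MODULES_FILE:-/proc/modules}"   # override only for testing

is_loaded() {
    # Every line of /proc/modules begins with the name of a loaded module
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$MODULES_FILE" 2>/dev/null
}

unload_modules() {
    local apply="${1:-}" mod
    for mod in "${MODULES[@]}"; do
        if ! is_loaded "$mod"; then
            echo "$mod: not loaded, skipping"
        elif [[ "$apply" == "--apply" ]]; then
            if modprobe -r "$mod"; then
                echo "$mod: unloaded"
            else
                echo "$mod: modprobe -r failed (still in use?)"
            fi
        else
            echo "$mod: loaded, would unload"
        fi
    done
}

unload_modules "${1:-}"
```

Running it without arguments first shows which modules a given system actually has, before anything is removed.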

Re: Starting the VM hangs the computer

Post by powerhouse

Nice description. I'm sure this will help others. The ssh part is crucial, and many folks don't see the point of it. Everyone trying vfio passthrough, whether with dual GPUs or particularly with a single GPU, should first install and configure an ssh server on the host and make sure they can access it from another PC.
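A quick way to confirm the SSH server is actually running on the host before starting any passthrough experiments could look like the sketch below. It assumes a systemd-based distribution such as Mint, where the OpenSSH server comes from the openssh-server package and its unit is usually named ssh (sshd on some other distributions); ssh_server_status is a hypothetical helper name, and <user>@<host> is a placeholder.

```shell
#!/bin/bash
# Sketch: report whether an OpenSSH server is active on this (systemd) host.
# Unit names are an assumption: "ssh" on Debian/Ubuntu/Mint, "sshd" elsewhere.
ssh_server_status() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "unknown (no systemctl)"
        return 2
    fi
    if systemctl is-active --quiet ssh || systemctl is-active --quiet sshd; then
        echo "active"
    else
        echo "inactive"
        return 1
    fi
}

ssh_server_status || echo "Install/enable openssh-server, then test with: ssh <user>@<host> from another PC"
```

The real test is still logging in from the second PC, since a running server can be blocked by a firewall.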

Thanks for sharing your solution! Looks like the Nvidia drivers have to be removed before attempting to detach the GPU.
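To see whether that is the case before detaching, one can look at which driver is currently bound to the GPU and what still depends on the nvidia module. This is a read-only sketch assuming the GPU address from this thread (0000:2b:00.0, which virsh calls pci_0000_2b_00_0); bound_driver, module_users and the SYSFS override are illustrative names.

```shell
#!/bin/bash
# Read-only sketch: which driver owns the GPU, and what still uses nvidia.
# SYSFS is overridable (assumption, for testing); default is the real /sys.
shopt -s nullglob
SYSFS="${SYSFS:-/sys}"
GPU="${1:-0000:2b:00.0}"   # GPU address from this thread; adjust for your card

bound_driver() {
    # <device>/driver is a symlink to the bound driver, absent if unbound
    local link="$SYSFS/bus/pci/devices/$1/driver"
    if [[ -L "$link" ]]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

module_users() {
    # refcnt and holders/ are present while a loadable module is loaded
    local dir="$SYSFS/module/$1" h
    local holders=()
    if [[ ! -d "$dir" ]]; then
        echo "$1: not loaded"
        return
    fi
    for h in "$dir"/holders/*; do
        holders+=("${h##*/}")   # each entry is a module that depends on this one
    done
    echo "$1: refcnt=$(cat "$dir/refcnt" 2>/dev/null) holders: ${holders[*]:-none}"
}

echo "driver bound to $GPU: $(bound_driver "$GPU")"
for m in nvidia nvidia_modeset nvidia_drm; do
    module_users "$m"
done
```

If it reports nvidia as the bound driver with a non-zero refcnt or holders such as nvidia_modeset, the detach may well block as it did here; after a successful virsh nodedev-detach the GPU is typically bound to vfio-pci instead.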
trinidad
Level 1
Posts: 40
Joined: Fri Dec 23, 2022 11:14 am

Re: Starting the VM hangs the computer

Post by trinidad

The ssh part is crucial
Secure, transparent Linux networking is a wonderful thing. All of my VMs, hardware installs, and Windows and Linux systems can connect. On the Windows side, Linux solves all my connectivity workarounds. Linux networking is what lured me to Linux many years ago, at the beginning of the PC age. SSH is one of the most useful networking tools ever.

https://dbts-analytics.com/indextut.html

TC

Return to “Virtual Machines”