vGPU Mapped Instances Fail to Boot With Error "Node device not found"
Problem
- After rebooting a vGPU-capable host (as part of an upgrade or a normal reboot), the vGPU-enabled VMs fail to start.
- This prevents the pf9-ostackhost service from converging properly: new VMs are not spawned, and existing VMs remain stuck in the powering-up state with the error below.

ostackhost.log
INFO nova.virt.libvirt.host [req-ID None None] Secure Boot support detected
ERROR oslo_service.service [req-ID None None] Error starting thread.: libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_[UUID-of-mediated-devices]'
TRACE oslo_service.service Traceback (most recent call last):
TRACE oslo_service.service   File "/opt/pf9/venv/lib/python3.9/site-packages/oslo_service/service.py", line 810, in run_service
TRACE oslo_service.service     service.start()
[..]
TRACE oslo_service.service     raise libvirtError('virNodeDeviceLookupByName() failed')
TRACE oslo_service.service libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_[UUID-of-mediated-devices]'

Environment
- Private Cloud Director Virtualisation - v2025.6 and higher
- Self-Hosted Private Cloud Director Virtualisation - v2025.6 and higher
- Component: GPU [NVIDIA drivers v570 and v580]
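The missing node device can also be checked directly against libvirt. A minimal sketch, assuming virsh is installed on the host (it ships with the libvirt tooling the hypervisor uses):

```shell
# List mediated node devices known to libvirt; on an affected host this
# returns nothing, matching the "Node device not found" error above.
if command -v virsh >/dev/null 2>&1; then
    virsh nodedev-list --cap mdev
else
    echo "virsh not available on this machine"
fi
```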
Solution
- This is a known issue tracked as bug PCD-2656. Reach out to the Platform9 Support Team and mention the bug ID to track its progress.
- For remediation, follow the details provided in the Workaround section below.
Root Cause
- The identified cause is that the mdev devices are not re-associated with the consumed vGPUs after a reboot, because the interconnecting logic has not yet been implemented.
- As a result, VMs attempting to go active fail with a "no node device found" error.
Workaround
To confirm this issue:
- The output of $ mdevctl list is empty.
- The output of $ lspci -nnn | grep -i nvidia does not list the SR-IOV virtual functions (VFs).

To resolve the issue, run the GPU configuration script located in /opt/pf9/gpu:
- Move to the directory containing the script: $ cd /opt/pf9/gpu/
- Run the script $ sudo ./pf9-gpu-configure.sh with option 3) vGPU SR-IOV configure.
- Run pf9-gpu-configure.sh with option 6) Validate vGPU to check that the GPU is configured.
- Re-run $ lspci -nnn | grep -i nvidia, which should now list all the VFs for the given GPU.
- Run pf9-gpu-configure.sh with option 4) vGPU host configure.
- In the UI, under the GPU host section, the host should now be visible.
- In the UI, select the host and the required GPU profile for the host, then save the form.
- Monitor the UI until the host completes the converge action.
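The two confirmation checks can be combined into a small script. This is a sketch only, assuming mdevctl and lspci are present on the host:

```shell
#!/usr/bin/env bash
# Sketch: confirm the vGPU boot-failure symptom after a host reboot.

# 1. No mediated devices survived the reboot:
if [ -z "$(mdevctl list 2>/dev/null)" ]; then
    echo "mdevctl list is empty: mdev devices are missing"
fi

# 2. Count visible NVIDIA PCI functions.  A configured SR-IOV GPU exposes
# many functions (PF plus VFs); seeing only the PF matches this issue.
nvidia_fns=$(lspci -nnn 2>/dev/null | grep -ci nvidia)
echo "NVIDIA PCI functions visible: ${nvidia_fns}"
```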
After Step 3, list from the command line the UUIDs associated with the failing VM instances in the pf9-ostackhost logs.

Identify UUID errors
- Check the ostackhost logs to identify the UUIDs causing attachment errors.

OstackHost log
TRACE oslo_service.service libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_[UUID_OF_MEDIATED_DEVICES]'

Map UUIDs to bus IDs
- Use the echo command to map each UUID to the appropriate bus on the vGPU host:

Command
$ echo <UUID> > /sys/class/mdev_bus/<BUS_ID>/mdev_supported_types/nvidia-558/create

Example:
$ echo [UUID_OF_MEDIATED_DEVICES] > /sys/class/mdev_bus/0000:21:00.5/mdev_supported_types/nvidia-558/create
Restart the NVIDIA vGPU manager
- Restart the NVIDIA vGPU manager service and verify its status:

Command
$ systemctl restart nvidia-vgpu-mgr
$ systemctl status nvidia-vgpu-mgr

- Restart the ostackhost service and monitor the status of the stuck VMs from the UI.

Command
$ systemctl restart pf9-ostackhost

Validation
List the newly added mdev devices using the command below:
- The mdevctl list command now gives non-empty output.

Command
$ mdevctl list

Example:
$ mdevctl list
[UUID_OF_MEDIATED_DEVICES_1] 0000:21:00.5 nvidia-558d
[UUID_OF_MEDIATED_DEVICES_2] 0000:21:00.6 nvidia-558d
[UUID_OF_MEDIATED_DEVICES_3] 0000:21:00.7 nvidia-558d

- The vGPU-enabled VMs are no longer stuck in the powering-on state.
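The validation can be wrapped in a quick count check; a sketch, assuming mdevctl is on the PATH and one mdev device is expected per vGPU VM:

```shell
#!/usr/bin/env bash
# Sketch: count mdev devices after the workaround.
count=$(mdevctl list 2>/dev/null | wc -l)
echo "mdev devices present: ${count}"
if [ "${count}" -eq 0 ]; then
    echo "still empty: re-run the pf9-gpu-configure.sh steps above" >&2
fi
```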