Problem
In some environments, compute nodes may experience elevated CPU utilization caused by orphaned virtual machine (QEMU) processes. This occurs when certain instances are deleted or migrated at the control-plane level (Compute DB), but their corresponding QEMU processes continue to run on the hypervisor.
Symptoms include:
- QEMU processes consuming CPU although the VM no longer exists in the database.
- Mismatch between the number of instances reported in the Nova DB and the number running on the hypervisor.
- Periodic warnings in nova.compute.manager logs during instance power-state synchronization.
Environment
- Private Cloud Director Virtualization - v2025.6 and Higher
- Private Cloud Director Kubernetes - v2025.6 and Higher
- Self-Hosted Private Cloud Director Virtualization - v2025.6 and Higher
- Self-Hosted Private Cloud Director Kubernetes - v2025.6 and Higher
- Compute Service
Cause
Compute Service periodically performs a synchronization cycle where it validates running instances on the hypervisor against entries in the Compute database.
In this case:
- Several VMs were deleted or migrated, but their QEMU processes persisted on the source hypervisor.
- These orphaned processes continued running because the default Nova behavior did not automatically clean them up.
- As a result, CPU utilization on the affected hypervisor increased unnecessarily.
Additionally, frequent instance deletions/migrations caused temporary discrepancies in instance counts during sync cycles, contributing to repeated warnings in the logs.
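The comparison behind the sync cycle can be approximated by hand on an affected host. The following is a minimal sketch, not the Compute Service's own logic: `find_orphans` is a hypothetical helper that takes two files of instance UUIDs (one per line) and prints those present on the hypervisor but absent from the database. The commented usage assumes that `virsh` and the OpenStack CLI are available and that the host name registered in the Compute Service matches the output of `hostname`; neither is confirmed by this article.

```shell
# Hypothetical helper: print UUIDs that appear in the first file (hypervisor)
# but not in the second file (Compute database). One UUID per line.
find_orphans() {
    hv_sorted=$(mktemp) || return 1
    db_sorted=$(mktemp) || { rm -f "$hv_sorted"; return 1; }

    # Drop blank lines, keep the first field, and sort for comm(1).
    awk 'NF {print $1}' "$1" | LC_ALL=C sort -u > "$hv_sorted"
    awk 'NF {print $1}' "$2" | LC_ALL=C sort -u > "$db_sorted"

    # -23 suppresses lines unique to the DB list and lines common to both.
    LC_ALL=C comm -23 "$hv_sorted" "$db_sorted"

    rm -f "$hv_sorted" "$db_sorted"
}

# Example usage (assumed environment; run manually on a compute host):
#   virsh list --uuid > /tmp/hypervisor_uuids.txt
#   openstack server list --all-projects --host "$(hostname)" -f value -c ID > /tmp/db_uuids.txt
#   find_orphans /tmp/hypervisor_uuids.txt /tmp/db_uuids.txt
```

Any UUID printed is a candidate orphan only; an instance that is mid-migration or mid-deletion can appear briefly and disappear on the next check.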
Diagnostics
1. Power State Synchronization Warnings
Compute service logs indicate mismatches between DB-reported VM count and hypervisor-reported VM count:
$ less /var/log/pf9/ostackhost.log
WARNING nova.compute.manager [...] While synchronizing instance power states, found 83 instances in the database and 86 instances on the hypervisor.
These messages demonstrate:
- The presence of additional QEMU processes not tracked in DB.
- Consistent mismatch over multiple cycles.
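To see whether the mismatch persists across cycles, the counts can be pulled out of the warning shown above. This is a small sketch that assumes the message wording is exactly as in the sample log line; `sync_mismatches` is a hypothetical helper name, not part of any Platform9 or Nova tooling.

```shell
# Hypothetical helper: read Compute service log lines on stdin and, for each
# power-state sync warning, print "<db_count> <hypervisor_count> <difference>".
sync_mismatches() {
    sed -n 's/.*found \([0-9][0-9]*\) instances in the database and \([0-9][0-9]*\) instances on the hypervisor.*/\1 \2/p' |
        awk '{ print $1, $2, $2 - $1 }'
}

# Example usage (log path taken from the diagnostic step above):
#   sync_mismatches < /var/log/pf9/ostackhost.log
```

A positive difference that repeats over several consecutive cycles points to persistent extra processes rather than a transient discrepancy from an in-flight delete or migration.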
Resolution
- To ensure Compute Service automatically cleans up orphaned QEMU processes, the following configuration was added to /opt/pf9/etc/nova/conf.d/nova_override.conf on all affected compute hosts:
$ vi /opt/pf9/etc/nova/conf.d/nova_override.conf
[DEFAULT]
running_deleted_instance_action = reap
- Restart the pf9-ostackhost service on the hypervisors after adding the above change. Restarting this service also cleans up any VMs that are stuck in the deleting phase according to the Compute Service database.
$ sudo systemctl restart pf9-ostackhost
Effect of this configuration: When Nova detects an instance running on the hypervisor that does not exist in the Compute Service database, it reaps (removes) that instance. As a result:
- Compute node CPU/memory resources are freed.
- Hypervisor state is brought back in line with the Nova DB state.
- Orphaned VMs are automatically removed within the standard 30-minute sync cycle.
This configuration is included as part of the standard deployment from Platform9 Cloud Director v2025.10 and higher.
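To check whether a given host already carries the setting, the override file can be inspected directly. The sketch below is a simplified single-file INI reader, not how the Compute Service itself loads configuration (the service may merge several files); `get_default_option` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of an option from the [DEFAULT]
# section of a single INI-style file. $1 = file path, $2 = option name.
get_default_option() {
    awk -v key="$2" '
        # Section header: remember whether we are inside [DEFAULT].
        /^[ \t]*\[/ { in_default = ($0 ~ /^[ \t]*\[DEFAULT\][ \t]*$/); next }
        # Skip comment lines.
        in_default && /^[ \t]*[#;]/ { next }
        in_default {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) found = value   # last assignment wins
        }
        END { if (found != "") print found }
    ' "$1"
}

# Example usage (file path taken from the resolution steps above):
#   get_default_option /opt/pf9/etc/nova/conf.d/nova_override.conf running_deleted_instance_action
```

If the command prints `reap`, the override is present in that file; empty output means the option is not set there and the service is using whatever value it gets from elsewhere.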