Multiple PCD Services Down and Hosts Unresponsive Due to Sealed Decco Vault
Problem
- Hosts become unresponsive and report as offline.
- Critical Platform9 services on the host, such as pf9-ostackhost, pf9-cindervolume-base, pf9-neutron-ovn-metadata-agent, and pf9-novncproxy, enter a failed state.
Environment
- Self-Hosted Private Cloud Director Virtualization - v2025.4 and higher
Cause
The following pods on the control plane were stuck in the Initializing phase:
- Decco Vault in the default namespace
- Vouch Keystone in the corresponding region namespace
- Vouch NoAuth in the corresponding region namespace
The Decco Vault was found to be sealed. While a vault is sealed, its data remains encrypted and inaccessible until the vault is unsealed.
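To confirm this state before changing anything, the affected pods and the vault's seal status can be checked from the control plane. The sketch below assumes the pod names contain `decco-vault` or `vouch`, and that the Decco Vault container ships the `vault` CLI configured to reach its own server; neither detail is confirmed by this article. It relies on the documented exit codes of `vault status`: 0 when unsealed, 2 when sealed, and 1 on error.

```shell
#!/usr/bin/env bash
# Sketch: list the pods involved in this issue and report the Decco Vault
# seal state. The name patterns and the in-pod `vault` CLI are ASSUMPTIONS.

# Print pods in any namespace whose line mentions decco-vault or vouch,
# including READY, STATUS, and NODE columns (-o wide).
list_suspect_pods() {
  kubectl get pods --all-namespaces -o wide --no-headers \
    | grep -E 'decco-vault|vouch'
}

# Usage: vault_seal_state <namespace> <pod>
# Maps `vault status` exit codes (0 unsealed, 2 sealed, 1 error) to a word.
# kubectl exec returns the exit code of the command run in the container.
vault_seal_state() {
  local ns="$1" pod="$2" rc
  kubectl exec -n "$ns" "$pod" -- vault status >/dev/null 2>&1
  rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

For example, `vault_seal_state default <decco-vault-pod>` would be expected to print `sealed` in the situation described here. If the pod has not started its main container, `kubectl exec` fails and the function reports `error`; in that case the seal status has to be checked by other means.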
Diagnostics
Most of the service logs report connectivity errors.
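To gauge how widespread the problem is on a host, the AMQP connection failures shown in the excerpts below can be counted per log file. This is a minimal sketch: the log file paths are passed in by the caller because their location on the host is not given in this article, and the match pattern is based only on the ostackhost and cindervolume-base messages reproduced here.

```shell
#!/usr/bin/env bash
# Sketch: count "AMQP server on ... is unreachable" lines in each log file.
# Log file paths are supplied by the caller; no default location is assumed.
count_amqp_errors() {
  local f n
  for f in "$@"; do
    if [[ ! -r "$f" ]]; then
      echo "$f: unreadable"
      continue
    fi
    # grep -c prints the number of matching lines (0 when nothing matches).
    n=$(grep -c 'AMQP server on .* is unreachable' "$f")
    echo "$f: $n"
  done
}
```

For example, `count_amqp_errors <ostackhost-log> <cindervolume-base-log>` prints one `path: count` line per file, or `path: unreadable` if a file cannot be read.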
ostackhost logs
ERROR oslo.messaging._drivers.impl_rabbit [REQ_ID] AMQP server on 127.0.0.1:5673 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
ERROR oslo.messaging._drivers.impl_rabbit [REQ_ID] AMQP server on 127.0.0.1:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.: ConnectionRefusedError: [Errno 111] ECONNREFUSED
cindervolume-base logs
ERROR oslo.messaging._drivers.impl_rabbit [-] [REQ_ID] AMQP server on 127.0.0.1:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.: ConnectionRefusedError: [Errno 111] ECONNREFUSED
EADDRINUSE encountered during SNI client creation ... retrying in 20000 milliseconds
ERROR model server went away: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
Resolution
- Cordoned the node where the Decco Vault, Vouch Keystone, and Vouch NoAuth pods are scheduled.
Command
$ kubectl cordon <node_name>
If all three pods are on the same node, only that node needs to be cordoned. If the pods are spread across different nodes, cordon each of those nodes and restart the pods one by one. The goal is to get the pods rescheduled off their current node.
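The node-selection rule above (cordon a single node if all three pods share it, otherwise cordon each node involved) can be sketched as a function that looks up each pod's node from `.spec.nodeName` and cordons every distinct node exactly once. Pods are passed as hypothetical `<namespace>/<pod>` pairs; the real pod names are not given in this article and would come from `kubectl get pods -o wide`.

```shell
#!/usr/bin/env bash
# Sketch: cordon each distinct node that hosts one of the given pods.
# Arguments are <namespace>/<pod> pairs (placeholders, not real defaults).
cordon_pod_nodes() {
  local entry ns pod node
  local seen=" "   # space-delimited list of nodes already cordoned
  for entry in "$@"; do
    ns="${entry%%/*}"
    pod="${entry#*/}"
    # .spec.nodeName is the node the scheduler placed the pod on.
    node=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')
    if [[ -z "$node" ]]; then
      echo "no node found for $entry" >&2
      continue
    fi
    # Cordon each node only once, even if several pods share it.
    if [[ "$seen" != *" $node "* ]]; then
      seen+="$node "
      kubectl cordon "$node" || return 1
    fi
  done
}
```

For example, `cordon_pod_nodes default/<decco-vault-pod> <region-namespace>/<vouch-keystone-pod> <region-namespace>/<vouch-noauth-pod>` issues one `kubectl cordon` per distinct node.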
- Performed a rollout restart of the following deployments, in this order:
  - Decco Vault in the default namespace
  - Vouch Keystone in the corresponding region namespace
  - Vouch NoAuth in the corresponding region namespace
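Because the deployments must be restarted in the listed order, a sequential version can wait for each rollout to finish before starting the next, using `kubectl rollout status`, which blocks until the rollout completes or the timeout expires. This is a sketch: the `<namespace>/<deployment>` argument format and the 300-second timeout are assumptions, and the actual deployment names are not given in this article.

```shell
#!/usr/bin/env bash
# Sketch: restart deployments strictly in the given order, waiting for each
# rollout to complete before moving on. Arguments are <namespace>/<deployment>.
restart_in_order() {
  local entry ns deploy
  for entry in "$@"; do
    ns="${entry%%/*}"
    deploy="${entry#*/}"
    echo "restarting $deploy in $ns"
    kubectl rollout restart deployment "$deploy" -n "$ns" || return 1
    # Block until the new pods are ready; stop the sequence on failure.
    if ! kubectl rollout status deployment "$deploy" -n "$ns" --timeout=300s; then
      echo "rollout of $deploy did not complete" >&2
      return 1
    fi
  done
}
```

For example, `restart_in_order default/<decco-vault-deployment> <region-namespace>/<vouch-keystone-deployment> <region-namespace>/<vouch-noauth-deployment>`. Stopping at the first failed rollout keeps a later deployment from restarting while the one before it is still unhealthy.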
Command
$ kubectl rollout restart deployment <deployment-name> -n <namespace>
- Restarted the pf9-hostagent service on the affected host.
Command
# systemctl restart pf9-hostagent
Validation
- The hypervisor returned to a healthy state.
- All Platform9 services were observed to be running normally on the affected node.
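After the pf9-hostagent restart, the host-side part of this validation can be reduced to confirming that each relevant unit reports `active` through `systemctl is-active`. The unit list is an assumption built from the services named in the Problem section plus pf9-hostagent, and it assumes each service runs as a systemd unit of that name; the full set of Platform9 services on a host may be larger.

```shell
#!/usr/bin/env bash
# Sketch: report the state of each listed unit and fail if any is not active.
# The unit names are ASSUMED to match the services named in this article.
PF9_UNITS=(pf9-hostagent pf9-ostackhost pf9-cindervolume-base
           pf9-neutron-ovn-metadata-agent pf9-novncproxy)

check_pf9_units() {
  local unit state failed=0
  for unit in "$@"; do
    # `systemctl is-active` prints the unit state (e.g. active, failed).
    state=$(systemctl is-active "$unit" 2>/dev/null)
    echo "$unit: ${state:-unknown}"
    [[ "$state" == "active" ]] || failed=1
  done
  return "$failed"
}
```

For example, `check_pf9_units "${PF9_UNITS[@]}"` prints one `unit: state` line per service; a non-zero return status means at least one listed unit is not active.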