Troubleshooting Cluster Issues

Cluster Creation

Public Cloud Provider

  • Make sure the account you provided to PMK when creating the cloud provider has all the required permissions. See the AWS prerequisites under the Getting Started section for details.
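If cluster creation fails on AWS, one way to find missing privileges is the IAM policy simulator in the AWS CLI. The sketch below assumes the AWS CLI is installed and configured with the same credentials that were given to PMK. `aws sts get-caller-identity` returns an IAM user's ARN directly; if PMK was given a role, pass the role's ARN instead of the assumed-role session ARN. The action names in the example comment are placeholders, not PMK's actual requirements; take the real list from the AWS prerequisites.

```shell
#!/usr/bin/env bash
# Sketch: list the IAM actions an AWS principal is NOT allowed to perform,
# using the IAM policy simulator. Assumes the AWS CLI is installed and
# configured with the same credentials that were given to PMK.

# Print the ARN behind the currently configured AWS CLI credentials.
current_principal_arn() {
  aws sts get-caller-identity --query Arn --output text
}

# Usage: find_denied_actions PRINCIPAL_ARN ACTION [ACTION...]
# Prints one denied action per line; prints nothing when every action is allowed.
find_denied_actions() {
  local arn="$1"
  shift
  aws iam simulate-principal-policy \
    --policy-source-arn "$arn" \
    --action-names "$@" \
    --query "EvaluationResults[?EvalDecision!='allowed'].EvalActionName" \
    --output text |
    tr '\t' '\n' |
    sed '/^$/d'
}

# Example (placeholder actions, not PMK's actual list):
#   find_denied_actions "$(current_principal_arn)" ec2:RunInstances ec2:DescribeVpcs
```

An empty result means every listed action is allowed; each printed action is one the account cannot perform.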

Cluster Creation Fails for BareOS

  • Navigate to the Infrastructure -> Clusters tab.
  • Click the cluster name to open the cluster details page.
  • Click the “Node Health” tab.

Here you should see a detailed breakdown of which nodes failed to install and which specific steps failed. Next, see Troubleshooting Node Issues.

Etcd

Heartbeat/Election Timeout Interval


ETCD_HEARTBEAT_INTERVAL - The interval, in milliseconds, at which the leader notifies followers that it is still the leader.

ETCD_ELECTION_TIMEOUT - How long, in milliseconds, a follower waits without hearing a heartbeat before attempting to become the leader itself.

By default, etcd uses a 100 ms heartbeat interval and a 1000 ms election timeout.
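Before changing these values, it can help to check that the pair is one etcd will accept. The sketch below mirrors two checks in etcd's configuration validation (the election timeout must be at least 5 times the heartbeat interval and at most 50000 ms); confirm them against the etcd version your cluster runs. The 250/2500 values are hypothetical, and where PMK reads these variables from is not covered here, so the sketch only exports them in the current shell.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check etcd heartbeat/election values (milliseconds) before
# exporting them. The bounds mirror etcd's own startup checks: the election
# timeout must be at least 5x the heartbeat interval and no more than 50000 ms.

# Usage: validate_etcd_timeouts HEARTBEAT_MS ELECTION_MS
# Returns 0 if the pair is acceptable, 1 (with a message on stderr) otherwise.
validate_etcd_timeouts() {
  if ! [[ "$1" =~ ^[0-9]+$ && "$2" =~ ^[0-9]+$ ]]; then
    echo "invalid: both values must be whole numbers of milliseconds" >&2
    return 1
  fi
  # Force base 10 so inputs such as "0100" are not parsed as octal.
  local heartbeat=$((10#$1)) election=$((10#$2))
  if (( heartbeat == 0 )); then
    echo "invalid: heartbeat interval must be greater than 0" >&2
    return 1
  fi
  if (( election < 5 * heartbeat )); then
    echo "invalid: election timeout ${election}ms is less than 5 x heartbeat ${heartbeat}ms" >&2
    return 1
  fi
  if (( election > 50000 )); then
    echo "invalid: election timeout ${election}ms exceeds the 50000ms maximum" >&2
    return 1
  fi
  return 0
}

# Hypothetical values for a higher-latency network; this only exports them
# in the current shell (how PMK picks them up is an assumption not shown here).
if validate_etcd_timeouts 250 2500; then
  export ETCD_HEARTBEAT_INTERVAL=250
  export ETCD_ELECTION_TIMEOUT=2500
fi
```

The defaults (100 ms and 1000 ms) pass both checks.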


Database Size Exceeded

  1. Stop the pf9-hostagent and nodeletd services on the master node(s).
  2. Stop the Nodelet phases.
  3. In /opt/pf9/pf9-kube/master_utils.sh, modify the function ensure_etcd_running() to add an environment variable that sets the etcd database size limit.
  4. Start the pf9-hostagent service.
  5. Verify that the size was set correctly by scraping the etcd metrics endpoint.
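The exact variable for the ensure_etcd_running() change is not shown on this page. etcd's standard setting for this limit is the --quota-backend-bytes flag, which etcd also reads from the ETCD_QUOTA_BACKEND_BYTES environment variable; assuming that is the variable meant here, the helper below converts a size in GiB into the byte value it expects. etcd's default quota is 2 GiB, and its documentation suggests 8 GiB as a maximum for normal environments. How the line is wired into ensure_etcd_running() (for example, alongside other ETCD_* variables the function already passes) depends on that function's contents.

```shell
#!/usr/bin/env bash
# Sketch: build the ETCD_QUOTA_BACKEND_BYTES assignment for a quota given in GiB.
# Assumption: the variable to add is etcd's standard ETCD_QUOTA_BACKEND_BYTES.

# Usage: quota_env_line SIZE_GIB
# Prints "ETCD_QUOTA_BACKEND_BYTES=<bytes>"; warns on stderr above 8 GiB.
quota_env_line() {
  if ! [[ "$1" =~ ^[0-9]+$ ]]; then
    echo "size must be a whole number of GiB" >&2
    return 1
  fi
  # Force base 10 so inputs such as "08" are not parsed as octal.
  local gib=$((10#$1))
  if (( gib == 0 )); then
    echo "size must be greater than 0" >&2
    return 1
  fi
  if (( gib > 8 )); then
    echo "warning: etcd suggests at most 8 GiB for normal environments" >&2
  fi
  echo "ETCD_QUOTA_BACKEND_BYTES=$(( gib * 1024 * 1024 * 1024 ))"
}

# Example: raise the quota from etcd's 2 GiB default to 4 GiB.
quota_env_line 4   # prints ETCD_QUOTA_BACKEND_BYTES=4294967296
```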
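To confirm the new size from the metrics output, one option is to filter for etcd_server_quota_backend_bytes, the gauge in which recent etcd releases report the current backend quota. Prometheus text output may print large values in scientific notation (for example 2.147483648e+09), so the sketch normalizes the value to a plain integer. The endpoint URL and certificate paths in the commented example are hypothetical placeholders, not PMK defaults.

```shell
#!/usr/bin/env bash
# Sketch: extract an unlabeled metric's value from Prometheus text-format input
# and print it as a plain integer (handles values like 2.147483648e+09).

# Usage: ... | metric_value METRIC_NAME
# Exits 1 if the metric is not present in the input.
metric_value() {
  awk -v m="$1" '$1 == m { printf "%.0f\n", $2; found = 1 } END { exit !found }'
}

# Hypothetical example: the URL and certificate paths are placeholders.
#   curl -s --cacert /path/to/ca.crt --cert /path/to/client.crt --key /path/to/client.key \
#     https://127.0.0.1:2379/metrics | metric_value etcd_server_quota_backend_bytes
```

If the printed value matches the byte count you configured, the new quota is in effect.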
  Last updated by Anagha Pamidi