You are viewing documentation for a previous version of Platform9 products. For the latest Private Cloud Director documentation, click here.
Managed Kubernetes
5.14
Troubleshooting Cluster Issues
Cluster Creation
Public Cloud Provider
- Make sure the account you provided to PMK when creating the cloud provider has all the required privileges. See the AWS prerequisites under the Getting Started section for more details.
Cluster Creation Fails for BareOS
- Navigate to the Infrastructure -> Clusters tab.
- Click on the cluster name. This will take you to the cluster details page.
- Click on the "Node Health" tab.
Here you should see a detailed breakdown of which nodes failed to install and which specific steps failed. Next, check Troubleshooting Node Issues.
Etcd
Heartbeat/Election Timeout Interval
Bash
2021-02-04 18:36:31.380207 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 124.999498ms, to 92d6e239c543436)
2021-02-04 18:36:31.380220 W | etcdserver: server is likely overloaded
2021-02-04 18:36:31.382208 W | etcdserver: read-only range request "key:\"/registry/mutatingwebhookconfigurations/vault-agent-injector-cfg\" " with result "range_response_count:1 size:2723" took too long (264.355727ms) to execute
- ETCD_HEARTBEAT_INTERVAL - This is the frequency with which the leader will notify followers that it is still the leader.
- ETCD_ELECTION_TIMEOUT - This timeout is how long a follower node will go without hearing a heartbeat before attempting to become a leader itself.
By default, etcd uses a 100ms heartbeat interval and a 1000ms election timeout.
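The relationship between the two settings can be sanity-checked with a small script. The following is a minimal sketch, not part of PMK: the function name `check_etcd_timing` is hypothetical, and the factor of 5 between election timeout and heartbeat interval is an assumption based on the lower bound etcd applies to its own flags. It reads the two variables from an environment file such as /etc/pf9/kube.env and reports whether the election timeout leaves enough room for several missed heartbeats.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with PMK): compare the etcd heartbeat
# interval and election timeout found in an env file.
# Assumption: the election timeout should be at least 5x the heartbeat interval.
check_etcd_timing() {
  local env_file="$1"
  local heartbeat election

  # Take the last matching assignment and keep only the digits of its value.
  heartbeat=$(grep -E '^(export )?ETCD_HEARTBEAT_INTERVAL=' "$env_file" | tail -n1 | tr -cd '0-9')
  election=$(grep -E '^(export )?ETCD_ELECTION_TIMEOUT=' "$env_file" | tail -n1 | tr -cd '0-9')

  if [ -z "$heartbeat" ] || [ -z "$election" ]; then
    echo "ERROR: ETCD_HEARTBEAT_INTERVAL or ETCD_ELECTION_TIMEOUT not found in $env_file"
    return 2
  fi

  if [ "$election" -lt $((heartbeat * 5)) ]; then
    echo "WARN: election timeout ${election}ms is less than 5x heartbeat ${heartbeat}ms"
    return 1
  fi

  echo "OK: heartbeat=${heartbeat}ms election=${election}ms"
}

# Example invocation on a master node:
#   check_etcd_timing /etc/pf9/kube.env
```

The function only reads the file; it does not change any etcd setting, so it is safe to run while the cluster is live.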
Bash
# cat /etc/pf9/kube.env | grep -i etcd
export ETCD_HEARTBEAT_INTERVAL="1000"
export ETCD_ELECTION_TIMEOUT="10000"
Database Size Exceeded

Bash
etcdserver: failed to apply request,took 2.429µs,request header:<ID:1920634987875929770 > txn:<compare:<target:MOD key:"/registry/services/endpoints/kube-system/kube-controller-manager" mod_revision:287319046 > success:<request_put:<key:"/registry/services/endpoints/kube-system/kube-controller-manager" value_size:473 >> failure:<>>,resp ,err is etcdserver: no space
- Stop the pf9-hostagent and nodeletd services on the master node(s).
Bash
sudo systemctl stop pf9-{hostagent,nodeletd}
- Issue a stop for the Nodelet phases.
Bash
/opt/pf9/nodelet/nodeletd phases stop
- In /opt/pf9/pf9-kube/master_utils.sh, modify the function ensure_etcd_running() to add the following environment variable.
/opt/pf9/pf9-kube/master_utils.sh
--volume ${ETCD_DATA_DIR}:/var/etcd/data \
-e ETCD_DEBUG=${DEBUG} -e ETCD_QUOTA_BACKEND_BYTES=<size_in_bytes>"
- Start the pf9-hostagent service.
Bash
sudo systemctl start pf9-hostagent
- Verify the size was correctly set by scraping the etcd metrics endpoint.
Bash
curl -L http://localhost:2379/metrics | grep etcd_server_quota_backend_bytes
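The metrics endpoint reports the quota in scientific notation, which is awkward to compare by eye against the byte value passed in ETCD_QUOTA_BACKEND_BYTES. The sketch below shows one way to compute the byte value and to normalize the metric for comparison. Both function names (`gib_to_bytes`, `quota_from_metrics`) are hypothetical helpers, and the 8 GiB figure is only an example value, not a recommendation from this page.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with PMK) for working with the
# etcd backend quota.

# Convert a whole number of GiB into bytes, suitable for <size_in_bytes>.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Read Prometheus-format metrics on stdin and print the quota as a plain
# integer number of bytes. Comment lines (# HELP, # TYPE) are skipped
# because their first field is "#", not the metric name.
quota_from_metrics() {
  awk '$1 == "etcd_server_quota_backend_bytes" { printf "%.0f\n", $2 }'
}

# Example invocation on a master node (8 GiB is an illustrative value):
#   expected=$(gib_to_bytes 8)
#   actual=$(curl -sL http://localhost:2379/metrics | quota_from_metrics)
#   [ "$expected" = "$actual" ] && echo "quota applied" || echo "quota mismatch: $actual"
```

If the two values differ, the edit to master_utils.sh most likely did not take effect, for example because the etcd container was not recreated after pf9-hostagent started.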
Last updated on Jun 6, 2024