You are viewing documentation for a previous version of Platform9 products.
Managed Kubernetes
5.14
Catapult Rules & Alarms
Catapult has 56 built-in rules. Each rule falls into one of the following categories:
- Calico Monitoring
- etcd Monitoring
- Kubernetes API Monitoring
- Kube State Monitoring
- Node OS Monitoring
- Environment Status
- Node Connectivity Monitoring
- Addons Monitoring
Below is a summary of each rule: where the data is collected, the component that is monitored, and the action required if the alert is received.
etcd
Below is a list of the rule names, where the data is collected from, the monitored components, and the suggested action to remedy the issue.
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| EtcdBackupJobFailed | Cluster | etcd | Review the etcd backup task. The failure may be caused by a lack of disk space. |
| EtcdDown | Cluster | etcd | Verify if nodelet can repair etcd. Check for CPU, memory, or storage issues on the node. Review logs of etcd container. |
| EtcdInsufficientMembers | Cluster | etcd | The etcd quorum lacks an odd number of masters. Verify that an odd number of master nodes are running and healthy. |
| EtcdNoLeader | Cluster | etcd | The etcd quorum has no leader. Investigate network issues that prevent etcd members from reaching consensus. |
| EtcdHighNumberOfLeaderChanges | Cluster | etcd | The etcd leader is changing frequently. Verify that master nodes are not repeatedly failing, and check for network issues between master nodes. |
| EtcdHighNumberOfFailedGrpcRequests | Cluster | etcd | Investigate network issues causing failed gRPC requests in the etcd quorum. Review the etcd container logs. |
| EtcdHighNumberOfFailedProposals | Cluster | etcd | Investigate network issues causing failed proposals in the etcd quorum. Check the etcd container logs. |
| EtcdHighFsyncDurations | Cluster | etcd | Check for disk or storage issues that slow down an etcd member when it commits data to disk. |
| EtcdHighCommitDurations | Cluster | etcd | Check for disk or storage issues that slow down an etcd member when it commits data to disk. |
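EtcdInsufficientMembers and EtcdNoLeader both depend on Raft quorum. etcd needs a majority of its members to elect a leader and commit writes. The Python sketch below shows the standard majority arithmetic and why an even member count adds no fault tolerance. It illustrates the concept only. It is not the expression Catapult evaluates, and the function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Number of votes an etcd (Raft) cluster of `members` needs to commit writes."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if `healthy` reachable members still form a majority of `members`."""
    return healthy >= quorum_size(members)


if __name__ == "__main__":
    # Print quorum size and failure tolerance for small cluster sizes.
    for n in range(1, 6):
        print(f"{n} member(s): quorum {quorum_size(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

For example, 3 and 4 members both tolerate one failure, while 5 members tolerate two. This is why an odd number of masters is preferred.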
Kubernetes API Monitoring
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| KubeAPIServerDown | Cluster | Kube API server | Verify that the API server is responding on the node. |
| KubernetesApiServerErrors | Cluster | Kube API server | Check for API server overload. Review the API server logs. |
| KubernetesApiClientErrors | Cluster | Kube API server | Check for API server overload. Review logs to audit client requests. |
| KubeSchedulerDown | Cluster | Kube Scheduler | Audit logs for k8s master pod restarts. |
| KubeControllerManagerDown | Cluster | Kube controller manager | Audit logs for k8s master pod restarts. |
| KubeProxyDown | Cluster | kube proxy | Verify that the kube proxy container is running on the node. |
| KubeProxyRuleSyncLatency | Cluster | kube proxy | Check the kube proxy logs for signs of overload. |
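For KubeAPIServerDown, one way to check whether the API server is responding is its `/readyz?verbose` endpoint. That endpoint lists each readiness check as `[+]name ok` or `[-]name failed`. The Python sketch below fetches and parses that output. It assumes anonymous access to `/readyz` is allowed on the cluster, which is a cluster configuration choice. If it is not allowed, credentials must be added. The function names are hypothetical.

```python
import ssl
import urllib.error
import urllib.request
from typing import Dict, List, Optional


def parse_readyz_verbose(body: str) -> Dict[str, List[str]]:
    """Split `/readyz?verbose` output into passing ("[+]") and failing ("[-]") checks."""
    result: Dict[str, List[str]] = {"passed": [], "failed": []}
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("[+]"):
            result["passed"].append(line[3:].split(" ", 1)[0])
        elif line.startswith("[-]"):
            result["failed"].append(line[3:].split(" ", 1)[0])
    return result


def fetch_readyz(server: str, cafile: Optional[str] = None, timeout: float = 5.0) -> str:
    """Fetch verbose readiness output from the kube-apiserver at `server`.

    Assumes anonymous access to /readyz is permitted; otherwise add credentials.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(f"{server}/readyz?verbose", context=ctx, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        # A failing readiness check returns a non-2xx status but still has a useful body.
        return err.read().decode()
```

For example, `parse_readyz_verbose(fetch_readyz("https://<master-ip>:6443", cafile="ca.crt"))` lists failing checks, such as `etcd`, under `"failed"`. The address, port, and CA file path are placeholders.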
Kube State
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| KubeNodeNotReady | Cluster | kube state metrics | Review issues with kubelet on the node. |
| KubernetesMemoryPressure | Cluster | kube state metrics | Check available memory on the node. |
| KubernetesDiskPressure | Cluster | kube state metrics | Check available space on the node. |
| KubernetesJobFailed | Cluster | kube state metrics | Check logs for failed jobs. |
| KubernetesContainerTerminated | Cluster | kube state metrics | A K8s container was terminated, possibly by the OOM killer. |
| KubePodCrashLooping | Cluster | kube state metrics | Review the logs of the crashed pod. |
| KubePodNotReady | Cluster | kube state metrics | The pod may be in a Pending state or waiting for a resource. |
| KubeDeploymentReplicasMismatch | Cluster | kube state metrics | K8s deployment has not reconciled the expected number of replicas. Possibly related to CPU or memory requests, or node taints. |
| KubeStatefulSetReplicasMismatch | Cluster | kube state metrics | K8s StatefulSet has not reconciled the expected number of replicas. Possibly related to CPU or memory requests, or node taints. |
| KubeDaemonSetRolloutStuck | Cluster | kube state metrics | K8s DaemonSet rollout has not reconciled the expected number of pods. Possibly related to CPU or memory requests, or node taints. |
| KubernetesPersistentvolumeclaimPending | Cluster | kube state metrics | The PVC is in a Pending state because K8s was unable to create a PV. Verify the storage class, CSI driver, and related details. |
| KubernetesPersistentvolumeError | Cluster | kube state metrics | PV is in an error state. Check storage class or CSI drivers, and logs. |
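Several Kube State rules can be triaged from pod status fields in the Kubernetes API: `status.phase`, `containerStatuses[].state.waiting.reason`, and `containerStatuses[].lastState.terminated.reason`. The Python sketch below walks the JSON produced by `kubectl get pods -A -o json`. It flags Pending pods, CrashLoopBackOff containers, and containers whose last termination was OOMKilled. The function name and output format are hypothetical, and the checks only approximate the Catapult rule expressions.

```python
from typing import Any, Dict, List, Tuple


def triage_pods(pod_list: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (namespace/pod, container, finding) tuples for pods worth investigating.

    `pod_list` is the parsed JSON output of `kubectl get pods -A -o json`.
    """
    findings: List[Tuple[str, str, str]] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        pod_name = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
        status = pod.get("status", {})

        # Related to KubePodNotReady: the pod has not been scheduled or started yet.
        if status.get("phase") == "Pending":
            findings.append((pod_name, "-", "Pending"))

        for cs in status.get("containerStatuses", []):
            container = cs.get("name", "?")
            waiting = cs.get("state", {}).get("waiting", {})
            last_terminated = cs.get("lastState", {}).get("terminated", {})
            # Related to KubePodCrashLooping: kubelet is backing off container restarts.
            if waiting.get("reason") == "CrashLoopBackOff":
                restarts = cs.get("restartCount", 0)
                findings.append((pod_name, container, f"CrashLoopBackOff (restarts={restarts})"))
            # Related to KubernetesContainerTerminated: the previous run was OOM-killed.
            if last_terminated.get("reason") == "OOMKilled":
                findings.append((pod_name, container, "last termination: OOMKilled"))
    return findings
```

To try it, save the kubectl output to a file, load it with `json.load`, and pass the result to `triage_pods`.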
Node OS
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| HostHighCpuLoad | Node | Node exporter | High CPU usage on the node. Identify the responsible process or container. |
| HostOutOfMemory | Node | Node exporter | High memory usage on the node. Identify the responsible process or container. |
| HostMemoryUnderMemoryPressure | Node | Node exporter | High memory usage on the node. Identify the responsible process or container. |
| NodeFilesystemAlmostOutOfSpace | Node | Node exporter | The node is almost out of disk space. |
| NodeFilesystemAlmostOutOfFiles | Node | Node exporter | The node is almost out of inodes. |
| HostUnusualNetworkThroughputIn | Node | Node exporter | Host is experiencing unusual inbound network throughput. |
| HostUnusualNetworkThroughputOut | Node | Node exporter | Host is experiencing unusual outbound network throughput. |
| NodeNetworkReceiveErrs | Node | Node exporter | Host is experiencing network receive errors. |
| NodeNetworkTransmitErrs | Node | Node exporter | Host is experiencing network transmit errors. |
| HostUnusualDiskWriteRate | Node | Node exporter | Host is experiencing an unusual disk write rate. |
| HostUnusualDiskReadRate | Node | Node exporter | Host is experiencing an unusual disk read rate. |
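The Node OS rules are driven by node exporter metrics such as `node_load1`, `node_filesystem_avail_bytes`, and `node_filesystem_files_free`. When one of these alerts fires, a quick local check on the affected node can confirm it. The Python sketch below reads the same kinds of values with the standard library (`os.getloadavg`, `os.statvfs`). It is Unix-only. The 1.0 load-per-CPU and 10% free thresholds are assumptions chosen for illustration, not the thresholds Catapult uses.

```python
import os
from typing import Dict, List, Optional


def node_snapshot(path: str = "/") -> Dict[str, Optional[float]]:
    """Read CPU load, free disk space, and free inodes for `path` on a Unix host."""
    load1, _, _ = os.getloadavg()  # 1-minute load average, comparable to node_load1
    cpus = os.cpu_count() or 1
    st = os.statvfs(path)
    # Percent of blocks/inodes available to unprivileged users; None if not reported.
    space_free = 100.0 * st.f_bavail / st.f_blocks if st.f_blocks else None
    inodes_free = 100.0 * st.f_favail / st.f_files if st.f_files else None
    return {
        "load1_per_cpu": load1 / cpus,
        "fs_space_free_pct": space_free,
        "fs_inodes_free_pct": inodes_free,
    }


def check_node(snap: Dict[str, Optional[float]],
               max_load_per_cpu: float = 1.0,
               min_free_pct: float = 10.0) -> List[str]:
    """Return human-readable problems; the default thresholds are illustrative only."""
    problems: List[str] = []
    load = snap["load1_per_cpu"]
    if load is not None and load > max_load_per_cpu:
        problems.append("high CPU load")
    space = snap["fs_space_free_pct"]
    if space is not None and space < min_free_pct:
        problems.append("filesystem almost out of space")
    inodes = snap["fs_inodes_free_pct"]
    if inodes is not None and inodes < min_free_pct:
        problems.append("filesystem almost out of inodes")
    return problems
```

Running `check_node(node_snapshot("/"))` on a node returns an empty list when none of the illustrative thresholds are crossed.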
Environment Status
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| ClusterStatusNotOK | PMK | SaaS Mgmt Plane | Cluster status is not OK in the PMK Mgmt Plane DB. |
| NodeNotReady | PMK | SaaS Mgmt Plane | K8s node has entered a not ready state as noted by a HostAgent extension. |
| K8sApiNotResponding | PMK | SaaS Mgmt Plane | PMK Mgmt Plane cannot reach k8s apiserver. |
| WorkerNodeNotResponding | PMK | SaaS Mgmt Plane | PMK Mgmt Plane cannot reach worker node. |
Node Connectivity
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| Host Availability | PMK | SaaS Mgmt Plane | Node Heartbeat is failing. This could be caused by a node outage or a failed service. Review the node to ensure that all Platform9 services are running. |
| Hosts disconnected | PMK | SaaS Mgmt Plane | Partial node availability: the heartbeat is passing, but review the node to ensure that all Platform9 services are running. |
| host-down | PMK | SaaS Mgmt Plane | The node is completely disconnected from Platform9. Ensure the node is running and all services are operating. |
Managed Add-ons
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| AddonNotHealthy | PMK | Nodelet | ClusterAddon’s .status.health shows addon is not healthy. |
| AddonNotConverging | PMK | Nodelet | ClusterAddon’s .status.phase is not in Installed state. |
| AddonInstallError | PMK | Nodelet | ClusterAddon’s .status.phase is in an InstallError state. |
| AddonUninstallError | PMK | Nodelet | ClusterAddon’s .status.phase is in an UninstallError state. |
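The Managed Add-ons rules are defined in terms of the ClusterAddon `.status.phase` and `.status.health` fields. The Python sketch below mirrors the table's conditions for a single ClusterAddon object, such as one parsed from JSON. Two points are assumptions. First, `.status.health` is taken to be a boolean or a string like `"Healthy"`. Second, the sketch ignores any firing delay the real rules may apply, so an addon that is briefly between phases is flagged immediately. The function names are hypothetical.

```python
from typing import Any, Dict, List


def _is_healthy(value: Any) -> bool:
    # Assumption: health is reported as a boolean or a string such as "Healthy".
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "healthy"


def addon_alerts(addon: Dict[str, Any]) -> List[str]:
    """Map a ClusterAddon object's status to the Managed Add-ons rule names."""
    status = addon.get("status", {})
    alerts: List[str] = []
    health = status.get("health")
    # AddonNotHealthy: .status.health reports an unhealthy addon (skipped if absent).
    if health is not None and not _is_healthy(health):
        alerts.append("AddonNotHealthy")
    phase = status.get("phase")
    # AddonNotConverging: .status.phase is anything other than Installed.
    if phase != "Installed":
        alerts.append("AddonNotConverging")
    # Error phases map to their own, more specific rules.
    if phase == "InstallError":
        alerts.append("AddonInstallError")
    elif phase == "UninstallError":
        alerts.append("AddonUninstallError")
    return alerts
```

An addon in an error phase also counts as not converging, so it maps to more than one rule name.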
Calico
| Rule Name | Data Collection Point | Monitored Component | Action Required |
|---|---|---|---|
| PromHTTPRequestErrors | Node | calico-felix | Prometheus is unable to pull data from Calico. Check for underlying network or calico-node pod issues. |
| CalicoDatapaneFailuresHigh | Node | calico-felix | The calico-node pod is congested. Reduce load or restart the calico-node pod. |
| CalicoIpsetErrorsHigh | Node | calico-felix | The calico-node pod is congested. Reduce load or restart the calico-node pod. |
| CalicoIptableSaveErrorsHigh | Node | calico-felix | The calico-node pod is congested. Reduce load or restart the calico-node pod. |
| CalicoIptableRestoreErrorsHigh | Node | calico-felix | The calico-node pod is congested. Reduce load or restart the calico-node pod. |
| TyphaPingLatency | Node | calico-typha | Check network connectivity to the calico-typha pods. |
| TyphaClientWriteLatency | Node | calico-typha | Verify connectivity between calico-typha and the Kubernetes API server/etcd. |
Last updated on Jan 23, 2026