porter-dev · sophiajwitt · Jan 30, 2026
diff --git a/cloud-accounts/changing-instance-types.mdx b/cloud-accounts/changing-instance-types.mdx
diff --git a/cloud-accounts/cluster-observability.mdx b/cloud-accounts/cluster-observability.mdx
@@ -0,0 +1,82 @@
+---
+title: "Cluster Observability"
+sidebarTitle: "Cluster Observability"
+description: "Monitor cluster health, resource usage, and infrastructure metrics"
+---
+
+Porter provides built-in observability for your cluster infrastructure through the **Infrastructure** dashboard. Access it by clicking **Infrastructure** in the left sidebar.
+
+---
+
+## Pods
+
+The **Pods** tab provides a real-time view of all pods running in your cluster.
+
+- **Search**: Filter pods by name
+- **Filters**: Filter by status or namespace
+
+Each pod displays:
+
+| Column | Description |
+|--------|-------------|
+| **Pod name** | The name of the pod |
+| **Namespace** | Kubernetes namespace (e.g., `kube-system`, `default`) |
+| **Status** | Current state (Running, Pending, Failed, etc.) |
+| **Ready** | Container readiness (e.g., `1/1`) |
+| **Restarts** | Number of container restarts |
+| **CPU** | CPU usage |
+| **Memory** | Memory usage |
+| **Memory %** | Percentage of memory limit used |
+| **Age** | Time since pod creation |
+
+---
+
+## Nodes
+
+The **Nodes** tab shows your cluster's node groups and individual nodes.
+
+### Node Groups View
+
+The default view displays all node groups:
+
+| Column | Description |
+|--------|-------------|
+| **Node group** | Name of the node group (e.g., default, monitoring, system) |
+| **Instance type** | The machine type for nodes in this group |
+| **Utilization** | Visual indicator of resource usage |
+| **Actions** | Link to view detailed metrics |
+
+### Individual Nodes View
+
+Click on a node group to see individual nodes:
+
+- **Node name**: The cloud provider's node identifier
+- **Node group**: Which node group this node belongs to
+- **Instance type**: The machine type
+- **CPU**: CPU usage, shown as utilized (yellow) vs. reserved (blue)
+- **Memory**: Memory usage, shown as utilized (yellow) vs. reserved (blue)
+- **Status**: Node health status (Ready, NotReady)
+
+Click **Metrics >** on any node group to view historical instance counts over time.
+
+---
+
+## Integrating External Monitoring
+
+For application-level monitoring and alerting, integrate with external observability platforms:
+
+<CardGroup cols={3}>
+  <Card title="Datadog" icon="dog" href="/observability/integrations">
+    Full-stack monitoring with APM, logs, and infrastructure metrics
+  </Card>
+  <Card title="New Relic" icon="chart-line" href="/observability/integrations">
+    Application performance monitoring and alerting
+  </Card>
+  <Card title="Grafana" icon="chart-area" href="/observability/integrations">
+    Dashboards and visualization for metrics and logs
+  </Card>
+</CardGroup>
+
+See [Third party observability](/addons/third-party-observability) or reach out to support for more information.
+
+
diff --git a/cloud-accounts/cluster-upgrades.mdx b/cloud-accounts/cluster-upgrades.mdx
@@ -1,10 +1,12 @@
 ---
 title: "Cluster Upgrades"
+sidebarTitle: "Cluster Upgrades"
+description: "How Porter manages Kubernetes upgrades for your cluster"
 ---
 
 Keeping your Kubernetes clusters up to date is essential for security, stability, and access to the latest features from the wider Kubernetes community and the underlying public cloud. To that end, we handle managed Kubernetes upgrades for all clusters provisioned through our platform. Our automated upgrade process keeps your clusters current without disrupting your workloads, so you can focus on building and deploying your applications while we handle the complexities of cluster maintenance.
 
-# Shared Responsibility Model
+## Shared Responsibility Model
 
 We've endeavoured to build a world-class cluster management system that can manage and upgrade customer infrastructure without disrupting customer workloads. To that end, we've defined a shared responsibility model that maps out the roles played by Porter's engineering/SRE teams and by customers to ensure the best possible experience with upgrades.
 
@@ -30,11 +32,11 @@ More documentation around zero-downtime deployments may be found [here](/configu
 
 3. Maintaining a constant stream of communication around upgrade timelines and statuses.
 
-# Upgrade Calendar
+## Upgrade Calendar
 
 Kubernetes follows a release cycle with approximately three minor version releases a year. Every release is followed by a period in which public clouds integrate the new version into their managed Kubernetes offerings and run tests to ensure compatibility with the underlying cloud. Our upgrade calendar thus depends on both release cycles. To account for that, we carry out cluster upgrades twice a year, "leapfrogging" over versions so that customer clusters run the _latest stable_ version of Kubernetes. These upgrades are typically carried out once towards the end of Q1 or the beginning of Q2, and again towards the end of Q3.
 
-# Upgrade Path
+## Upgrade Path
 
 When a new version of upstream Kubernetes is released, we closely track the corresponding release on public clouds in conjunction with the wider community as well as our public cloud partners (AWS, Google Cloud, Azure). 
 
@@ -50,4 +52,5 @@ When a new version of upstream Kubernetes is released, we closely track the corr
 
 3. After our tests are successful, we announce a timeline for upgrades over our comms channels on Slack. At this point, while we typically announce a window during low-traffic hours when upgrades are conducted, customers have the option of scheduling a specific slot.
 
-4. When a cluster is upgraded, we upgrade system components, all app templates, the managed cluster control plane as well as all nodegroups. While this operation is meant to be non-disruptive, there are certain prerequisites on the customers' end to ensure zero downtime (see the section below for more details). 
+4. When a cluster is upgraded, we upgrade system components, all app templates, the managed cluster control plane as well as all nodegroups. While this operation is meant to be non-disruptive, there are certain prerequisites on the customers' end to ensure zero downtime (see the section below for more details).
+