Draft
29 commits
4a58718 Implement better reservation handling logic and capacity calculations (jamOne-, Feb 5, 2026)
56ce3ed more vibes (jamOne-, Feb 6, 2026)
e1b98b9 available_slices (jamOne-, Feb 6, 2026)
30396b4 vibe (jamOne-, Feb 6, 2026)
f92c768 propagate return_code (jamOne-, Feb 6, 2026)
ca08a7d manual edits (jamOne-, Feb 6, 2026)
5907d0e csv (jamOne-, Feb 6, 2026)
8068331 sub-block fit more slices (jamOne-, Feb 9, 2026)
2c17408 force_sub_block_targeting (jamOne-, Feb 9, 2026)
ab92416 better csv (jamOne-, Feb 9, 2026)
dfae913 even better csv (jamOne-, Feb 9, 2026)
980eaba manual (jamOne-, Feb 9, 2026)
c9690b2 unit tests fix (jamOne-, Feb 10, 2026)
a909919 Revert splitting of capacity node selector tests in this branch (jamOne-, Feb 10, 2026)
0b4ea91 remove capacity type node selector test (jamOne-, Feb 10, 2026)
dda1a86 Resolve merge conflicts from main (jamOne-, Feb 10, 2026)
b195664 capacity_test.py final (jamOne-, Feb 10, 2026)
5841365 nodepool adjustments (jamOne-, Feb 10, 2026)
0ad0737 add deduplication (jamOne-, Feb 10, 2026)
a756bbb inUseCount -> in_use_count (jamOne-, Feb 10, 2026)
2de1831 _get_reservation_count aggregateReservation fix (jamOne-, Feb 10, 2026)
dd90dd2 json parsing (jamOne-, Feb 11, 2026)
0cf9157 refactor: introduce _list_healthy_sub_blocks (jamOne-, Feb 11, 2026)
e7e6748 add reservation_accelerator_type (jamOne-, Feb 11, 2026)
2e1d42e specific reservation machine_type (jamOne-, Feb 12, 2026)
b2c55bf _get_reservation_cached (jamOne-, Feb 12, 2026)
0009c09 _get_reservation_cached everywhere (jamOne-, Feb 13, 2026)
7b4e74d Merge branch 'main' into reservation-handling (jamOne-, Feb 13, 2026)
04d3eb6 remove update_system_characteristics.py (jamOne-, Feb 13, 2026)
recipes/Cluster_create_RayCluster.md: 8 changes (2 additions, 6 deletions)
@@ -10,8 +10,8 @@ $ xpk cluster create-ray --project=golden-project --zone=us-central1-a --cluster
[XPK] Starting xpk v0.0.0
[XPK] Starting cluster create for cluster golden-cluster:
[XPK] Working on golden-project and us-central1-a
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
+[XPK] Task: `Get reservation golden-reservation` is implemented by the following command not running since it is a dry run.
+gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a --format="json(specificReservation,aggregateReservation,status,deploymentType,resourcePolicies)"
[XPK] Task: `Determine server supported GKE versions for default gke version` is implemented by the following command not running since it is a dry run.
gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"
[XPK] Task: `Determine server supported GKE versions for valid versions` is implemented by the following command not running since it is a dry run.
Expand Down Expand Up @@ -50,8 +50,6 @@ gcloud beta container clusters describe golden-cluster --location us-central1 --
We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, supports_super_slicing=False, supports_accelerator_network_profile=False, docker_platform=<DockerPlatform.AMD: 'linux/amd64'>, requires_workload_policy=False, gpu_config=None, parallel_containers=2)
[XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
[XPK] Creating 1 node pool or pools of tpu7x-8
Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, supports_super_slicing=False, supports_accelerator_network_profile=False, docker_platform=<DockerPlatform.AMD: 'linux/amd64'>, requires_workload_policy=False, gpu_config=None, parallel_containers=2)
[XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
@@ -64,8 +62,6 @@ kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="Conf
[XPK] Pretending all the jobs succeeded
[XPK] Create or delete node pool request complete.
[XPK] Creating ConfigMap for cluster
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
[XPK] Temp file (0604d72ef175c94fc796d8f02cff009b4241e85d444d22d414a56a47764d7bbb) content:
kind: ConfigMap
apiVersion: v1
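In this recipe the three `Describe reservation` calls are removed and a single `Get reservation golden-reservation` task is added, which requests only the fields listed in `--format="json(...)"`. The commit messages mention `_get_reservation_cached`, which suggests the reservation is fetched once and then reused. The following is a minimal Python sketch of that idea, assuming a memoized lookup keyed on project, zone, and reservation name. `build_describe_command`, `run_gcloud`, and `make_cached_reservation_lookup` are hypothetical names, not xpk's actual implementation.

```python
import functools
import json
import subprocess
from typing import Callable, Sequence

# Fields requested by the `Get reservation` task in the dry-run log above.
RESERVATION_FIELDS = (
    "specificReservation,aggregateReservation,status,deploymentType,resourcePolicies"
)


def build_describe_command(project: str, zone: str, reservation: str) -> list[str]:
    """Hypothetical helper: argv for one `gcloud ... reservations describe` call."""
    return [
        "gcloud", "beta", "compute", "reservations", "describe", reservation,
        f"--project={project}",
        f"--zone={zone}",
        # No shell quoting needed: argv is passed to the process, not to a shell.
        f"--format=json({RESERVATION_FIELDS})",
    ]


def run_gcloud(argv: Sequence[str]) -> str:
    """Runs the command and returns stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(argv), capture_output=True, text=True, check=True).stdout


def make_cached_reservation_lookup(run: Callable[[Sequence[str]], str] = run_gcloud):
    """Hypothetical factory: the returned lookup calls `run` once per key."""

    @functools.lru_cache(maxsize=None)
    def get_reservation(project: str, zone: str, reservation: str) -> dict:
        # Parsed once and shared by every caller, so treat the result as read-only.
        return json.loads(run(build_describe_command(project, zone, reservation)))

    return get_reservation
```

With the default `run_gcloud`, repeated `lookup("golden-project", "us-central1-a", "golden-reservation")` calls on one `lookup = make_cached_reservation_lookup()` would invoke `gcloud` only once per process, while passing a fake `run` keeps unit tests offline. Because `functools.lru_cache` does not store results for calls that raise, a failed `gcloud` call is retried on the next lookup.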
recipes/Cluster_create_private.md: 8 changes (2 additions, 6 deletions)
@@ -12,8 +12,8 @@ $ xpk cluster create-pathways --project=golden-project --zone=us-central1-a --cl
[XPK] Working on golden-project and us-central1-a
[XPK] Task: `Retrieve available pathways machine types` is implemented by the following command not running since it is a dry run.
gcloud compute machine-types list --filter "guestCpus >= 49 AND memoryMb >= 238592 AND zone = 'us-central1-a'" --format="value(name)" --project=golden-project
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
+[XPK] Task: `Get reservation golden-reservation` is implemented by the following command not running since it is a dry run.
+gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a --format="json(specificReservation,aggregateReservation,status,deploymentType,resourcePolicies)"
[XPK] Task: `Determine server supported GKE versions for default gke version` is implemented by the following command not running since it is a dry run.
gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"
[XPK] Task: `Determine server supported GKE versions for valid versions` is implemented by the following command not running since it is a dry run.
@@ -54,8 +54,6 @@ gcloud beta container clusters describe golden-cluster-private --location us-cen
We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu-v5p-slice', gce_machine_type='ct5p-hightpu-4t', chips_per_vm=4, accelerator_type=TPU, device_type='v5p-8', supports_sub_slicing=False, supports_super_slicing=False, supports_accelerator_network_profile=False, docker_platform=<DockerPlatform.AMD: 'linux/amd64'>, requires_workload_policy=False, gpu_config=None, parallel_containers=1)
[XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
gcloud beta container node-pools list --cluster golden-cluster-private --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
[XPK] Creating 1 node pool or pools of v5p-8
Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu-v5p-slice', gce_machine_type='ct5p-hightpu-4t', chips_per_vm=4, accelerator_type=TPU, device_type='v5p-8', supports_sub_slicing=False, supports_super_slicing=False, supports_accelerator_network_profile=False, docker_platform=<DockerPlatform.AMD: 'linux/amd64'>, requires_workload_policy=False, gpu_config=None, parallel_containers=1)
[XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
@@ -69,8 +67,6 @@ kubectl get configmap golden-cluster-private-resources-configmap -o=custom-colum
[XPK] Pretending all the jobs succeeded
[XPK] Create or delete node pool request complete.
[XPK] Creating ConfigMap for cluster
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
[XPK] Temp file (8669497cfbe494756d36922054f924d7dca463141f0e5d0329e517c880cf2f06) content:
kind: ConfigMap
apiVersion: v1
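The single `Get reservation` call requests both `specificReservation` and `aggregateReservation`, and several commits in this PR (`available_slices`, `inUseCount -> in_use_count`, `_get_reservation_count aggregateReservation fix`) concern capacity calculations. As a rough illustration only, this sketch computes remaining capacity from that JSON. It assumes the Compute Engine reservation field names `count` and `inUseCount` for specific reservations, and `reservedResources`/`inUseResources` with `accelerator.acceleratorCount` for aggregate reservations. `Capacity` and `remaining_capacity` are hypothetical, and xpk's real logic may differ, for example in how it treats sub-blocks or reservation status.

```python
from typing import Any, NamedTuple


class Capacity(NamedTuple):
    """Hypothetical result type: how much is left, and in which unit."""
    available: int
    unit: str  # "vms" (specific), "accelerators" (aggregate), or "unknown"


def _accelerator_total(resources: list[dict[str, Any]]) -> int:
    # Sums accelerator.acceleratorCount, skipping entries without an accelerator.
    return sum(
        int(r["accelerator"].get("acceleratorCount", 0))
        for r in resources
        if "accelerator" in r
    )


def remaining_capacity(reservation: dict[str, Any]) -> Capacity:
    """Sketch: remaining capacity from the JSON returned by `Get reservation`."""
    specific = reservation.get("specificReservation")
    if specific:
        # int64 fields such as `count` may arrive as JSON strings, so convert.
        total = int(specific.get("count", 0))
        in_use = int(specific.get("inUseCount", 0))
        return Capacity(max(total - in_use, 0), "vms")
    aggregate = reservation.get("aggregateReservation")
    if aggregate:
        total = _accelerator_total(aggregate.get("reservedResources", []))
        in_use = _accelerator_total(aggregate.get("inUseResources", []))
        return Capacity(max(total - in_use, 0), "accelerators")
    return Capacity(0, "unknown")
```

Returning the unit next to the number keeps VM counts from specific reservations and accelerator counts from aggregate reservations from being compared by mistake.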
recipes/Cluster_create_sub-slicing.md: 10 changes (2 additions, 8 deletions)
@@ -10,10 +10,8 @@ $ SUB_SLICING_ENABLED=true xpk cluster create --project=golden-project --zone=us
[XPK] Starting xpk v0.0.0
[XPK] Starting cluster create for cluster golden-cluster:
[XPK] Working on golden-project and us-central1-a
-[XPK] Task: `Get reservation deployment type` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a --format="value(deploymentType)"
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
+[XPK] Task: `Get reservation golden-reservation` is implemented by the following command not running since it is a dry run.
+gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a --format="json(specificReservation,aggregateReservation,status,deploymentType,resourcePolicies)"
[XPK] Task: `Determine server supported GKE versions for default gke version` is implemented by the following command not running since it is a dry run.
gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"
[XPK] Task: `Determine server supported GKE versions for valid versions` is implemented by the following command not running since it is a dry run.
@@ -52,8 +50,6 @@ gcloud beta container clusters describe golden-cluster --location us-central1 --
We assume that the underlying system is: SystemCharacteristics(topology='4x4', vms_per_slice=4, gke_accelerator='tpu-v6e-slice', gce_machine_type='ct6e-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='v6e-16', supports_sub_slicing=True, supports_super_slicing=False, supports_accelerator_network_profile=True, docker_platform=<DockerPlatform.AMD: 'linux/amd64'>, requires_workload_policy=False, gpu_config=None, parallel_containers=1)
[XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
[XPK] Creating 1 node pool or pools of v6e-16
Underlyingly, we assume that means: SystemCharacteristics(topology='4x4', vms_per_slice=4, gke_accelerator='tpu-v6e-slice', gce_machine_type='ct6e-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='v6e-16', supports_sub_slicing=True, supports_super_slicing=False, supports_accelerator_network_profile=True, docker_platform=<DockerPlatform.AMD: 'linux/amd64'>, requires_workload_policy=False, gpu_config=None, parallel_containers=1)
[XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
@@ -66,8 +62,6 @@ kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="Conf
[XPK] Pretending all the jobs succeeded
[XPK] Create or delete node pool request complete.
[XPK] Creating ConfigMap for cluster
-[XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
-gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
[XPK] Temp file (8d0f4b1e96d79a5d572cbb1a403ac3285b6a9390b6092b86a76bf66705e35d44) content:
kind: ConfigMap
apiVersion: v1
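In the sub-slicing recipe the separate `Get reservation deployment type` task (`--format="value(deploymentType)"`) is also removed. This is presumably because `deploymentType` is already one of the fields returned by `Get reservation golden-reservation`. Here is a minimal sketch of reading it, together with the other requested fields, from that one JSON document. `ReservationInfo` and `parse_reservation` are hypothetical names, and the nested `specificReservation.instanceProperties.machineType` path is assumed from the Compute Engine reservation resource.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ReservationInfo:
    """Hypothetical summary of one `Get reservation` response."""
    status: Optional[str]
    deployment_type: Optional[str]
    machine_type: Optional[str]
    resource_policies: dict[str, str] = field(default_factory=dict)


def parse_reservation(raw_json: str) -> ReservationInfo:
    """Reads the fields requested by the single `Get reservation` call."""
    data: dict[str, Any] = json.loads(raw_json) or {}
    # Aggregate reservations have no specificReservation block; fall back to {}.
    specific = data.get("specificReservation") or {}
    instance_props = specific.get("instanceProperties") or {}
    return ReservationInfo(
        status=data.get("status"),
        deployment_type=data.get("deploymentType"),
        machine_type=instance_props.get("machineType"),
        resource_policies=dict(data.get("resourcePolicies") or {}),
    )
```

Missing fields come back as `None` (or an empty dict for `resourcePolicies`) instead of raising, so callers decide how to handle a reservation that lacks, say, a deployment type.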