feat: cluster API support for Verda Cloud Python SDK #70
Open: huksley wants to merge 10 commits into `master` from `feat/cluster-api`
10 commits:
- `0d0e7ab` Add Clusters API wrapper
- `d6b9918` feat: clusters api (claude)
- `814d02e` fix: unit test
- `99668e8` fix: polishing
- `7f86615` fix: format, lint and unit test fixing
- `471e089` fix: integration tests
- `f009010` fix: review fixes
- `d2c3a04` fix: full features cluster example, add integration test
- `f5c275c` fix: unit tests
- `7f9a1c5` fix: revert to the correct OS images in prod
New file (`@@ -0,0 +1,141 @@`): Clusters API example

```python
"""
Example demonstrating how to use the Clusters API.

This example shows how to:
- Create a new compute cluster
- List all clusters
- Get a specific cluster by ID, including its worker nodes
- Delete a cluster
"""

import os
import time

from verda import VerdaClient
from verda.constants import Actions, Locations

# Get credentials from environment variables
CLIENT_ID = os.environ.get('VERDA_CLIENT_ID')
CLIENT_SECRET = os.environ.get('VERDA_CLIENT_SECRET')
BASE_URL = os.environ.get('VERDA_BASE_URL', 'https://api.verda.com/v1')

# Create client
verda = VerdaClient(CLIENT_ID, CLIENT_SECRET, base_url=BASE_URL)


def create_cluster_example():
    """Create a new compute cluster."""
    # Get SSH keys
    ssh_keys = [key.id for key in verda.ssh_keys.get()]

    # Check if cluster type is available
    if not verda.clusters.is_available('16B200', Locations.FIN_03):
        raise ValueError('Cluster type 16B200 is not available in FIN_03')

    # Get available images for cluster type
    images = verda.clusters.get_cluster_images('16B200')
    if 'ubuntu-22.04-cuda-12.9-cluster' not in images:
        raise ValueError('Ubuntu 22.04 CUDA 12.9 cluster image is not supported for 16B200')

    # Create a 16B200 cluster
    cluster = verda.clusters.create(
        hostname='my-compute-cluster',
        cluster_type='16B200',
        image='ubuntu-22.04-cuda-12.9-cluster',
        description='Example compute cluster for distributed training',
        ssh_key_ids=ssh_keys,
        location=Locations.FIN_03,
        shared_volume_name='my-shared-volume',
        shared_volume_size=30000,
        # Return immediately; the loop below polls until the cluster is RUNNING
        wait_for_status=None,
    )

    print(f'Creating cluster: {cluster.id}')
    print(f'Cluster hostname: {cluster.hostname}')
    print(f'Cluster status: {cluster.status}')
    print(f'Cluster type: {cluster.cluster_type}')
    print(f'Location: {cluster.location}')

    # Wait for cluster to enter RUNNING status
    while cluster.status != verda.constants.cluster_status.RUNNING:
        time.sleep(2)
        print(f'Waiting for cluster to enter RUNNING status... (status: {cluster.status})')
        cluster = verda.clusters.get_by_id(cluster.id)

    print(f'Public IP: {cluster.ip}')
    print('Cluster is now running and ready to use!')

    return cluster


def list_clusters_example():
    """List all clusters."""
    # Get all clusters
    clusters = verda.clusters.get()

    print(f'\nFound {len(clusters)} cluster(s):')
    for cluster in clusters:
        # worker_nodes may not be populated yet for clusters still provisioning
        print(
            f' - {cluster.hostname} ({cluster.id}): {cluster.status} - {len(cluster.worker_nodes or [])} nodes'
        )

    # Get clusters with specific status
    running_clusters = verda.clusters.get(status=verda.constants.cluster_status.RUNNING)
    print(f'\nFound {len(running_clusters)} running cluster(s)')

    return clusters


def get_cluster_by_id_example(cluster_id: str):
    """Get a specific cluster by ID."""
    cluster = verda.clusters.get_by_id(cluster_id)

    print('\nCluster details:')
    print(f' ID: {cluster.id}')
    print(f' Name: {cluster.hostname}')
    print(f' Description: {cluster.description}')
    print(f' Status: {cluster.status}')
    print(f' Cluster type: {cluster.cluster_type}')
    print(f' Created at: {cluster.created_at}')
    print(f' Public IP: {cluster.ip}')
    print(f' Worker nodes: {len(cluster.worker_nodes or [])}')

    return cluster


def delete_cluster_example(cluster_id: str):
    """Delete a cluster."""
    print(f'\nDeleting cluster {cluster_id}...')

    verda.clusters.action(cluster_id, Actions.DELETE)

    print('Cluster deleted successfully')


def main():
    """Run all cluster examples."""
    print('=== Clusters API Example ===\n')

    # Create a new cluster
    print('1. Creating a new cluster...')
    cluster = create_cluster_example()
    cluster_id = cluster.id

    # List all clusters
    print('\n2. Listing all clusters...')
    list_clusters_example()

    # Get cluster by ID
    print('\n3. Getting cluster details...')
    get_cluster_by_id_example(cluster_id)

    # Delete the cluster
    print('\n4. Deleting the cluster...')
    delete_cluster_example(cluster_id)

    print('\n=== Example completed successfully ===')


if __name__ == '__main__':
    main()
```
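The example's wait loop re-fetches the cluster every 2 seconds with no upper bound, so a cluster that never reaches RUNNING (for instance, one that fails to provision) would keep the script spinning. A minimal sketch of a bounded poll, assuming only that the current status can be read through a zero-argument callable; `wait_until`, its parameters, and the status strings in the demo are hypothetical and not part of the Verda SDK.

```python
import time
from typing import Callable, Iterable


def wait_until(
    fetch_status: Callable[[], str],
    target: str,
    timeout: float = 1800.0,
    interval: float = 2.0,
    failure_states: Iterable[str] = (),
) -> str:
    """Poll fetch_status() until it returns `target`; return that status.

    Raises RuntimeError as soon as a status in `failure_states` is seen,
    and TimeoutError if `target` is not reached within `timeout` seconds.
    """
    failures = set(failure_states)
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == target:
            return status
        if status in failures:
            raise RuntimeError(f'cluster entered failure state {status!r}')
        if time.monotonic() >= deadline:
            raise TimeoutError(f'status still {status!r} after {timeout} s')
        time.sleep(interval)


if __name__ == '__main__':
    # Hypothetical status sequence standing in for repeated get_by_id() calls.
    statuses = iter(['ordered', 'provisioning', 'running'])
    print(wait_until(lambda: next(statuses), 'running', timeout=5, interval=0))
```

Wired into the example, this might look like `wait_until(lambda: verda.clusters.get_by_id(cluster.id).status, verda.constants.cluster_status.RUNNING)`, followed by one more `get_by_id` call to obtain the refreshed cluster object. Which values count as failures depends on what `cluster_status` actually defines, so `failure_states` is left for the caller to fill in.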
New file (`@@ -0,0 +1,69 @@`): Clusters integration test

```python
import logging
import os

import pytest

from verda import VerdaClient
from verda.constants import Locations

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()


IN_GITHUB_ACTIONS = os.getenv('GITHUB_ACTIONS') == 'true'


@pytest.mark.skipif(IN_GITHUB_ACTIONS, reason="Test doesn't work in GitHub Actions.")
@pytest.mark.withoutresponses
class TestClusters:
    def test_create_cluster(self, verda_client: VerdaClient):
        # get ssh key
        ssh_key = verda_client.ssh_keys.get()[0]

        if not verda_client.clusters.is_available('16B200', Locations.FIN_03):
            raise ValueError('Cluster type 16B200 is not available in FIN_03')
        logger.debug('[x] Cluster type 16B200 is available in FIN_03')

        availabilities = verda_client.clusters.get_availabilities(Locations.FIN_03)
        assert len(availabilities) > 0
        assert '16B200' in availabilities
        logger.debug(
            '[x] Cluster type 16B200 is one of the available cluster types in FIN_03: %s',
            availabilities,
        )

        images = verda_client.clusters.get_cluster_images('16B200')
        assert len(images) > 0
        assert 'ubuntu-22.04-cuda-12.9-cluster' in images
        logger.debug('[x] Ubuntu 22.04 CUDA 12.9 cluster image is supported for 16B200')

        # create cluster
        cluster = verda_client.clusters.create(
            hostname='test-instance',
            location=Locations.FIN_03,
            cluster_type='16B200',
            description='test instance',
            image='ubuntu-22.04-cuda-12.9-cluster',
            ssh_key_ids=[ssh_key.id],
            # Set to None to not wait for provisioning but return immediately
            wait_for_status=verda_client.constants.cluster_status.PROVISIONING,
        )

        # assert cluster is created
        assert cluster.id is not None
        assert (
            cluster.status == verda_client.constants.cluster_status.PROVISIONING
            or cluster.status == verda_client.constants.cluster_status.RUNNING
        )

        # If still provisioning, we don't have worker nodes yet and ip is not available
        if cluster.status != verda_client.constants.cluster_status.PROVISIONING:
            assert cluster.worker_nodes is not None
            assert len(cluster.worker_nodes) == 2
            assert cluster.ip is not None

        # Now we need to wait for RUNNING status to connect to the jumphost (public IP is available)
        # After that, we can connect to the jumphost and run commands on the cluster nodes:
        #
        # ssh -i ssh_key.pem root@<public_ip>
        #
```

huksley marked the review conversation on the `if cluster.status != ...` line as resolved.
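The test's closing comment describes the next step: once the cluster is RUNNING and exposes a public IP, connect to the jumphost with `ssh -i ssh_key.pem root@<public_ip>`. A small sketch that assembles that command as an argv list, suitable for `subprocess.run`; `jumphost_ssh_command` is a hypothetical helper, and the default key path and `root` user are taken from the comment rather than from any documented SDK behavior.

```python
import shlex
from typing import List, Optional


def jumphost_ssh_command(
    public_ip: str,
    key_path: str = 'ssh_key.pem',
    user: str = 'root',
    remote_command: Optional[str] = None,
) -> List[str]:
    """Build the argv for `ssh -i <key> <user>@<public_ip> [command]`.

    Hypothetical helper: defaults mirror the comment in the test above.
    """
    argv = ['ssh', '-i', key_path, f'{user}@{public_ip}']
    if remote_command is not None:
        # ssh forwards this string to the remote shell for execution.
        argv.append(remote_command)
    return argv


if __name__ == '__main__':
    # 203.0.113.10 is a reserved documentation address, not a real jumphost.
    cmd = jumphost_ssh_command('203.0.113.10', remote_command='hostname')
    print(shlex.join(cmd))  # prints: ssh -i ssh_key.pem root@203.0.113.10 hostname
```

Passing the list straight to `subprocess.run(cmd, check=True)` avoids local shell quoting; `cluster.ip` would supply `public_ip` once the cluster is RUNNING.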
Empty file.
maybe nicer to use the `delete` method (or both)