Add RabbitMQ version upgrade and queue type migration basic support #526
lmiccini wants to merge 10 commits into openstack-k8s-operators:main
This change depends on a change that failed to merge. Change openstack-k8s-operators/openstack-operator#1805 is needed.
This commit fixes the webhook validation that was blocking automatic queue type migration when upgrading to RabbitMQ 4.0.
Changes:
1. Removed strict validation blocking queueType: Mirrored on RabbitMQ 4.x
- The validation was running before the Default() function
- This prevented the automatic override from Mirrored → Quorum
- Default() and controller logic handle the enforcement instead
2. Enhanced DefaultForUpdate() to override Mirrored → Quorum
- Previously only set the default when queueType was nil/empty
- Now also overrides when queueType is explicitly set to Mirrored
- Only applies when the target-version annotation is 4.0+
3. Updated test expectations
- Changed test to verify automatic override instead of rejection
- Test now confirms the webhook overrides Mirrored → Quorum on 4.0
This allows OpenStackControlPlane to update RabbitMQ instances without validation errors, while still ensuring Quorum queues are enforced on RabbitMQ 4.0 through automatic webhook defaulting.
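The defaulting rule described in this commit could look roughly like the Go sketch below. It is not the operator's actual code: the function names, the string constants, and the version parsing are assumptions made for illustration, and the sketch assumes that an empty queueType is also defaulted only when the target version is 4.0 or later, which the commit message does not spell out.

```go
package main

import (
	"strconv"
	"strings"
)

// Hypothetical constants; the real API type names and values may differ.
const (
	queueTypeMirrored = "Mirrored"
	queueTypeQuorum   = "Quorum"
)

// majorAtLeast4 reports whether a version string such as "4.0" or "4.1.2"
// has a major component of 4 or higher. Unparseable input yields false.
func majorAtLeast4(version string) bool {
	major, _, _ := strings.Cut(strings.TrimSpace(version), ".")
	n, err := strconv.Atoi(major)
	return err == nil && n >= 4
}

// defaultQueueTypeForUpdate returns the queue type the webhook would leave
// on the object. For a 4.0+ target it replaces both an empty value and an
// explicit "Mirrored" with "Quorum"; otherwise the value is left untouched.
func defaultQueueTypeForUpdate(queueType, targetVersion string) string {
	if !majorAtLeast4(targetVersion) {
		return queueType
	}
	if queueType == "" || queueType == queueTypeMirrored {
		return queueTypeQuorum
	}
	return queueType
}
```

Because the override happens during defaulting rather than validation, a request that still carries queueType: Mirrored is rewritten instead of rejected, which matches the updated test expectation.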
Prioritize Spec.QueueType over Status.QueueType when determining the quorum queue setting for RabbitMQ transport URLs.
This fixes a race condition where TransportURL reconciles before Status.QueueType is set during RabbitMQ upgrades/recreations (e.g., 3.9→4.0 with storage wipe).
Previously, TransportURL would default to quorum=false during the ~13 second window between cluster creation and Status.QueueType update, causing services to create classic queues on RabbitMQ 4.0 clusters configured for quorum queues. When services later reconnected with quorum=true, they would fail with PRECONDITION_FAILED errors.
Spec.QueueType is set immediately when the CR is created and represents the configured queue type, making it the reliable source of truth during cluster initialization.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
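A minimal Go sketch of the lookup order described above, assuming simplified stand-in types: rabbitSpec, rabbitStatus, and useQuorum are hypothetical names, not the operator's real API, and the "Quorum" string value is likewise an assumption.

```go
package main

// Hypothetical, simplified stand-ins for the RabbitMQ CR spec and status.
type rabbitSpec struct{ QueueType string }
type rabbitStatus struct{ QueueType string }

// useQuorum decides the quorum flag for a transport URL. The spec wins
// whenever it is set; the status is consulted only as a fallback, so a
// status that has not been populated yet cannot force quorum=false.
func useQuorum(spec rabbitSpec, status rabbitStatus) bool {
	if spec.QueueType != "" {
		return spec.QueueType == "Quorum"
	}
	return status.QueueType == "Quorum"
}
```

With this ordering, a reconcile that lands inside the window where Status.QueueType is still empty already sees the configured value from the spec.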
When reconciling an existing RabbitMQ 3.9 cluster with a new operator
version that tracks Status.CurrentVersion, the initialization logic
incorrectly prioritized the target-version annotation over detecting
the existing cluster.
Bug scenario:
1. RabbitMQ 3.9 cluster exists (old operator, no CurrentVersion tracking)
2. New operator starts reconciling
3. openstack-operator sets target-version: "4.0" annotation
4. Controller sees annotation and initializes CurrentVersion = "4.0"
5. requiresStorageWipe("4.0", "4.0") returns FALSE
6. Storage wipe is SKIPPED
7. Cluster updates to RabbitMQ 4.0 image with old 3.9 storage
8. RabbitMQ 4.0 fails to boot: "classic_mirrored_queue_version: required feature flag not enabled!"
Root cause:
The initialization logic at lines 203-207 checked for the annotation
FIRST and used it as initialVersion. It only checked for existing
clusters when NO annotation was present.
Fix:
Changed priority order to ALWAYS check for existing RabbitMQCluster
first, regardless of annotation presence:
1. Check if RabbitMQCluster exists
- If exists → initialize CurrentVersion = "3.9" (backwards compat)
- Triggers storage wipe for 3.9→4.0 upgrade
2. If cluster doesn't exist (new deployment)
- Use target-version annotation if present
- Otherwise use DefaultRabbitMQVersion (4.0)
The annotation is the TARGET version (where we want to go), not the
CURRENT version (where we are). We should only use it as initialVersion
for brand new deployments without an existing cluster.
Added test coverage:
- Verifies existing cluster detection works correctly
- Confirms CurrentVersion initializes to "3.9" when cluster exists
- Validates storage wipe is triggered for 3.9→4.0 upgrade
- Ensures upgrade completes successfully with clean storage
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
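The corrected priority order from this commit can be sketched in Go as follows. This is an illustration under stated assumptions, not the controller code: initialVersion and the legacyVersion constant are hypothetical names, and requiresStorageWipe here is a stand-in that only compares major versions, whereas the real function's rule is not shown in the commit message.

```go
package main

import "strings"

const (
	defaultRabbitMQVersion = "4.0" // default named in the commit message
	legacyVersion          = "3.9" // assumed version of pre-existing clusters
)

// initialVersion picks the value used to seed Status.CurrentVersion.
// An existing cluster always wins over the target-version annotation,
// because the annotation describes where to go, not where we are.
func initialVersion(clusterExists bool, targetAnnotation string) string {
	if clusterExists {
		return legacyVersion
	}
	if targetAnnotation != "" {
		return targetAnnotation
	}
	return defaultRabbitMQVersion
}

// requiresStorageWipe is a simplified stand-in: it reports true when the
// major component of the two versions differs (an assumption).
func requiresStorageWipe(from, to string) bool {
	fromMajor, _, _ := strings.Cut(from, ".")
	toMajor, _, _ := strings.Cut(to, ".")
	return fromMajor != toMajor
}
```

Seeding CurrentVersion from the annotation made both arguments "4.0", so the wipe check returned false. Seeding it from the detected cluster makes the comparison "3.9" against "4.0", which triggers the wipe.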
…g 3.9→4.0
This commit fixes a critical bug where CurrentVersion was being updated too early during storage wipe upgrades, making it impossible to observe the intermediate upgrade state and causing test failures.
Problem:
When a RabbitMQ cluster required a storage wipe for upgrade (e.g., 3.9 → 4.0), the controller was updating CurrentVersion to the target version immediately after storage wipe completion, before the new cluster was even created. This caused:
1. CurrentVersion to not reflect the actually deployed version during upgrade
2. Tests to be unable to observe the upgrade process
3. Race conditions where the old cluster was marked as "ready" instead of the new cluster
Solution:
1. Introduced new "WaitingForCluster" upgrade phase to track post-wipe state
2. Deferred CurrentVersion update until the new cluster is actually ready
3. Added 200ms delay after storage wipe before cluster recreation for observability
4. Updated cluster ready logic to detect WaitingForCluster phase and update CurrentVersion only when the new cluster is confirmed ready
Controller Changes:
- After storage wipe completes, set UpgradePhase = "WaitingForCluster"
- Keep CurrentVersion at old version (e.g., "3.9") until new cluster is ready
- When cluster becomes ready with UpgradePhase = "WaitingForCluster":
- Update CurrentVersion to target version (e.g., "4.0")
- Clear UpgradePhase
Test Fixes:
Updated three tests to wait for WaitingForCluster phase before simulating the new cluster as ready:
- "should require storage wipe and update Status.CurrentVersion after upgrade"
- "should require storage wipe for downgrade"
- "should automatically migrate to Quorum queues and wipe cluster"
This ensures CurrentVersion accurately represents the deployed RabbitMQ version throughout the upgrade lifecycle, making upgrades observable and testable.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
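The phase handling described in this commit amounts to a small state machine, sketched below in Go. The upgradeStatus struct and the two handler functions are hypothetical simplifications; only the field names CurrentVersion and UpgradePhase and the "WaitingForCluster" value come from the commit message, and the 200ms delay is left out.

```go
package main

const phaseWaitingForCluster = "WaitingForCluster"

// upgradeStatus is a simplified stand-in for the relevant status fields.
type upgradeStatus struct {
	CurrentVersion string
	UpgradePhase   string
}

// onStorageWipeComplete records the post-wipe state. CurrentVersion is
// deliberately left at the old version.
func onStorageWipeComplete(s *upgradeStatus) {
	s.UpgradePhase = phaseWaitingForCluster
}

// onClusterReady promotes CurrentVersion to the target only when the
// controller was waiting for the recreated cluster, then clears the phase.
// A ready event outside that phase changes nothing.
func onClusterReady(s *upgradeStatus, targetVersion string) {
	if s.UpgradePhase != phaseWaitingForCluster {
		return
	}
	s.CurrentVersion = targetVersion
	s.UpgradePhase = ""
}
```

Between the two calls the status reads CurrentVersion "3.9" with phase "WaitingForCluster", which is the intermediate state the three updated tests wait for.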
Jira: https://issues.redhat.com/browse/OSPRH-22219
Depends-On: openstack-k8s-operators/openstack-operator#1805