
OCPEDGE-2326: add storageclass configuration options to lvms #1965

Open
Neilhamza wants to merge 16 commits into openshift:main from Neilhamza:storageclassoptions

Conversation


@Neilhamza Neilhamza commented Jan 21, 2026

This PR completes the following Jira issues:
https://issues.redhat.com/browse/OCPEDGE-2326
https://issues.redhat.com/browse/OCPEDGE-2327
https://issues.redhat.com/browse/OCPEDGE-2328

Adds configurable StorageClassOptions at the DeviceClass level in LVMCluster, allowing users to control how LVMS-generated StorageClasses are created while respecting Kubernetes immutability guarantees.

Unit tests and e2e coverage will be added in a follow-up PR.

Changes
API

Added StorageClassOptions to DeviceClass with optional:

reclaimPolicy

volumeBindingMode

additionalParameters

additionalLabels

Webhook

Validates StorageClassOptions on create/update

Warns on conflicts with LVMS-owned parameters and labels

Validates label keys/values per Kubernetes rules

Rejects updates to immutable StorageClass fields after creation
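As a rough sketch of the label-value check the webhook performs (simplified and hypothetical; the real implementation would rely on the validation helpers in k8s.io/apimachinery rather than this hand-rolled regex):

```go
package main

import (
	"fmt"
	"regexp"
)

// Approximation of the Kubernetes label-value rules: empty, or at most 63
// characters, alphanumeric at both ends, with '-', '_', '.' allowed inside.
var labelValueRe = regexp.MustCompile(`^([A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?)?$`)

// validLabelValue reports whether v satisfies the approximated rules above.
func validLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRe.MatchString(v)
}

func main() {
	fmt.Println(validLabelValue("production")) // true
	fmt.Println(validLabelValue("-bad"))       // false: must start alphanumeric
}
```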

Controller

Uses CreateOrUpdate for StorageClass reconciliation

Sets immutable fields at creation time only

Supports day-2 updates for metadata-only fields (additionalLabels)

Ownership-aware label pruning ensures spec is the source of truth

No delete/recreate behavior for StorageClasses

Immutability Model

StorageClass behavior fields are immutable in Kubernetes and are configurable only at creation time.
Metadata-only fields (labels) are safely reconciled day-2 without impacting existing PVCs.
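The model above can be sketched as a CreateOrUpdate-style mutate function (simplified, hypothetical types standing in for storagev1.StorageClass; not the actual LVMS code): immutable behavior fields are written only when the object is new, while labels are reconciled on every call.

```go
package main

import "fmt"

// StorageClass stands in for storagev1.StorageClass; fields are simplified.
type StorageClass struct {
	ResourceVersion   string
	ReclaimPolicy     string
	VolumeBindingMode string
	Labels            map[string]string
}

// mutate sets immutable behavior fields only when the object is new
// (empty ResourceVersion); metadata-only labels reconcile every time.
func mutate(sc *StorageClass, desired StorageClass) {
	if sc.ResourceVersion == "" { // new object: safe to set immutable fields
		sc.ReclaimPolicy = desired.ReclaimPolicy
		sc.VolumeBindingMode = desired.VolumeBindingMode
	}
	if sc.Labels == nil {
		sc.Labels = map[string]string{}
	}
	for k, v := range desired.Labels { // metadata-only: safe day-2
		sc.Labels[k] = v
	}
}

func main() {
	existing := &StorageClass{ResourceVersion: "42", ReclaimPolicy: "Delete"}
	mutate(existing, StorageClass{
		ReclaimPolicy: "Retain", // ignored: object already exists
		Labels:        map[string]string{"environment": "production"},
	})
	fmt.Println(existing.ReclaimPolicy, existing.Labels["environment"]) // Delete production
}
```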

Example
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
spec:
  storage:
    deviceClasses:
      - name: vg1
        storageClassOptions:
          reclaimPolicy: Retain
          volumeBindingMode: Immediate
          additionalParameters:
            provisioner-secret-name: my-secret
          additionalLabels:
            environment: production

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 21, 2026

openshift-ci-robot commented Jan 21, 2026

@Neilhamza: This pull request references OCPEDGE-2326 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Introduces configurable StorageClass options at the DeviceClass level, enabling users to customize reclaimPolicy, volumeBindingMode, additionalParameters, and additionalLabels. Supports day-2 operations with full backward compatibility.

JIRA: OCPSTRAT-2718

Changes
API (lvmcluster_types.go)

Added StorageClassOptions struct with optional fields for reclaim policy, volume binding mode, parameters, and labels
All fields use pointer types to distinguish unset (nil) from explicitly set values
Validation (lvmcluster_webhook.go)

Added verifyStorageClassOptions() to warn on LVMS-owned parameter conflicts
Validates label key format per Kubernetes standards
Integrated into create/update webhook paths
Controller (topolvm_storageclass.go)

Implements delete+recreate for immutable fields (reclaimPolicy, volumeBindingMode, parameters)
In-place updates for mutable fields (labels)
Added needsRecreation() and applyAdditionalLabels() helpers
Protects LVMS-managed labels and parameters from override
Immutability Handling
Kubernetes StorageClass spec fields are immutable. This PR follows the canonical pattern:

Detect changes to immutable fields
Delete existing StorageClass
Recreate with new configuration
PVC Safety: Existing PVCs are unaffected: they store provisioning info at creation and don't depend on the StorageClass object afterward. There is a brief (<1s) window during recreation where new PVC creation may fail.

Example

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
spec:
  storage:
    deviceClasses:
      - name: vg1
        storageClassOptions:
          reclaimPolicy: Retain
          volumeBindingMode: Immediate
          additionalParameters:
            provisioner-secret-name: my-secret
          additionalLabels:
            environment: production

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Neilhamza Neilhamza changed the title OCPEDGE-2326: add storageclass configuration options to lvms WIP: OCPEDGE-2326: add storageclass configuration options to lvms Jan 21, 2026
@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 21, 2026
@openshift-ci openshift-ci bot requested review from eggfoobar and jerpeter1 January 21, 2026 11:45

openshift-ci bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Neilhamza
Once this PR has been reviewed and has the lgtm label, please assign jerpeter1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2026


codecov-commenter commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 21.26866% with 211 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.83%. Comparing base (0561e12) to head (0e4a655).
⚠️ Report is 57 commits behind head on main.

Files with missing lines Patch % Lines
...ollers/lvmcluster/resource/topolvm_storageclass.go 0.00% 82 Missing ⚠️
internal/controllers/lvmcluster/controller.go 19.00% 75 Missing and 6 partials ⚠️
api/v1alpha1/lvmcluster_webhook.go 47.50% 37 Missing and 5 partials ⚠️
internal/controllers/lvmcluster/resource/utils.go 0.00% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1965      +/-   ##
==========================================
- Coverage   48.11%   46.83%   -1.28%     
==========================================
  Files          52       52              
  Lines        3735     3978     +243     
==========================================
+ Hits         1797     1863      +66     
- Misses       1782     1949     +167     
- Partials      156      166      +10     
Files with missing lines Coverage Δ
api/v1alpha1/lvmcluster_types.go 100.00% <ø> (ø)
internal/controllers/lvmcluster/resource/utils.go 0.00% <0.00%> (ø)
api/v1alpha1/lvmcluster_webhook.go 71.22% <47.50%> (-7.09%) ⬇️
internal/controllers/lvmcluster/controller.go 44.98% <19.00%> (-7.96%) ⬇️
...ollers/lvmcluster/resource/topolvm_storageclass.go 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes


@Neilhamza Neilhamza force-pushed the storageclassoptions branch 2 times, most recently from 8a06966 to bd2bf5a Compare January 21, 2026 12:19

@Neilhamza Neilhamza force-pushed the storageclassoptions branch from fc7f998 to d8a2f74 Compare January 21, 2026 14:15
@Neilhamza Neilhamza changed the title WIP: OCPEDGE-2326: add storageclass configuration options to lvms OCPEDGE-2326: add storageclass configuration options to lvms Jan 22, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2026

qJkee commented Jan 22, 2026

/cc

@Neilhamza Neilhamza changed the title OCPEDGE-2326: add storageclass configuration options to lvms WIP: OCPEDGE-2326: add storageclass configuration options to lvms Jan 22, 2026
@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 22, 2026
@@ -117,6 +118,57 @@ func DeleteResource(ctx context.Context, obj client.Object) {
}, timeout, interval).WithContext(ctx).Should(Satisfy(k8serrors.IsNotFound))
}

Author

waitForStorageCleanup prevents a race where the LVM metadata flush is async: vgremove/lvremove
return before the disk flush completes, causing the next test to find stale metadata (VG exists,
thin-pool missing) and mark the cluster Degraded. Observed in a HyperShift CI job.

@Neilhamza
Author

/retest

@Neilhamza
Author

/retest list

@Neilhamza
Author

/test e2e-aws-hypershift

1 similar comment
@Neilhamza
Author

/test e2e-aws-hypershift

@Neilhamza Neilhamza changed the title WIP: OCPEDGE-2326: add storageclass configuration options to lvms OCPEDGE-2326: add storageclass configuration options to lvms Jan 25, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 25, 2026
Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza Neilhamza force-pushed the storageclassoptions branch from d1dd8d6 to b8ef8f6 Compare January 26, 2026 11:51
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 26, 2026
Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza Neilhamza force-pushed the storageclassoptions branch from b8ef8f6 to 7c139c0 Compare January 26, 2026 12:11
@Neilhamza Neilhamza changed the title [WIP] OCPEDGE-2326: add storageclass configuration options to lvms OCPEDGE-2326: add storageclass configuration options to lvms Jan 26, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 26, 2026
}

scName := constants.StorageClassPrefix + newDC.Name
sc := &storagev1.StorageClass{}
Contributor

Where do you use it? I can't find anywhere

Author

The sc variable is required for the Get() call right after, which is used to check whether the StorageClass already exists.
This existence check gates the immutability validation:
– If the StorageClass does not exist yet (NotFound), immutability checks are skipped
– If the StorageClass exists, updates to storageClassOptions are rejected

Contributor

But it can be created during this webhook call. Basically, we need to compare the old and new CR configuration rather than the existing objects; the cluster will be reconciled accordingly later.

Author

Completely rewrote the immutability check in api/v1alpha1/lvmcluster_webhook.go:

removed:

  • The StorageClass GET call (the r.Get(ctx, types.NamespacedName{Name: scName}, sc)
  • Any dependency on checking the actual StorageClass object in the cluster

added:

  1. immutableSCFields struct (lines 623-629) - holds only the immutable fields we care about
  2. effectiveImmutable() helper (lines 631-661) - extracts immutable fields from a DeviceClass and applies defaults (so nil -> WaitForFirstConsumer, nil -> Delete, etc.)
  3. verifyImmutableStorageClassFields() rewrite (lines 663-693) - now compares old CR spec vs new CR spec only, not cluster objects:
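A minimal sketch of that spec-vs-spec comparison, under the defaults the thread describes (nil -> Delete reclaim policy, nil -> WaitForFirstConsumer binding mode); the type and function names mirror the comment above but the code here is illustrative, not the actual webhook:

```go
package main

import "fmt"

// StorageClassOptions mirrors the optional CR fields (pointers so nil means unset).
type StorageClassOptions struct {
	ReclaimPolicy     *string
	VolumeBindingMode *string
}

// immutableSCFields holds only the immutable fields we care about.
type immutableSCFields struct {
	reclaimPolicy     string
	volumeBindingMode string
}

// effectiveImmutable resolves nil fields to defaults, so removing the whole
// options block compares equal to explicitly setting the defaults.
func effectiveImmutable(opts *StorageClassOptions) immutableSCFields {
	out := immutableSCFields{reclaimPolicy: "Delete", volumeBindingMode: "WaitForFirstConsumer"}
	if opts == nil {
		return out
	}
	if opts.ReclaimPolicy != nil {
		out.reclaimPolicy = *opts.ReclaimPolicy
	}
	if opts.VolumeBindingMode != nil {
		out.volumeBindingMode = *opts.VolumeBindingMode
	}
	return out
}

// immutableFieldsChanged compares old vs new CR spec only, never the live
// StorageClass object in the cluster.
func immutableFieldsChanged(oldOpts, newOpts *StorageClassOptions) bool {
	return effectiveImmutable(oldOpts) != effectiveImmutable(newOpts)
}

func main() {
	retain := "Retain"
	fmt.Println(immutableFieldsChanged(nil, &StorageClassOptions{}))                       // false: defaults vs defaults
	fmt.Println(immutableFieldsChanged(nil, &StorageClassOptions{ReclaimPolicy: &retain})) // true
}
```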

return sc
}

func (s topolvmStorageClass) applyAdditionalLabels(
Contributor

Why do we need this? Why do we need to store additional labels in an annotation?

Author

The annotation tracks which labels were added by LVMS via additionalLabels so they
can be safely pruned when removed from the spec, without touching the user's own labels.

Example scenario:

  1. User adds label "env=prod" via additionalLabels, so the annotation stores "env"
  2. User removes "env=prod" from the spec, so we check the annotation and see that "env" was LVMS-managed
  3. We safely delete the "env" label from the StorageClass
  4. Any labels NOT in the annotation (the user's own labels) are left untouched

This prevents orphaned labels when users modify their additionalLabels configuration.
See applyAdditionalLabels() lines 258-263 for pruning logic.
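The pruning described in that scenario can be sketched as follows (the annotation key and the flat map signature are hypothetical stand-ins; the real code operates on the StorageClass object's metadata):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical annotation key; the real key name lives in the LVMS code.
const managedLabelsAnnotation = "lvms.openshift.io/managed-labels"

// applyAdditionalLabels applies the spec's additionalLabels and prunes keys
// that LVMS applied on a previous reconcile but that are gone from the spec,
// leaving user-owned labels untouched (the partial-ownership model).
func applyAdditionalLabels(labels, annotations, desired map[string]string) {
	for _, k := range strings.Split(annotations[managedLabelsAnnotation], ",") {
		if k == "" {
			continue
		}
		if _, still := desired[k]; !still {
			delete(labels, k) // previously LVMS-managed, now removed from spec
		}
	}
	keys := make([]string, 0, len(desired))
	for k, v := range desired {
		labels[k] = v
		keys = append(keys, k)
	}
	sort.Strings(keys)
	annotations[managedLabelsAnnotation] = strings.Join(keys, ",")
}

func main() {
	labels := map[string]string{"env": "prod", "team": "storage"} // "team" is user-owned
	annotations := map[string]string{managedLabelsAnnotation: "env"}
	applyAdditionalLabels(labels, annotations, map[string]string{"tier": "fast"})
	fmt.Println(labels["team"], labels["tier"], annotations[managedLabelsAnnotation])
}
```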

Contributor

If the annotation is LVMS-managed, the user is unable to modify it. It's our object, we manage it, and we allow the user to modify only certain fields.

Author

Right now (in the current implementation) LVMS uses a partial ownership model for StorageClass labels: we only manage the keys coming from additionalLabels, and the annotation tracks which keys LVMS previously applied so we can safely prune them when they are removed from the spec, without touching unrelated labels added by admins or external tooling.

The alternative (your suggestion) would be strict ownership, where LVMS overwrites the entire labels map from the CR on every reconcile (no annotation tracking needed), but that would drop any non-LVMS labels set directly on the StorageClass.
The first approach keeps day-2 reconciliation safe while avoiding unintended deletion of user- or policy-applied labels.

Can I have your suggestion on how to proceed?

Signed-off-by: nhamza <nhamza@redhat.com>
Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza
Author

/retest

Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza
Author

/retest Red Hat Konflux / lvm-operator-integration-tests-4-21 / lvm-operator-4-21

@Neilhamza
Author

/test Red Hat Konflux / lvm-operator-integration-tests-4-21 / lvm-operator-4-21

Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza
Author

/retest

1 similar comment
@Neilhamza
Author

/retest

Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza Neilhamza requested a review from qJkee February 3, 2026 08:20
}

for _, deviceClass := range l.Spec.Storage.DeviceClasses {
if deviceClass.StorageClassOptions == nil {
Contributor

What if they existed and are now deleted? Will the values be defaulted, so that the operator might try to change immutable parameters and fail?

Author

Added effectiveImmutable() helper that applies defaults to both old and new specs before comparison, catching nil -> defaults immutability violations.

}

scName := constants.StorageClassPrefix + newDC.Name
sc := &storagev1.StorageClass{}
Contributor

But it can be created during this webhook call. Basically, we need to compare the old and new CR configuration rather than the existing objects; the cluster will be reconciled accordingly later.

continue
}
}

Contributor

You need to check whether the deletion timestamp is already set; otherwise Delete will return an error.

Author

Added if !pv.DeletionTimestamp.IsZero() { continue } check before attempting delete in cleanup function.

}

// Delete LogicalVolumes
lvList := &topolvmv1.LogicalVolumeList{}
Contributor

TopoLVM LogicalVolumes are managed by the TopoLVM controllers and will be deleted automatically when the LVMCluster is deleted.

Author (@Neilhamza, Feb 4, 2026)

This is the trickiest part: this function is only called when we are using the Retain policy. If we don't manually delete LogicalVolumes, the user cannot delete the LVMCluster, because deletion will simply hang since there will be leftover LVs (which is the point of Retain).

I can either leave the code as is, or block the user from deleting the LVMCluster and send a message that they must manually delete the retained volumes (and finalizers).

Author

I updated the code by adding multiple gates that check whether we are touching anything Retain-related.
LV deletion is kept scoped only to Retain deviceClasses, with a bounded wait. Without this, LV finalizers block deletion after the TopoLVM controller is removed.


// Only set immutable spec fields on creation (ResourceVersion is empty for new objects)
if sc.ResourceVersion == "" {
sc.Provisioner = desired.Provisioner
Contributor

BTW, do we need this? We already query the actual StorageClass in s.getTopolvmStorageClasses. Can't we check the immutable fields there, using the actual StorageClass, so we won't override them? This would also reduce this function's size and avoid duplicating the construction of the StorageClass.

Author

We still need this guard because getTopolvmStorageClasses() builds the desired spec, but CreateOrUpdate re-runs the mutate function on every reconcile. Without the ResourceVersion == "" check we could accidentally try to update immutable SC fields; this ensures immutable fields are only set on create, so we don't try to change them on later reconciles.

return sc
}

func (s topolvmStorageClass) applyAdditionalLabels(
Contributor

If the annotation is LVMS-managed, the user is unable to modify it. It's our object, we manage it, and we allow the user to modify only certain fields.

Signed-off-by: nhamza <nhamza@redhat.com>
@Neilhamza Neilhamza requested a review from qJkee February 5, 2026 09:48
@Neilhamza
Author

/retest


openshift-ci bot commented Feb 5, 2026

@Neilhamza: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/snyk-deps 0e4a655 link false /test snyk-deps

Full PR test history. Your PR dashboard.

Details

