-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: populate manifest created/replaced/kept count when commit a snapshot #15003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9ebf11b to
62e148e
Compare
|
@nastra @amogh-jahagirdar if you can help take a look |
|
also @huaxingao and @stevenzwu, if you are interested to take a look. |
c06a28e to
f694c31
Compare
f694c31 to
131cd47
Compare
| * Returns the count of manifests that were replaced (rewritten) during filtering. | ||
| * | ||
| * <p>A manifest is considered replaced when a new manifest was created to replace the original | ||
| * one (i.e., the original manifest != filtered manifest). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note because of the normal append path in merging snapshot producer this can also be original manifest != appended manifest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah nvm, didn't see we were in manifest filter manager
|
I'm generally +1 on this idea but don't have time to do a full review right now, I do think we should consider using "existing" instead of "kept"? Or maybe skip it all together since I think we aren't tracking manifests which are not scanned in the first place right? |
Thanks @RussellSpitzer , discussed offline as we reuse the SnapshotSummary already defined in https://iceberg.apache.org/spec/#optional-snapshot-summary-fields. Previously we only populate such for rewrite-manifest operation, this change I want to introduce them for all commits result in a new snapshot, like append, row-delta and delete etc |
|
You are totally right! Better to keep with the original definitions then |
Currently iceberg commit does not expose the manifest level information, so it's difficult to see how metadata will evolve with given snapshot.
Now snapshot summary will also include following information about the table commit
manifests-created: new manifest was created by the same snapshotmanifests-replaced: manifest was filtered (in-place rewrite)/merged by the merging snapshot producermanifest-kept: existing manifest was carried over where it's created by the older snapshot prior to the commit