Namanmahor/plat 265 support file size info reading #8663
NamanMahor wants to merge 15 commits into main from …upport-file-size-info-reading
Conversation
```go
entries, err := pagination.CollectAll(ctx,
	func(ctx context.Context, pz uint32, tk string) ([]drivers.ObjectStoreEntry, string, error) {
		return b.ListObjectsForGlob(ctx, opts.Glob, pz, tk)
	},
	1000)
```
Does this add new network overhead?
I'm wondering if it would be better to still also support full listings in ListObjectsForGlob (e.g. pageSize == 0 could mean return all).
No, there is no new network overhead: List also uses ListPage internally with a page size of 1000 for S3, GCS, and Azure.
All of our other listing APIs fall back to a default page size when the page size is zero; changing that behavior for this one API feels inconsistent.
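For reference, a CollectAll-style helper is roughly the following shape. This is a sketch (the helper name, generics, and signature are assumptions, not the repo's actual code), but it shows why collecting all pages at size 1000 issues the same requests as an internal full listing that pages at 1000 under the hood:

```go
// Sketch of a CollectAll-style pagination helper (assumed shape, not the
// repo's implementation). It calls fetch repeatedly with the given page size
// until the backend stops returning a continuation token. Requires "context".
func collectAll[T any](
	ctx context.Context,
	fetch func(ctx context.Context, pageSize uint32, pageToken string) ([]T, string, error),
	pageSize uint32,
) ([]T, error) {
	var all []T
	pageToken := ""
	for {
		items, nextToken, err := fetch(ctx, pageSize, pageToken)
		if err != nil {
			return nil, err
		}
		all = append(all, items...)
		if nextToken == "" {
			return all, nil
		}
		pageToken = nextToken
	}
}
```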
> All of our other listing APIs fall back to a default page size when the page size is zero; changing that behavior for this one API feels inconsistent.
But this isn't an API, this is an internal function, right? Maybe I'm missing something.
Usually we try to have default values only at the "outer" level, i.e. in the API handlers, but require explicit config for inner functions.
@begelundmuller is it okay to handle that in a separate PR? We would also need to change the other ListObject and ListBuckets functions along with this ListBucketsForGlobs.
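For illustration, the convention described above would look roughly like this; the function names and signatures below are hypothetical, not code from this PR:

```go
// Hypothetical outer API handler: the default page size is applied here,
// at the API boundary. Requires "context" and "errors".
func handleListObjects(ctx context.Context, glob string, pageSize uint32, pageToken string) ([]string, string, error) {
	if pageSize == 0 {
		pageSize = 1000 // default lives only in the handler
	}
	return listObjectsForGlob(ctx, glob, pageSize, pageToken)
}

// Hypothetical inner function: it requires an explicit, non-zero page size
// instead of silently choosing a default.
func listObjectsForGlob(ctx context.Context, glob string, pageSize uint32, pageToken string) ([]string, string, error) {
	if pageSize == 0 {
		return nil, "", errors.New("pageSize must be provided explicitly")
	}
	// ... paged listing and glob filtering would go here ...
	return nil, "", nil
}
```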
```go
// Handle GCS
var q *storage.Query
if as(&q) {
	// Only fetch the fields we need.
	_ = q.SetAttrSelection([]string{"Name", "Size", "Created", "Updated"})
	if startAfter != "" {
		q.StartOffset = startAfter
	}
}
// Handle S3
var s3Input *s3.ListObjectsV2Input
if as(&s3Input) {
	if startAfter != "" {
		s3Input.StartAfter = aws.String(startAfter)
	}
}
return nil
```
Azure does not support startAfter or an offset, so we have to iterate and skip those objects ourselves.
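For reference, the iterate-to-skip fallback amounts to something like this (illustrative helper, not the PR's code):

```go
// skipUntilAfter drops keys that are lexicographically at or before
// startAfter. Without native startAfter support, every fetched page has to
// be trimmed like this until the cursor has been passed, which is where the
// extra iteration comes from.
func skipUntilAfter(keys []string, startAfter string) []string {
	if startAfter == "" {
		return keys
	}
	for i, k := range keys {
		if k > startAfter {
			return keys[i:]
		}
	}
	return nil // the entire page is at or before the cursor
}
```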
- So does that mean it needs to re-scan the full listing on each page fetch? Assume there are 1M objects, the glob matches all of them, and the page size is 1000 – then how many Azure calls would it end up making?
- I'm not sure if this applies here, but this page seems to indicate there's a startAfter option: https://learn.microsoft.com/en-us/rest/api/storageservices/list-blobs?tabs=microsoft-entra-id
- If that doesn't work, is there some other offset/page token from Azure we can use?
- No. We use the driver's page token, so we do not rescan all pages; only the last (partially processed) page needs startAfter (and startAfter has now been added for Azure too). That works out to 1M / 1000 = 1000 calls.
- Upgraded Azure Blob Storage to the latest version; now we can use startAfter.
```go
} else if lastProcessedIdx != -1 {
	startAfter = retval[lastProcessedIdx].Key
}
break
```
When len(entries) == validPageSize, how can lastProcessedIdx not always be len(retval)-1?
Yes — suppose the first call to b.bucket.ListPage returns some objects that are filtered out by the glob pattern. We may need additional calls to collect enough matching entries to reach the requested page size, so we don’t necessarily consume the entire retval page.
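In other words, matches accumulate across driver pages, and the requested page size can be reached partway through the last one. A simplified sketch (hypothetical names; stdlib path.Match stands in for the real glob matcher, requires "path"):

```go
// collectMatches consumes keys from one driver page, keeping those that
// match the pattern, until pageSize matches have been collected in total.
// It returns the updated matches and the index of the last key consumed
// from page (-1 if none), which can be well before len(page)-1.
func collectMatches(matches, page []string, pattern string, pageSize int) ([]string, int) {
	lastProcessedIdx := -1
	for i, key := range page {
		lastProcessedIdx = i
		if ok, _ := path.Match(pattern, key); !ok {
			continue
		}
		matches = append(matches, key)
		if len(matches) == pageSize {
			break // the rest of this page is left for the next request
		}
	}
	return matches, lastProcessedIdx
}
```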
```go
driverPageToken = nextDriverPageToken
startAfter = ""
```
What if all paths on the page were before startAfter? Doesn't it then need to keep startAfter so it can skip the necessary objects on the next page?
startAfter is only used when the page token we return points at a half-processed driver page. Once a driver page has been fully read, we don't need startAfter; the next driver page is read from its start.
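Put differently, the returned token only needs startAfter for a half-processed driver page. A sketch of such a composite token (the encoding and field names are assumptions, not the PR's actual format; requires "encoding/base64" and "encoding/json"):

```go
// pageToken sketches the state a returned token carries: the driver's own
// continuation token, plus a resume key that is set only when the driver
// page it points at was consumed partway. When a driver page is read to the
// end, StartAfter stays empty and the next page is read from its beginning.
type pageToken struct {
	DriverPageToken string `json:"dpt"`
	StartAfter      string `json:"sa,omitempty"`
}

func encodePageToken(t pageToken) (string, error) {
	b, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(b), nil
}

func decodePageToken(s string) (pageToken, error) {
	var t pageToken
	b, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return t, err
	}
	err = json.Unmarshal(b, &t)
	return t, err
}
```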
PLAT-265: Support file size info reading
Checklist: