Skip to content

Conversation

@LNSD
Copy link
Contributor

@LNSD LNSD commented Jan 14, 2026

Remove Arc dependency from catalog types, storing only the
essential fields (HashReference, start_block, table name, network).

  • Replace ResolvedTable.dataset: Arc<Dataset> with individual fields
  • Remove Dataset::resolved_tables() method
  • Update PhysicalTable constructors to take Table instead of ResolvedTable
  • Store dataset_reference: HashReference directly in PhysicalTable
  • Update callers in dump, admin-api, and tests

@LNSD LNSD requested a review from Theodus January 14, 2026 14:31
@LNSD LNSD self-assigned this Jan 14, 2026
@LNSD LNSD requested a review from JohnSwan1503 January 14, 2026 14:32
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR successfully decouples PhysicalTable and ResolvedTable from Arc<Dataset>, replacing the direct dataset reference with individual fields (HashReference, start_block, table name, network). This is a well-structured refactoring that reduces coupling and makes the data flow more explicit.

Highlights

Positive changes:

  • Clean separation of concerns - PhysicalTable and ResolvedTable now store only the data they need
  • Removal of Dataset::resolved_tables() method simplifies the API
  • Good use of BTreeSet for deduplicating dataset references in streaming_query.rs

Key items to address

  1. Breaking API change (restore.rs): The url field renamed to path in RestoredTableInfo is a breaking change for HTTP clients. Please confirm this is intentional.

  2. Performance consideration (physical.rs): catalog_schema() and table_ref() now return owned types instead of references, causing allocations on each call. Consider caching if these are hot paths.

  3. Data duplication (physical.rs): PhysicalTable now stores both table: Table and extracted fields (table_name, network). While acceptable given Table appears immutable, this could be documented.

  4. Logging change (parquet_writer.rs): The trace span changed from compact to full table reference - verify this is intentional.

Minor suggestions

  • Consistency in using dataset_reference vs dataset.reference() in some files
  • Removal of network() and schema() accessors from ResolvedTable changes the caller interface

Overall, this is a solid refactoring. The inline comments have specific suggestions for the items above.

@LNSD LNSD force-pushed the lnsd/refactor-common-use-dataset-tables branch 3 times, most recently from 8944d17 to 317f1ac Compare January 15, 2026 19:48
Reduce memory overhead and break tight coupling by storing only essential dataset fields in catalog types instead of full `Arc<Dataset>` references.

- Eliminate `Arc<Dataset>` dependency in `ResolvedTable` and `PhysicalTable` to reduce memory footprint
- Store only required dataset metadata (reference, start block, network) in catalog types
- Remove convenience method for creating resolved tables, requiring explicit construction at call sites
- Simplify `PhysicalTable` by directly storing table definition instead of wrapping through `ResolvedTable`
- Move table restoration logic from method to free function for better composability

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
@LNSD LNSD force-pushed the lnsd/refactor-common-use-dataset-tables branch from 1436b0c to 9687e9e Compare January 15, 2026 23:15
@LNSD LNSD merged commit b7dffbb into main Jan 15, 2026
8 checks passed
@LNSD LNSD deleted the lnsd/refactor-common-use-dataset-tables branch January 15, 2026 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants