Migrate from SQLAlchemy to Ibis #62
base: ll/local_time2
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##       ll/local_time2     #62   +/- ##
=========================================
  Coverage        ?      91.85%
=========================================
  Files           ?          55
  Lines           ?        4962
  Branches        ?           0
=========================================
  Hits            ?        4558
  Misses          ?         404
  Partials        ?           0
=========================================
```

☔ View full report in Codecov by Sentry.
For rollback behavior, can't we just make a copy of the original dataframe before running the next operation, so we have something to fall back on?
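A minimal sketch of the copy-based fallback being suggested, assuming operations are plain functions over a pandas DataFrame (`apply_with_rollback` is a hypothetical helper, not part of this diff):

```python
import pandas as pd


def apply_with_rollback(df: pd.DataFrame, operation) -> pd.DataFrame:
    # Run the operation on a deep copy so the caller's frame is never
    # mutated; if the operation fails, the original df is still intact.
    working = df.copy(deep=True)
    try:
        return operation(working)
    except Exception:
        return df  # fall back to the untouched original
```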
lixiliu left a comment
I am fine with switching to Ibis for its ability to avoid the ways we currently have to handle different backends. There are fewer changes to the code structure than I expected.
I think Claude is overcomplicating the mapping logic a bit, and it could use some more iteration there.
| """Convert time columns with from_schema to to_schema configuration.""" | ||
|
|
||
|
|
||
| def _ensure_mapping_types_match_source( |
Good call, but it is redundant with the data type handling in the mapping process later.
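The helper isn't shown in full here; for context, a minimal Ibis sketch of what this kind of type conformance could look like (only the idea and the function's role come from the diff; the body is an assumption):

```python
import ibis


def ensure_mapping_types_match_source(
    mapping: ibis.Table, source: ibis.Table
) -> ibis.Table:
    # Cast mapping-table columns that are shared with the source table to
    # the source's dtypes, so later join predicates compare identical types.
    source_schema = source.schema()
    casts = {
        name: mapping[name].cast(source_schema[name])
        for name in mapping.columns
        if name in source_schema.names
        and mapping[name].type() != source_schema[name]
    }
    return mapping.mutate(**casts) if casts else mapping
```

If the mapping process later casts these columns again during the join, that is exactly the redundancy noted above.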
```python
df_mapping = _ensure_mapping_types_match_source(df_mapping, from_schema, backend)

# Debug: Print mapping DF around target time
try:
```
What is this??
```python
if left_type is None or right_type is None:
    return left_col == right_col

left_is_unknown = not hasattr(left_type, "is_timestamp") or str(left_type).startswith(
```
Feels incomplete and inconsistent.
Generally we only want to change the right_table key, because left_table is the input data and right_table is the mapping table. IMO it's safer to have the right key conform to the left key only.
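A sketch of that policy, assuming Ibis tables and a string key name (`conform_right_key` is hypothetical):

```python
import ibis


def conform_right_key(left: ibis.Table, right: ibis.Table, key: str) -> ibis.Table:
    # Cast the mapping table's (right) key to the input data's (left) key
    # type; the input data itself is never modified.
    left_type = left[key].type()
    if right[key].type() != left_type:
        right = right.mutate(**{key: right[key].cast(left_type)})
    return right
```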
```python
if resampling_operation:
    query = query.group_by(*groupby_stmt)
predicates = _build_join_predicates(left_table, right_table, keys)
joined = left_table.join(right_table, predicates)
```
These join operations are clean, but the rest seems unnecessarily complicated.
I think Claude tried to preserve the existing way of handling potential column conflicts, which is to split the final columns into left_columns, right_columns, and joined_columns, but it also wrote additional code to handle potential conflicts all over again. That's why _build_select_columns and _build_query take so many input variables and are so complicated.
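One way to avoid re-solving conflicts everywhere is to rename colliding right-table columns once, up front. A hedged sketch (`join_without_conflicts` is hypothetical, not the PR's code):

```python
import ibis


def join_without_conflicts(
    left: ibis.Table, right: ibis.Table, keys: list[str]
) -> ibis.Table:
    # Rename non-key right columns that collide with left columns before
    # joining, so downstream select logic never needs conflict handling.
    collisions = (set(left.columns) & set(right.columns)) - set(keys)
    right = right.rename({f"{name}_right": name for name in collisions})
    predicates = [left[k] == right[k] for k in keys]
    return left.join(right, predicates)
```

Note that recent Ibis versions already disambiguate clashing join outputs via the `lname`/`rname` parameters of `Table.join` (`rname` defaults to `"{name}_right"`), which may make even this helper unnecessary.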
This is a prototype, mostly generated by Claude. The goal is to see whether we can simplify interaction with dsgrid when running Spark jobs. The current SQLAlchemy-based code requires a separate Spark session: dsgrid creates a session with pyspark, and chronify relies on an Apache Thrift Server (Hive).
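As a sketch of the simplification: Ibis's PySpark backend can attach directly to an existing SparkSession, so no Thrift server sits between the two tools (the table name below is hypothetical):

```python
import ibis
from pyspark.sql import SparkSession

# dsgrid already creates a session with pyspark...
spark = SparkSession.builder.appName("dsgrid").getOrCreate()

# ...and chronify could attach Ibis to that same session directly,
# instead of connecting through an Apache Thrift Server (Hive).
con = ibis.pyspark.connect(session=spark)
tbl = con.table("load_data")  # hypothetical table name
```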
We would get the following benefits by migrating:
- No separate Spark session or Thrift server: chronify can operate directly on the pyspark session that dsgrid creates.
We would lose this functionality in SQLAlchemy:
Outstanding work: