53 commits
34ea689
chore: add script to regenerate golden files for plan stability tests
andygrove Jan 16, 2026
6d43d52
docs: update contributor guide to reference golden files script
andygrove Jan 16, 2026
ee2ffec
fix: ensure native code is built before installing
andygrove Jan 16, 2026
c2c8ba0
feat: add experimental native columnar to row conversion
andygrove Jan 19, 2026
49a5b20
cargo fmt
andygrove Jan 19, 2026
e558073
cargo clippy
andygrove Jan 19, 2026
a44066f
docs
andygrove Jan 19, 2026
fd58cba
update benchmark [skip ci]
andygrove Jan 19, 2026
bac9164
fix: use correct element sizes in native columnar to row for array/map
andygrove Jan 19, 2026
3ca5553
test: add fuzz test with nested types to native C2R suite
andygrove Jan 19, 2026
7f2e64d
test: add deeply nested type tests to native C2R suite
andygrove Jan 19, 2026
7afc4ba
test: add fuzz test with generateNestedSchema for native C2R
andygrove Jan 20, 2026
adc13a6
format
andygrove Jan 20, 2026
56df742
fix: handle LargeList and improve error handling in native C2R
andygrove Jan 20, 2026
461c625
fix
andygrove Jan 20, 2026
8b8741c
fix: add Dictionary-encoded array support to native C2R
andygrove Jan 20, 2026
b8ed2e7
format
andygrove Jan 20, 2026
330dbb2
clippy [skip ci]
andygrove Jan 20, 2026
8231a75
test: add benchmark comparing JVM and native columnar to row conversion
andygrove Jan 20, 2026
f2cc61c
perf: optimize native C2R by eliminating Vec allocations for strings
andygrove Jan 20, 2026
3ebcaca
perf: add fixed-width fast path for native C2R
andygrove Jan 20, 2026
ed72c29
test: add fixed-width-only benchmark and refactor C2R benchmark
andygrove Jan 20, 2026
17d83d5
perf: optimize complex types in native C2R by eliminating intermediat…
andygrove Jan 20, 2026
5f26a81
perf: add bulk copy optimization for primitive arrays in native C2R
andygrove Jan 20, 2026
e5b2c61
perf: add pre-downcast optimization for native C2R general path
andygrove Jan 20, 2026
7743138
fix: correct array element bulk copy for Date32, Timestamp, Boolean
andygrove Jan 20, 2026
9c66ef6
perf: Velox-style optimization for array/map C2R (40-52% faster)
andygrove Jan 20, 2026
64c5212
perf: inline type dispatch for struct fields in native C2R
andygrove Jan 20, 2026
04c49fb
perf: pre-downcast struct fields for native C2R
andygrove Jan 20, 2026
47d4c50
perf: optimize general path for mixed fixed/variable-length columns
andygrove Jan 20, 2026
081b3ed
revert
andygrove Jan 20, 2026
f696595
upmerge
andygrove Jan 20, 2026
92e1abb
revert doc format change
andygrove Jan 20, 2026
e735434
fix: address clippy warnings and remove dead code in native C2R
andygrove Jan 20, 2026
ab074bd
Remove #[inline] hint from bulk_copy_range
andygrove Jan 20, 2026
a4d5eeb
enable native c2r by default
andygrove Jan 20, 2026
01b5dd0
fix
andygrove Jan 20, 2026
691fb4c
fix
andygrove Jan 21, 2026
5687f79
fix
andygrove Jan 21, 2026
1dc720b
Fix dictionary type mismatch in columnar_to_row conversion
andygrove Jan 21, 2026
5c7da07
Merge remote-tracking branch 'origin/dev/regenerate-golden-files-scri…
andygrove Jan 21, 2026
2581ea5
Add doExecuteBroadcast support to CometNativeColumnarToRowExec
andygrove Jan 21, 2026
90cc9ba
update golden files
andygrove Jan 21, 2026
d46b56a
Add NullVector/NullArray support for native columnar-to-row conversion
andygrove Jan 22, 2026
537d62e
Fix clippy warnings for Rust 1.93
andygrove Jan 22, 2026
90d06d5
Merge branch 'fix-clippy-rust-1.93' into native-c2r-enabled
andygrove Jan 22, 2026
46733bf
Fix dictionary-encoded decimal handling in native columnar-to-row con…
andygrove Jan 22, 2026
26cfa2b
Handle NullArray in native columnar-to-row conversion
andygrove Jan 22, 2026
67383ad
Add FixedSizeBinary support for native columnar-to-row conversion
andygrove Jan 23, 2026
9e60ca5
Fix dictionary-encoded decimal cast to use schema type
andygrove Jan 23, 2026
61b03b7
Disable native C2R when query contains native_comet scan
andygrove Jan 24, 2026
4c29171
format
andygrove Jan 24, 2026
4e53cc8
upmerge
andygrove Jan 24, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 1 addition & 0 deletions .github/workflows/pr_build_linux.yml
@@ -202,6 +202,7 @@ jobs:
value: |
org.apache.comet.exec.CometShuffleSuite
org.apache.comet.exec.CometShuffle4_0Suite
org.apache.comet.exec.CometNativeColumnarToRowSuite
org.apache.comet.exec.CometNativeShuffleSuite
org.apache.comet.exec.CometShuffleEncryptionSuite
org.apache.comet.exec.CometShuffleManagerSuite
1 change: 1 addition & 0 deletions .github/workflows/pr_build_macos.yml
@@ -145,6 +145,7 @@ jobs:
value: |
org.apache.comet.exec.CometShuffleSuite
org.apache.comet.exec.CometShuffle4_0Suite
org.apache.comet.exec.CometNativeColumnarToRowSuite
org.apache.comet.exec.CometNativeShuffleSuite
org.apache.comet.exec.CometShuffleEncryptionSuite
org.apache.comet.exec.CometShuffleManagerSuite
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
CLAUDE.md
target
.idea
*.iml
11 changes: 11 additions & 0 deletions common/src/main/scala/org/apache/comet/CometConf.scala
@@ -296,6 +296,17 @@ object CometConf extends ShimCometConf {
val COMET_EXEC_LOCAL_TABLE_SCAN_ENABLED: ConfigEntry[Boolean] =
createExecEnabledConfig("localTableScan", defaultValue = false)

val COMET_NATIVE_COLUMNAR_TO_ROW_ENABLED: ConfigEntry[Boolean] =
conf(s"$COMET_EXEC_CONFIG_PREFIX.columnarToRow.native.enabled")
.category(CATEGORY_EXEC)
.doc(
"Whether to enable native columnar to row conversion. When enabled, Comet will use " +
"native Rust code to convert Arrow columnar data to Spark UnsafeRow format instead " +
"of the JVM implementation. This can improve performance for queries that need to " +
"convert between columnar and row formats.")
.booleanConf
.createWithDefault(true)

val COMET_EXEC_SORT_MERGE_JOIN_WITH_JOIN_FILTER_ENABLED: ConfigEntry[Boolean] =
conf("spark.comet.exec.sortMergeJoinWithJoinFilter.enabled")
.category(CATEGORY_ENABLE_EXEC)
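The new flag above defaults to true, so the native conversion path is on unless overridden. A hedged sketch of disabling it at submit time (the application class and jar names are placeholders, not from this PR):

```shell
# Hypothetical invocation: com.example.MyApp and my-app.jar are placeholders.
# Turns off Comet's native columnar-to-row conversion for this run only.
spark-submit \
  --conf spark.comet.exec.columnarToRow.native.enabled=false \
  --class com.example.MyApp \
  my-app.jar
```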
20 changes: 20 additions & 0 deletions common/src/main/scala/org/apache/comet/vector/NativeUtil.scala
@@ -78,6 +78,26 @@ class NativeUtil {
(arrays, schemas)
}

/**
* Exports a ColumnarBatch to Arrow FFI and returns the memory addresses.
*
* This is a convenience method that allocates Arrow structs, exports the batch, and returns
* just the memory addresses (without exposing the Arrow types).
*
* @param batch
* the columnar batch to export
* @return
* a tuple of (array addresses, schema addresses, number of rows)
*/
def exportBatchToAddresses(batch: ColumnarBatch): (Array[Long], Array[Long], Int) = {
val numCols = batch.numCols()
val (arrays, schemas) = allocateArrowStructs(numCols)
val arrayAddrs = arrays.map(_.memoryAddress())
val schemaAddrs = schemas.map(_.memoryAddress())
val numRows = exportBatch(arrayAddrs, schemaAddrs, batch)
(arrayAddrs, schemaAddrs, numRows)
}

/**
* Exports a Comet `ColumnarBatch` into a list of memory addresses that can be consumed by the
* native execution.
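From the caller's side, the new helper above can be sketched as follows. This is illustrative only (it requires a Comet build and a live Spark context to run), and `callNative` is a hypothetical JNI entry point standing in for whatever the native side exposes:

```scala
import org.apache.comet.vector.NativeUtil
import org.apache.spark.sql.vectorized.ColumnarBatch

// Sketch: export every column of the batch through Arrow C FFI and hand the
// raw ArrowArray/ArrowSchema struct addresses (one pair per column) to native code.
def convertBatch(nativeUtil: NativeUtil, batch: ColumnarBatch): Unit = {
  val (arrayAddrs, schemaAddrs, numRows) = nativeUtil.exportBatchToAddresses(batch)
  // callNative(arrayAddrs, schemaAddrs, numRows) // hypothetical native call
}
```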
@@ -26,7 +26,7 @@ import java.nio.channels.Channels
import scala.jdk.CollectionConverters._

import org.apache.arrow.c.CDataDictionaryProvider
- import org.apache.arrow.vector.{BigIntVector, BitVector, DateDayVector, DecimalVector, FieldVector, FixedSizeBinaryVector, Float4Vector, Float8Vector, IntVector, SmallIntVector, TimeStampMicroTZVector, TimeStampMicroVector, TinyIntVector, ValueVector, VarBinaryVector, VarCharVector, VectorSchemaRoot}
+ import org.apache.arrow.vector.{BigIntVector, BitVector, DateDayVector, DecimalVector, FieldVector, FixedSizeBinaryVector, Float4Vector, Float8Vector, IntVector, NullVector, SmallIntVector, TimeStampMicroTZVector, TimeStampMicroVector, TinyIntVector, ValueVector, VarBinaryVector, VarCharVector, VectorSchemaRoot}
import org.apache.arrow.vector.complex.{ListVector, MapVector, StructVector}
import org.apache.arrow.vector.dictionary.DictionaryProvider
import org.apache.arrow.vector.ipc.ArrowStreamWriter
@@ -282,7 +282,7 @@ object Utils extends CometTypeShim {
_: BigIntVector | _: Float4Vector | _: Float8Vector | _: VarCharVector |
_: DecimalVector | _: DateDayVector | _: TimeStampMicroTZVector | _: VarBinaryVector |
_: FixedSizeBinaryVector | _: TimeStampMicroVector | _: StructVector | _: ListVector |
- _: MapVector) =>
+ _: MapVector | _: NullVector) =>
v.asInstanceOf[FieldVector]
case _ =>
throw new SparkException(s"Unsupported Arrow Vector for $reason: ${valueVector.getClass}")