Description
When a native Comet operator receives data from a Spark scan that produces OnHeapColumnVector instead of Arrow arrays, Comet fails with:
org.apache.spark.SparkException: Comet execution only takes Arrow Arrays, but got class org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
This can happen when:
- The native scan (e.g., native_comet) doesn't support certain data types (like complex types)
- The scan falls back to Spark's Parquet reader
- A downstream native operator (like the native Parquet writer) receives the non-Arrow data
Reproduction
// With native Parquet write enabled but without COMET_SCAN_ALLOW_INCOMPATIBLE
withSQLConf(
  "spark.comet.parquet.write.enabled" -> "true",
  "spark.comet.exec.enabled" -> "true") {
  // Create data with complex types
  val df = Seq((1, Seq(1, 2, 3))).toDF("id", "values")
  // Write to parquet (without Comet)
  df.write.parquet("/tmp/input")
  // Read and write - this fails because native_comet scan doesn't support
  // complex types, falls back to Spark reader, but downstream native writer
  // expects Arrow arrays
  spark.read.parquet("/tmp/input").write.parquet("/tmp/output")
}
Expected Behavior
Comet should either:
- Fall back the entire query to Spark when native operators would receive non-Arrow data
- Automatically insert conversion from OnHeapColumnVector to Arrow (using the existing spark.comet.convert.parquet.enabled mechanism)
- Provide a clearer error message explaining why this happened and how to fix it (e.g., "Enable spark.comet.scan.allowIncompatible to use native_iceberg_compat scan which supports complex types")
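To make the third option concrete, here is a minimal sketch of what a more actionable message could look like. The object NonArrowInputMessage and its build method are hypothetical names used only for illustration; they are not part of Comet, and the wording of the hint is an assumption based on the workarounds listed in this issue.

```scala
// Hypothetical helper, not part of Comet: builds a more actionable message
// for the "only takes Arrow Arrays" failure described in this issue.
object NonArrowInputMessage {
  def build(vectorClass: String): String =
    s"Comet execution only takes Arrow Arrays, but got class $vectorClass. " +
      "The scan may have fallen back to Spark's Parquet reader while a " +
      "downstream operator stayed native. " +
      "Possible fixes: set spark.comet.scan.allowIncompatible=true, or " +
      "set spark.comet.convert.parquet.enabled=true."
}
```

Calling NonArrowInputMessage.build with the class name from the exception above would yield the original message followed by the two configuration hints.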
Current Workarounds
- Enable spark.comet.scan.allowIncompatible=true so that native_iceberg_compat scan is used (which supports complex types)
- Enable spark.comet.convert.parquet.enabled=true to convert Spark columnar data to Arrow
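As a rough sketch, either workaround could be applied when building the session. The configuration keys are the ones named in this issue; the object name, the app name, and the assumption that Comet is already installed and the job is launched via spark-submit are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: assumes Comet is already on the classpath and enabled
// for this application, and that /tmp/input was written as in the reproduction.
object CometWorkaroundSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("comet-workaround-sketch")
      .config("spark.comet.exec.enabled", "true")
      .config("spark.comet.parquet.write.enabled", "true")
      // Workaround 1: use the scan that supports complex types.
      .config("spark.comet.scan.allowIncompatible", "true")
      // Workaround 2 (alternative): convert Spark columnar data to Arrow.
      // .config("spark.comet.convert.parquet.enabled", "true")
      .getOrCreate()

    // Same read-then-write as in the reproduction.
    spark.read.parquet("/tmp/input").write.mode("overwrite").parquet("/tmp/output")
    spark.stop()
  }
}
```

Only one of the two settings should be needed; which one is preferable depends on whether the incompatibilities accepted by allowIncompatible matter for the workload.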
Context
This was discovered while adding complex type support to the native Parquet writer (#3214). The fix there uses COMET_SCAN_ALLOW_INCOMPATIBLE, but the underlying issue of ungraceful failure should be addressed.