Evaluate replacing Protocol Buffers with FlatBuffers for plan serialization

## Summary

Investigate the feasibility and potential performance benefits of replacing Protocol Buffers (protobuf) with FlatBuffers for serializing query plans in Scala and deserializing them in Rust.

## Motivation

FlatBuffers offers zero-copy deserialization which could reduce overhead in the JVM-to-native communication path. Plan deserialization occurs per partition in `CometExecIterator`, so any reduction in deserialization cost could improve overall query performance.

| Aspect | Protobuf (Current) | FlatBuffers |
|--------|-------------------|-------------|
| Deserialization | O(n) - parses all fields | O(1) - zero-copy access |
| Memory allocation | Creates new objects | Reads directly from buffer |
| Random field access | Must parse all preceding | Direct offset lookup |

## Current State

- **Proto schemas**: `native/proto/src/proto/*.proto` (~1,038 lines across 7 files)
- **Message types**: 94 message types with 8 oneof unions
- **Scala serialization**: 33 handlers in `spark/src/main/scala/org/apache/comet/serde/`
- **Rust deserialization**: `native/core/src/execution/serde.rs`, `native/core/src/execution/planner.rs`

## Migration Scope

| Component | Files Affected | Effort |
|-----------|---------------|--------|
| Schema rewrite | 7 .proto → 7 .fbs | Medium |
| Scala serializers | 33+ files | High |
| Rust deserializers | ~5 files | Medium |
| Build system | 3-4 files | Low |
| Tests | Many | High |

## Challenges

1. **Schema translation**: 94 message types need FlatBuffer equivalents; oneof unions have different semantics in FlatBuffers
2. **Scala code changes**: FlatBuffers requires bottom-up construction (nested objects must be built first), different from protobuf's builder pattern
3. **Rust code changes**: Replace `prost` with `flatbuffers` crate; update `planner.rs` (~2,700 lines) to work with FlatBuffer accessors
4. **Recursive structures**: `Operator` contains `children: repeated Operator` which requires careful handling

## Open Questions

1. What is the actual serialization/deserialization overhead with the current protobuf implementation? (Need benchmarking)
2. For typical query plans (KB-sized), would zero-copy provide meaningful benefits given the plan is immediately converted to DataFusion structures?
3. Would the increased code complexity and maintenance burden be justified by the performance gains?

## Suggested Approach

1. **Benchmark current state**: Profile protobuf serialization/deserialization overhead to quantify the problem
2. **Proof of concept**: Convert a single operator type (e.g., `Filter`) to FlatBuffers and benchmark
3. **Evaluate alternatives**: Consider optimizations within protobuf first (arena allocation in Rust, builder reuse in Scala)
4. **Full migration** (if justified): Incrementally migrate all operators with comprehensive testing

## Related Links

- [FlatBuffers documentation](https://google.github.io/flatbuffers/)
- [flatbuffers Rust crate](https://crates.io/crates/flatbuffers)
- [flatbuffers-java](https://github.com/google/flatbuffers/tree/master/java)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate replacing Protocol Buffers with FlatBuffers for plan serialization #3245

Summary

Motivation

Current State

Migration Scope

Challenges

Open Questions

Suggested Approach

Related Links

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Aspect	Protobuf (Current)	FlatBuffers
Deserialization	O(n) - parses all fields	O(1) - zero-copy access
Memory allocation	Creates new objects	Reads directly from buffer
Random field access	Must parse all preceding	Direct offset lookup

Component	Files Affected	Effort
Schema rewrite	7 .proto → 7 .fbs	Medium
Scala serializers	33+ files	High
Rust deserializers	~5 files	Medium
Build system	3-4 files	Low
Tests	Many	High

Evaluate replacing Protocol Buffers with FlatBuffers for plan serialization #3245

Description

Summary

Motivation

Current State

Migration Scope

Challenges

Open Questions

Suggested Approach

Related Links

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions