-
Notifications
You must be signed in to change notification settings - Fork 34
Description
Oracle Sink Plugin: Lack of MERGE/UPSERT Leads to Duplicate Issues on Retries
Plugin: Oracle Batch Sink (oracle-plugin)
Component: Sink
Describe the bug
The current Oracle Sink plugin appears to use standard JDBC batch INSERT statements. When used in a Spark environment like Cloud Data Fusion, if a Spark task fails mid-batch (e.g., due to network issues, or database errors like ORA-08177 transaction serialization failures), Spark will retry the task.
Because the sink operation isn't idempotent, this retry mechanism can lead to ORA-00001: unique constraint violated errors if the target table has unique constraints. If no such constraints exist, it can lead to silent duplicate record insertion.
This issue has been observed by customers, causing data integrity problems and pipeline failures.
To Reproduce
- Configure a pipeline with an Oracle Sink.
- Ensure the target Oracle table has a unique constraint.
- Simulate a task failure during the sink write operation (e.g., by temporarily making the database unavailable or causing a serialization error).
- Observe that on retry,
ORA-00001errors are likely to occur.
Expected behavior
Ideally, the sink plugin should handle retries more gracefully. An idempotent sink would prevent duplicate inserts. Using Oracle's MERGE statement (or similar upsert logic) would allow the sink to either insert new rows or update existing ones if they are encountered again during a retry, based on the table's primary or unique keys.
Workarounds
- Using
TRANSACTION_READ_COMMITTEDcan sometimes mitigate theORA-08177errors that trigger retries, but doesn't solve the idempotency issue itself. - Ensuring the database environment is stable to prevent the initial task failures.
Feature Request
Enhance the Oracle Sink plugin to support an optional MERGE or UPSERT mode. This would significantly improve resilience and data integrity in environments where task retries are possible.
Affected Versions (if known): Observed with oracle-plugin version 1.10.7, but likely affects other versions too.