Handle DB duplicate key without recursion #527

Open
talyguryn wants to merge 2 commits into master from fix/E11000-duplicate-key-error

Conversation

@talyguryn
Member

Replace the previous recursive reprocessing on database duplicate-key errors with a safer flow: log the duplicate, invalidate the event cache, wait briefly, fetch the event inserted by the competing worker and treat it as a repetition. If the event cannot be read back, raise a DatabaseReadWriteError. This avoids recursive calls to handle(), ensures fresh DB reads (cache coherence), and adds logging for diagnostics; repetition processing is resumed using the fetched event.

MongoError: E11000 duplicate key error collection

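For orientation, here is a minimal sketch of the resulting catch-block flow, assembled from the hunks reviewed below. The helpers (getEventCacheKey, getEvent), the DUPLICATE_KEY_RETRY_DELAY_MS constant, the isFirstOccurrence flag, and the error message all appear in the diff; the exact surrounding control flow (such as rethrowing non-duplicate errors) is an assumption.

} catch (e) {
  if (e.code?.toString() === DB_DUPLICATE_KEY_ERROR) {
    this.logger.info(`[handle] Duplicate key detected for groupHash=${uniqueEventHash}, fetching created event as repetition`);

    /**
     * Invalidate cache so the next read hits the database rather than a stale entry
     */
    const eventCacheKey = await this.getEventCacheKey(task.projectId, uniqueEventHash);
    this.cache.del(eventCacheKey);

    /**
     * Give the competing worker's insert a moment to become visible, then read it back
     */
    await new Promise(resolve => setTimeout(resolve, DUPLICATE_KEY_RETRY_DELAY_MS));
    existedEvent = await this.getEvent(task.projectId, uniqueEventHash);

    if (!existedEvent) {
      throw new DatabaseReadWriteError('Event not found after duplicate key error');
    }

    /**
     * Continue processing as a repetition instead of recursing into handle()
     */
    isFirstOccurrence = false;
  } else {
    throw e;
  }
}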
Copilot AI review requested due to automatic review settings February 14, 2026 13:36
Contributor

Copilot AI left a comment

Pull request overview

This PR updates GrouperWorker.handle() to avoid recursive re-processing on MongoDB duplicate-key inserts by invalidating the event cache, reading back the event created by a competing worker, and proceeding with repetition handling.

Changes:

  • Replaces recursive handle() retry on duplicate-key with cache invalidation + read-back flow.
  • Adds diagnostic logging around duplicate-key detection and read-back.
  • Moves repetition-processing logic into a separate post-insert conditional block.

Comment on lines +241 to +243
this.cache.del(eventCacheKey);

/**

Copilot AI Feb 14, 2026

await new Promise(resolve => setTimeout(resolve, 10)) introduces a magic number in production code. This repo commonly either defines numeric constants or explicitly disables @typescript-eslint/no-magic-numbers for such cases (e.g. lib/memoize/index.ts:43 and workers/release/src/index.ts:228-229). Please extract the delay into a named constant (or add a targeted eslint-disable) so linting and future tuning are straightforward.
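
A minimal sketch of that extraction, using the constant name the later revision of this diff already adopts:

/**
 * How long to wait before re-reading an event inserted by a competing worker, in ms
 */
const DUPLICATE_KEY_RETRY_DELAY_MS = 10;

await new Promise(resolve => setTimeout(resolve, DUPLICATE_KEY_RETRY_DELAY_MS));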

Comment on lines 228 to +245
  } catch (e) {
    /**
-    * If we caught Database duplication error, then another worker thread has already saved it to the database
-    * and we need to process this event as repetition
+    * Clear the cache and fetch the event that was just inserted, then process it as a repetition
     */
    if (e.code?.toString() === DB_DUPLICATE_KEY_ERROR) {
-     await this.handle(task);
+     this.logger.info(`[handle] Duplicate key detected for groupHash=${uniqueEventHash}, fetching created event as repetition`);

+     const eventCacheKey = await this.getEventCacheKey(task.projectId, uniqueEventHash);

-     return;
+     /**
+      * Invalidate cache to force fresh fetch from database
+      */
+     this.cache.del(eventCacheKey);
+
+     /**
+      * Fetch the event that was just inserted by the competing worker
+      * Add small delay to ensure the event is persisted

Copilot AI Feb 14, 2026

The new duplicate-key handling path (invalidate cache → delay → read-back → continue as repetition) doesn’t appear to have explicit test coverage. There are extensive handle() tests in workers/grouper/tests/index.test.ts, but none asserts that the duplicate-key branch increments totalCount, creates a repetition, and marks the notifier payload as isNew: false when two workers race on the first insert. Please add or extend a test to cover this branch deterministically (e.g., by forcing saveEvent to throw code 11000 once, or by asserting outcomes in the existing simultaneous-handles test).
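
A sketch of such a deterministic test, assuming Jest and the existing setup in workers/grouper/tests/index.test.ts; the worker instance and the generateTask() fixture are placeholders, while saveEvent is the method named above:

it('processes the event as a repetition when the insert hits E11000', async () => {
  const task = generateTask(); // hypothetical fixture producing a grouper task

  /**
   * Simulate losing the race: the competing worker's insert lands first,
   * then our own insert fails with a duplicate-key error
   */
  const originalSaveEvent = (worker as any).saveEvent.bind(worker);
  jest.spyOn(worker as any, 'saveEvent').mockImplementationOnce(async (...args: unknown[]) => {
    await originalSaveEvent(...args); // the "competing" insert
    throw Object.assign(new Error('E11000 duplicate key error collection'), { code: 11000 });
  });

  await worker.handle(task);

  // Assert the duplicate was handled as a repetition:
  // totalCount incremented, a repetition record saved, notifier payload marked isNew: false
});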

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


Comment on lines +259 to +261
* Now continue processing as if this was not the first occurrence
* This avoids recursion and properly handles the event as a repetition
*/

Copilot AI Feb 14, 2026

The existing test coverage for simultaneous events (line 292-301 in tests/index.test.ts) may not adequately verify that duplicate key scenarios are processed as repetitions. Specifically, the test should verify that:

  1. totalCount is incremented for each duplicate event
  2. Repetition records are created in the database
  3. Daily event counts are correctly updated

Without these assertions, the critical bug at line 271 (incorrect condition) would not be caught by tests.

Suggested change
- * Now continue processing as if this was not the first occurrence
- * This avoids recursion and properly handles the event as a repetition
- */
+ * Mark this occurrence as a repetition to ensure repetition-processing logic runs
+ * after handling the duplicate key error.
+ */
+isFirstOccurrence = false;
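
The three assertions could look roughly like this; the collection handles are hypothetical and would come from the test suite's database connection:

const event = await eventsCollection.findOne({ groupHash: uniqueEventHash });
expect(event.totalCount).toBe(2); // 1. totalCount incremented for the duplicate

const repetitions = await repetitionsCollection.find({ groupHash: uniqueEventHash }).toArray();
expect(repetitions).toHaveLength(1); // 2. a repetition record was created

const daily = await dailyEventsCollection.findOne({ groupHash: uniqueEventHash });
expect(daily.count).toBe(2); // 3. the daily event count was updated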

Comment on lines +244 to +250
* Fetch the event that was just inserted by the competing worker
* Add small delay to ensure the event is persisted
*/
await new Promise(resolve => setTimeout(resolve, DUPLICATE_KEY_RETRY_DELAY_MS));

existedEvent = await this.getEvent(task.projectId, uniqueEventHash);


Copilot AI Feb 14, 2026

A 10ms delay may not be sufficient for all scenarios, particularly under high load or with slow disk I/O. MongoDB's write concern and replication lag could mean the event isn't immediately visible to other queries. Consider:

  1. Using a configurable delay value
  2. Implementing retry logic with exponential backoff
  3. Or better yet, using MongoDB's read concern "majority" to ensure we read the committed write

The current implementation may still result in the error "Event not found after duplicate key error" in edge cases.

Suggested change
- * Fetch the event that was just inserted by the competing worker
- * Add small delay to ensure the event is persisted
- */
-await new Promise(resolve => setTimeout(resolve, DUPLICATE_KEY_RETRY_DELAY_MS));
-existedEvent = await this.getEvent(task.projectId, uniqueEventHash);
+ * Fetch the event that was just inserted by the competing worker.
+ * Use bounded retry with exponential backoff to tolerate replication / visibility lag.
+ */
+const maxDuplicateKeyRetries = 5;
+let duplicateKeyRetryDelayMs = DUPLICATE_KEY_RETRY_DELAY_MS;
+for (let attempt = 0; attempt < maxDuplicateKeyRetries && !existedEvent; attempt++) {
+  if (attempt > 0) {
+    await new Promise(resolve => setTimeout(resolve, duplicateKeyRetryDelayMs));
+    duplicateKeyRetryDelayMs *= 2;
+  }
+  existedEvent = await this.getEvent(task.projectId, uniqueEventHash);
+}
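
For the read-concern alternative mentioned above, the Node MongoDB driver accepts a per-operation readConcern option. The collection accessor below is hypothetical, since the PR reads through this.getEvent():

/**
 * Read with "majority" read concern so only the committed insert is returned,
 * avoiding the fixed delay entirely
 */
existedEvent = await this.eventsCollection(task.projectId).findOne(
  { groupHash: uniqueEventHash },
  { readConcern: { level: 'majority' } },
);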

Member

@neSpecc left a comment

I thought about using Redis for storing group hashes that have already been accepted.

All Grouper instances can access this key to check isFirstOccurrence.

The problem is how to support old events that were accepted before the Redis solution was added.
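
A sketch of that idea using ioredis: SET with NX is atomic, so exactly one concurrent worker wins the claim and the race is decided in Redis rather than on the Mongo unique index. The key naming and the DB fallback for pre-Redis events are assumptions:

import Redis from 'ioredis';

const redis = new Redis(); // defaults to localhost; configure for the deployment

/**
 * Returns true if this worker is the first to claim the group hash
 */
async function claimGroupHash(projectId: string, groupHash: string): Promise<boolean> {
  return await redis.set(`grouper:${projectId}:${groupHash}`, '1', 'NX') === 'OK';
}

// In handle(): winning the claim is not enough for events accepted before the
// Redis key existed, so fall back to a DB lookup before treating the event as new
const claimed = await claimGroupHash(task.projectId, uniqueEventHash);
const isFirstOccurrence = claimed
  && await this.getEvent(task.projectId, uniqueEventHash) === null;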
