
Feature/lockfile summary #8

Closed
asonas wants to merge 7 commits into takai:main from asonas:feature/lockfile-summary

asonas commented Jan 28, 2026

Summary

  • Add automatic detection and summarization of lock files (uv.lock, package-lock.json, yarn.lock, Cargo.lock, etc.)
  • Large lock file diffs (>150KB) are replaced with a summary like [Lock file: +500 -200 lines, 153600 bytes, content omitted]
  • Regular files retain their full diff output

This prevents token limit issues when staging large auto-generated files.

Motivation

When staging large lock files like uv.lock with thousands of lines, the LLM agent can fail due to token limits.
Lock file contents are auto-generated and not useful for commit message generation, so summarizing them reduces token usage without losing meaningful context.

Threshold

Lock files are only summarized when their diff exceeds 150KB. Smaller changes (e.g., adding a single package) retain their full diff so the LLM can generate accurate commit messages.

150KB was chosen based on:

  • Claude Haiku's 200k token context limit (smallest among supported LLMs)
  • Empirical testing showed ~300KB as the failure boundary
  • A 50% safety margin applied to that boundary (300KB × 0.5 = 150KB)
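
As a rough illustration of the arithmetic above and of the summary format quoted in the Summary section, here is a minimal Go sketch. All names (`summaryThresholdBytes`, `renderLockDiff`) are hypothetical and not taken from the PR. It assumes 1KB = 1024 bytes (consistent with the 153600-byte example) and that the byte count in the summary is the size of the diff text.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical constants; the PR does not show how its threshold is declared.
const (
	failureBoundaryBytes = 300 * 1024 // ~300KB failure boundary reported in the PR
	safetyMarginPercent  = 50         // 50% safety margin reported in the PR
	// 300KB * 50% = 150KB = 153600 bytes (assuming 1KB = 1024 bytes).
	summaryThresholdBytes = failureBoundaryBytes * safetyMarginPercent / 100
)

// renderLockDiff returns the diff unchanged when it does not exceed the
// threshold, and otherwise a one-line summary in the format quoted in the
// PR description. The added/deleted counts are supplied by the caller.
func renderLockDiff(diff string, added, deleted int) string {
	if len(diff) <= summaryThresholdBytes {
		return diff
	}
	return fmt.Sprintf("[Lock file: +%d -%d lines, %d bytes, content omitted]",
		added, deleted, len(diff))
}

func main() {
	// A small lock file change is passed through untouched.
	fmt.Print(renderLockDiff("+requests==2.32.0\n", 1, 0))

	// A diff just over the threshold is replaced by its summary.
	large := strings.Repeat("x", summaryThresholdBytes+1)
	fmt.Println(renderLockDiff(large, 500, 200))
}
```

Whether a diff of exactly 150KB should be summarized is a boundary choice; this sketch follows the wording "exceeds 150KB" and keeps it in full.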

Lock file detection

Detected by suffix pattern:

  • .lock (uv.lock, poetry.lock, yarn.lock, Gemfile.lock, Cargo.lock, composer.lock, etc.)
  • -lock.json (package-lock.json)
  • -lock.yaml (pnpm-lock.yaml)

Special case:

  • go.sum (explicit match, no suffix pattern)
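
The detection rules listed above could be sketched roughly as follows. This is an illustration rather than the PR's code: `isLockFile`, `lockSuffixes`, and `lockExactNames` are hypothetical names, and matching against the base name of a slash-separated git path is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Suffix patterns listed in the PR description.
var lockSuffixes = []string{".lock", "-lock.json", "-lock.yaml"}

// Files that need an explicit match because no suffix pattern covers them.
var lockExactNames = map[string]bool{"go.sum": true}

// isLockFile reports whether the given git path looks like a lock file.
// Git reports paths with forward slashes, so package path (not
// path/filepath) is used to take the base name.
func isLockFile(p string) bool {
	name := path.Base(p)
	if lockExactNames[name] {
		return true
	}
	for _, suffix := range lockSuffixes {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{"uv.lock", "web/package-lock.json", "pnpm-lock.yaml", "go.sum", "main.go"}
	for _, p := range paths {
		fmt.Printf("%-24s %v\n", p, isLockFile(p))
	}
}
```

Keeping the explicit map limited to names that no suffix covers means a new `.lock`-style file is picked up without any code change.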

Add ParseNumstat to parse git numstat output and StagedDiffWithSummary to generate
diffs that include full content for regular files but only line count summaries
for lock files. This reduces noise in commit diffs when lock files have many changes.
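
For context, `git diff --numstat` prints one line per file in the form `added<TAB>deleted<TAB>path`, with `-` in both count columns for binary files. The sketch below shows one way such output might be parsed. It is not the PR's ParseNumstat: the lowercase `parseNumstat`, the `fileStat` type, and its fields are hypothetical, and rename notation in paths is not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fileStat holds the per-file counts from one numstat line (hypothetical type).
type fileStat struct {
	Path    string
	Added   int
	Deleted int
	Binary  bool // true when git reported "-" for both counts
}

// parseNumstat parses the text output of `git diff --numstat`.
func parseNumstat(out string) ([]fileStat, error) {
	var stats []fileStat
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimRight(line, "\r")
		if line == "" {
			continue
		}
		// Split into at most 3 fields so tabs inside a path stay in the path.
		fields := strings.SplitN(line, "\t", 3)
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed numstat line: %q", line)
		}
		st := fileStat{Path: fields[2]}
		if fields[0] == "-" && fields[1] == "-" {
			st.Binary = true
		} else {
			var err error
			if st.Added, err = strconv.Atoi(fields[0]); err != nil {
				return nil, fmt.Errorf("bad added count in %q: %w", line, err)
			}
			if st.Deleted, err = strconv.Atoi(fields[1]); err != nil {
				return nil, fmt.Errorf("bad deleted count in %q: %w", line, err)
			}
		}
		stats = append(stats, st)
	}
	return stats, nil
}

func main() {
	stats, err := parseNumstat("500\t200\tuv.lock\n3\t1\tmain.go\n-\t-\tlogo.png\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stats {
		fmt.Printf("%s: +%d -%d (binary=%v)\n", s.Path, s.Added, s.Deleted, s.Binary)
	}
}
```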
Add tests for when only lock files are staged and when no lock
files are present in the diff. These tests verify that lock file
summaries are generated correctly and that regular files show
their full content without lock file markers.
…t entries

Removed explicit map entries for lock files that already match a common suffix (.lock, -lock.yaml, -lock.json).
The suffix-based detection already covers these files, so keeping them in the explicit map is redundant.

Add a configurable threshold (LockFileSummaryThreshold = 200 lines) to only
summarize lock files with significant changes. Lock files below the threshold
are now shown in full diff, improving visibility for small updates while
keeping large lock file diffs manageable.
… collection

Changes the lock file summary threshold from a line-count metric to a
byte-size metric (150KB) that better aligns with LLM context limits.
Optimizes the diff collection logic to fetch diffs once and reuse them
when determining whether a file exceeds the threshold, eliminating
redundant git diff calls and improving performance.

asonas commented Jan 28, 2026

Closing this PR, as I just noticed that #7 already addresses the same issue. I wasn't aware of the existing work. Thank you!

asonas closed this Jan 28, 2026