⚡ Bolt: Optimize issue retrieval and implement blockchain integrity #348
Conversation
- Optimized `get_user_issues`, `get_nearby_issues`, and spatial deduplication queries using SQLAlchemy column projection.
- Reduced database I/O and memory usage by avoiding full ORM object loading for list views.
- Implemented a lightweight SHA-256 integrity seal (blockchain) for issues to ensure data tamper-resistance.
- Improved atomic upvote increments using bulk `update()` queries.
- Updated database migration logic to handle the new `integrity_hash` column.

Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
✅ Deploy Preview for fixmybharat canceled.
🙏 Thank you for your contribution, @RohanExploit!
PR Details:
Quality Checklist:
Review Process:
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.
📝 Walkthrough
This pull request adds blockchain-based integrity verification to issues via a new `integrity_hash` column.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/routers/issues.py`:
- Around line 168-177: The current create_issue flow (where prev_issue is read
via run_in_threadpool and integrity_hash computed) is vulnerable to race
conditions and omits important fields; fix by serializing the create path (e.g.,
acquire a DB-level advisory lock or run create_issue inside a
SERIALIZABLE/REPEATABLE READ transaction) so reading prev_issue and inserting
the new Issue happen atomically, and change the integrity_hash computation to
include all material fields (e.g., description, category, latitude, longitude,
user_email, location, created_at and prev_hash) so tampering is detectable;
ensure the lock/transaction surrounds the query of Issue.integrity_hash, hash
computation, and the subsequent insert to guarantee a linear chain.
🧹 Nitpick comments (2)
backend/init_db.py (1)
100-105: Inconsistent logging: `print()` vs `logger.info()`.

Other migration steps in this file (e.g., lines 19, 27, 35) use `logger.info()` for success messages, while this block (and the latitude/longitude/location/action_plan blocks) use `print()`. Consider using `logger.info()` here for consistency with the structured logging approach used elsewhere.

Also, per the Ruff S110 hint, the bare `except Exception: pass` silently swallows errors. The existing pattern throughout this file has the same issue, but for new code it's worth logging at `debug` level to aid troubleshooting unexpected migration failures (e.g., permission errors that aren't "column already exists").

Suggested fix

```diff
 # Add integrity_hash column for blockchain feature
 try:
     conn.execute(text("ALTER TABLE issues ADD COLUMN integrity_hash VARCHAR"))
-    print("Migrated database: Added integrity_hash column.")
-except Exception:
-    pass
+    logger.info("Migrated database: Added integrity_hash column.")
+except Exception:
+    logger.debug("integrity_hash column likely already exists.")
```

backend/routers/issues.py (1)
546-565: Inconsistent `created_at` serialization across endpoints.

In `get_user_issues`, `created_at` is passed as a raw `datetime` object (Line 556), relying on Pydantic to serialize it. In `get_recent_issues` (Line 604), `.isoformat()` is called explicitly because the data is also cached as JSON. This inconsistency could lead to different date formats in API responses if the serialization paths diverge (e.g., a Pydantic config change).

Consider aligning the approach: either always let Pydantic handle it (preferred) or always pre-format, as in the sketch below.
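A minimal sketch of the pre-format option, using a shared helper (the helper name and placement are hypothetical) so the cached-JSON path and the direct-response path emit identical ISO-8601 strings:

```python
from datetime import datetime, timezone
from typing import Optional

def iso(dt: Optional[datetime]) -> Optional[str]:
    # Single serialization point shared by all endpoints.
    return dt.isoformat() if dt is not None else None

# Both the cached and direct paths then produce the same string:
assert iso(datetime(2024, 1, 1, tzinfo=timezone.utc)) == "2024-01-01T00:00:00+00:00"
```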
```python
# Blockchain feature: calculate integrity hash for the report
# Optimization: Fetch only the last hash to maintain the chain with minimal overhead
prev_issue = await run_in_threadpool(
    lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
)
prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""

# Simple but effective SHA-256 chaining
hash_content = f"{description}|{category}|{prev_hash}"
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()
```
Race condition: concurrent creates break the hash chain.
The "blockchain" integrity chain fetches the previous hash (Line 171) and computes the new hash (Line 177) without any serialization. Two concurrent create_issue requests will both read the same prev_hash, producing two issues that chain off the same predecessor — breaking the chain's linear integrity guarantee. This is the core value proposition of chaining, so it's a significant gap.
Additionally, the hash only covers description|category|prev_hash. Fields like latitude, longitude, user_email, location, and created_at are excluded, meaning those fields could be tampered with without breaking the chain. If the goal is tamper-evident audit, consider including all material fields.
Possible mitigations:
- Use a database-level advisory lock or serializable isolation for the create path to enforce sequential chaining (a sketch follows this list).
- Include all material issue fields in the hash input.
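A minimal sketch of the advisory-lock option, assuming PostgreSQL, the `Issue` model from this PR, and a synchronous SQLAlchemy `Session`; the lock key constant and function name are hypothetical:

```python
import hashlib

from sqlalchemy import text
from sqlalchemy.orm import Session

from backend.models import Issue  # model path as in this PR

ISSUE_CHAIN_LOCK_KEY = 42  # arbitrary application-level constant

def create_issue_serialized(db: Session, description: str, category: str) -> Issue:
    # pg_advisory_xact_lock blocks other holders until this transaction ends,
    # so reading the previous hash, computing the new one, and inserting
    # happen as one serialized unit and the chain stays linear.
    db.execute(text("SELECT pg_advisory_xact_lock(:key)"), {"key": ISSUE_CHAIN_LOCK_KEY})
    prev = db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
    prev_hash = prev[0] if prev and prev[0] else ""
    integrity_hash = hashlib.sha256(
        f"{description}|{category}|{prev_hash}".encode()
    ).hexdigest()
    issue = Issue(description=description, category=category, integrity_hash=integrity_hash)
    db.add(issue)
    db.commit()  # ends the transaction, releasing the advisory lock
    return issue
```

From the async endpoint this would run inside the existing `run_in_threadpool` call so the event loop is not blocked while waiting on the lock.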
🤖 Prompt for AI Agents
In `@backend/routers/issues.py` around lines 168 - 177, The current create_issue
flow (where prev_issue is read via run_in_threadpool and integrity_hash
computed) is vulnerable to race conditions and omits important fields; fix by
serializing the create path (e.g., acquire a DB-level advisory lock or run
create_issue inside a SERIALIZABLE/REPEATABLE READ transaction) so reading
prev_issue and inserting the new Issue happen atomically, and change the
integrity_hash computation to include all material fields (e.g., description,
category, latitude, longitude, user_email, location, created_at and prev_hash)
so tampering is detectable; ensure the lock/transaction surrounds the query of
Issue.integrity_hash, hash computation, and the subsequent insert to guarantee a
linear chain.
Pull request overview
This PR introduces performance optimizations and a blockchain integrity feature to the VishwaGuru backend. The optimization work focuses on replacing full ORM model loading with targeted column projection in issue retrieval endpoints, which reduces memory footprint and database transfer costs. The blockchain feature adds cryptographic hash chaining to create an audit trail for issue creation.
Changes:
- Column projection optimization for `get_nearby_issues`, `get_user_issues`, and spatial deduplication queries to reduce memory usage
- Atomic `update()` query pattern for upvote deduplication to prevent race conditions
- Blockchain integrity hash implementation using SHA-256 chaining with previous issue hash
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| backend/routers/issues.py | Implements column projection for issue queries, atomic upvote updates, and blockchain integrity hash calculation during issue creation |
| backend/models.py | Adds integrity_hash column to Issue model for blockchain feature |
| backend/init_db.py | Database migration to add integrity_hash column |
| .jules/bolt.md | Documents the column projection optimization pattern and best practices |
```python
# Blockchain feature: calculate integrity hash for the report
# Optimization: Fetch only the last hash to maintain the chain with minimal overhead
prev_issue = await run_in_threadpool(
    lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
)
prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""

# Simple but effective SHA-256 chaining
hash_content = f"{description}|{category}|{prev_hash}"
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()
```
Missing test coverage for the newly introduced blockchain integrity feature. The integrity_hash calculation and chaining logic is not covered by any tests. Given that this is a critical security feature for audit trails, it should have comprehensive test coverage including: sequential issue creation to verify proper chaining, concurrent creation to test race conditions, and hash verification to ensure data integrity.
Add tests to verify the blockchain functionality works correctly under both normal and concurrent conditions.
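A hypothetical pytest sketch of the sequential-chaining case; it assumes a FastAPI `TestClient` fixture named `client`, a `POST /issues` route, and that the create response exposes `integrity_hash`, none of which are confirmed by this PR:

```python
import hashlib

def test_integrity_hash_chains_sequentially(client):
    first = client.post(
        "/issues", json={"description": "pothole", "category": "roads"}
    ).json()
    second = client.post(
        "/issues", json={"description": "streetlight", "category": "power"}
    ).json()

    # The second issue's hash must be derived from the first issue's hash.
    expected = hashlib.sha256(
        f"streetlight|power|{first['integrity_hash']}".encode()
    ).hexdigest()
    assert second["integrity_hash"] == expected
```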
```python
# Atomic update for upvotes to prevent race conditions
# Use query update to avoid fetching the full model instance
await run_in_threadpool(
    lambda: db.query(Issue).filter(Issue.id == linked_issue_id).update({
        Issue.upvotes: func.coalesce(Issue.upvotes, 0) + 1
    }, synchronize_session=False)
)

# Update the database with the upvote
# Commit the upvote
await run_in_threadpool(db.commit)
```
The atomic upvote update using query().update() is correct, but the commit is done in a separate threadpool call without error handling for the update operation itself. If the update() call succeeds but the commit fails, the update is rolled back but no error is raised to the caller. The deduplication flow continues as if the upvote succeeded.
Consider wrapping both the update and commit in the same lambda to ensure transactional consistency, or add error handling after the update to verify it affected exactly one row before committing.
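A minimal sketch of the first option, assuming the `Issue` model and `linked_issue_id` variable from this PR; the helper name and the 404 behavior are illustrative choices:

```python
from fastapi import HTTPException
from fastapi.concurrency import run_in_threadpool
from sqlalchemy import func

def _apply_upvote(db, linked_issue_id: int) -> None:
    # Query.update() returns the number of matched rows.
    rows = db.query(Issue).filter(Issue.id == linked_issue_id).update(
        {Issue.upvotes: func.coalesce(Issue.upvotes, 0) + 1},
        synchronize_session=False,
    )
    if rows != 1:
        db.rollback()
        raise HTTPException(status_code=404, detail="Linked issue not found")
    db.commit()  # update and commit now fail or succeed together

# Inside the async endpoint:
# await run_in_threadpool(lambda: _apply_upvote(db, linked_issue_id))
```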
```python
desc = row.description or ""
short_desc = desc[:100] + "..." if len(desc) > 100 else desc
```
Description truncation logic must handle None values safely: if `row.description` were None, slicing it would raise a TypeError. Line 549 guards against this with `desc = row.description or ""`, but the same pattern should be consistently applied everywhere descriptions are truncated.
The current code at line 549 is correct, but ensure this pattern is consistently used elsewhere. Consider extracting this into a helper function to avoid duplication and potential bugs.
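A hypothetical helper extracting the guard-and-truncate pattern so every endpoint handles `None` descriptions identically:

```python
from typing import Optional

def summarize_description(description: Optional[str], limit: int = 100) -> str:
    # Guard against None, then truncate with an ellipsis marker.
    desc = description or ""
    return desc[:limit] + "..." if len(desc) > limit else desc
```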
```python
# Blockchain feature: calculate integrity hash for the report
# Optimization: Fetch only the last hash to maintain the chain with minimal overhead
prev_issue = await run_in_threadpool(
    lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
)
```
The blockchain implementation assumes Issue.id is a reliable ordering mechanism, but this may not be safe in all database configurations. Some databases don't guarantee that auto-increment IDs are strictly sequential across concurrent transactions (e.g., if a transaction with ID 100 commits before a transaction with ID 99).
Consider adding a dedicated created_at or sequence column with a database-level constraint to ensure proper ordering, or add a chain_index column that's explicitly managed. Alternatively, document that this blockchain feature requires single-writer guarantees or accept that the chain ordering might not perfectly match chronological order under high concurrency.
```diff
-lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
+lambda: db.query(Issue.integrity_hash)
+    .order_by(Issue.created_at.desc(), Issue.id.desc())
+    .first()
```
```python
# Blockchain feature: calculate integrity hash for the report
# Optimization: Fetch only the last hash to maintain the chain with minimal overhead
prev_issue = await run_in_threadpool(
    lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
)
prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""

# Simple but effective SHA-256 chaining
hash_content = f"{description}|{category}|{prev_hash}"
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()
```
Critical race condition in blockchain integrity implementation. When multiple issues are created concurrently, they can read the same previous hash before any of them commits, resulting in multiple issues with the same previous hash and breaking the blockchain chain integrity. This defeats the purpose of the blockchain audit trail.
Consider using database-level locking (SELECT FOR UPDATE) or a transaction-level sequence to ensure each issue gets a unique position in the chain. Alternatively, calculate the hash after commit and update it, or use a dedicated sequence/counter column to enforce ordering independently of the hash chain.
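One caveat with `SELECT FOR UPDATE` on the newest issue row: under READ COMMITTED, two writers can still end up chaining off the same predecessor after the lock is released. A common workaround is a dedicated single-row chain-head table that every writer locks; `ChainHead` below is a hypothetical model, not part of this PR:

```python
import hashlib

from sqlalchemy import Column, Integer, String
from backend.models import Base  # assumed declarative base

class ChainHead(Base):  # hypothetical single-row table, seeded at migration time
    __tablename__ = "issue_chain_head"
    id = Column(Integer, primary_key=True)
    last_hash = Column(String, default="")

def chain_next_hash(db, description: str, category: str) -> str:
    # with_for_update() locks the head row, serializing all chain writers.
    head = db.query(ChainHead).with_for_update().one()
    prev_hash = head.last_hash or ""
    new_hash = hashlib.sha256(
        f"{description}|{category}|{prev_hash}".encode()
    ).hexdigest()
    head.last_hash = new_hash  # committed atomically with the new Issue row
    return new_hash
```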
```python
# Convert results to dictionaries for faster serialization and schema compliance
data = []
for row in results:
    desc = row.description or ""
    short_desc = desc[:100] + "..." if len(desc) > 100 else desc

    data.append({
        "id": row.id,
        "category": row.category,
        "description": short_desc,
        "created_at": row.created_at,
        "image_path": row.image_path,
        "status": row.status,
        "upvotes": row.upvotes if row.upvotes is not None else 0,
        "location": row.location,
        "latitude": row.latitude,
        "longitude": row.longitude
    })

return data
```
API response format inconsistency with declared response_model. This endpoint declares response_model=List[IssueSummaryResponse] but returns a list of plain dictionaries instead of Pydantic model instances. This bypasses FastAPI's automatic validation and serialization, which could lead to runtime errors if the response structure doesn't match the schema.
The correct approach is to either: (1) Return IssueSummaryResponse instances constructed from the rows, or (2) Continue returning dictionaries but ensure they're properly validated. Note that FastAPI with response_model will automatically validate and serialize dictionaries if they match the schema structure, but it's more explicit and maintainable to use Pydantic instances.
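A minimal sketch of option (1), assuming `IssueSummaryResponse` declares fields mirroring the dictionary keys built above (the schema itself is not shown in this PR):

```python
def to_summary(row) -> IssueSummaryResponse:
    desc = row.description or ""
    return IssueSummaryResponse(
        id=row.id,
        category=row.category,
        description=desc[:100] + "..." if len(desc) > 100 else desc,
        created_at=row.created_at,
        image_path=row.image_path,
        status=row.status,
        upvotes=row.upvotes or 0,
        location=row.location,
        latitude=row.latitude,
        longitude=row.longitude,
    )

# In the endpoint body:
# return [to_summary(row) for row in results]
```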
```python
prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""

# Simple but effective SHA-256 chaining
hash_content = f"{description}|{category}|{prev_hash}"
```
Incomplete blockchain integrity: The integrity hash only includes description and category, but excludes other important fields like location (latitude/longitude), user_email, and timestamps. This means two issues with the same description and category but different locations would produce identical hash chain entries, making it impossible to verify the full integrity of the data.
Consider including all immutable fields in the hash calculation: description, category, latitude, longitude, location, user_email, and optionally a timestamp. This provides stronger integrity guarantees and makes the blockchain audit trail more meaningful.
```diff
-hash_content = f"{description}|{category}|{prev_hash}"
+# Include all relevant immutable fields to strengthen integrity guarantees
+hash_parts = [
+    description or "",
+    category or "",
+    user_email or "",
+    "" if latitude is None else str(latitude),
+    "" if longitude is None else str(longitude),
+    location or "",
+    prev_hash,
+]
+hash_content = "|".join(hash_parts)
```
```python
try:
    conn.execute(text("ALTER TABLE issues ADD COLUMN integrity_hash VARCHAR"))
    print("Migrated database: Added integrity_hash column.")
except Exception:
```
'except' clause does nothing but pass and there is no explanatory comment.
This PR introduces significant performance improvements to the VishwaGuru backend by optimizing how issues are retrieved and processed. By switching from full ORM model loading to targeted column projection, we reduce the memory footprint and database transfer size for common endpoints. Additionally, as requested, a "blockchain" feature has been implemented in the form of a cryptographic integrity seal that chains reports together using SHA-256 hashes, providing a robust audit trail with minimal performance overhead.
Key Changes:
- Column projection in `backend/routers/issues.py` for retrieval endpoints.
- Atomic `update()` for deduplication upvotes.
- `integrity_hash` column added to the `Issue` model.
- SHA-256 hash chaining computed in `create_issue`.

PR created automatically by Jules for task 1135829609348711867 started by @RohanExploit
Summary by CodeRabbit
New Features
Bug Fixes