
Daily Civic Intelligence Refinement Engine #413

Open

RohanExploit wants to merge 4 commits into main from civic-intelligence-engine-10394596499515001806

Conversation

@RohanExploit (Owner) commented Feb 17, 2026

Implemented a daily scheduled job that analyzes civic issues to refine the system's intelligence.
Key components:

  • AdaptiveWeights: Manages severity keywords and category configurations in backend/data/modelWeights.json. It supports hot-reloading so changes by the scheduler are immediately picked up by the API.
  • TrendAnalyzer: Analyzes issues from the last 24h to find top keywords, category spikes, and geographic clusters (using DBSCAN if available).
  • CivicIntelligenceEngine: The core logic that runs the daily refinement process. It calculates a "Civic Intelligence Index", updates weights based on critical issue patterns, and adjusts duplicate detection sensitivity based on duplicate rates.
  • PriorityEngine Update: Modified to use dynamic weights from AdaptiveWeights instead of hardcoded values.
  • Scheduler: A standalone script backend/scheduler/daily_refinement_job.py to run the daily process.

This system allows VishwaGuru to self-improve by learning from new data and adapting its prioritization and deduplication logic automatically.
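
For illustration, the PriorityEngine delegation described above looks roughly like this (a sketch, not the exact contents of backend/priority_engine.py):

# Sketch only; property bodies are illustrative.
from backend.adaptive_weights import adaptive_weights

class PriorityEngine:
    @property
    def severity_keywords(self):
        # Hot-reloaded from backend/data/modelWeights.json via the singleton
        return adaptive_weights.severity_keywords

    @property
    def urgency_patterns(self):
        return adaptive_weights.get_urgency_patterns_tuples()

    @property
    def categories(self):
        return adaptive_weights.categories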


PR created automatically by Jules for task 10394596499515001806 started by @RohanExploit


Summary by cubic

Added a daily civic intelligence engine that learns from the last 24h to update weights, tune duplicate detection, and snapshot a Civic Intelligence Index. Fixed Render data paths for persistence and added scikit‑learn/numpy to resolve clustering imports; PriorityEngine now reads hot‑reloaded weights in real time.

  • New Features

    • AdaptiveWeights: JSON-backed severity, urgency, and categories with hot reloading; adjustable duplicate_search_radius.
    • TrendAnalyzer: extracts top keywords, detects category spikes vs previous snapshot, and finds geographic clusters (DBSCAN).
    • CivicIntelligenceEngine + Scheduler: runs daily, adds critical keywords from recent data, adjusts duplicate radius by duplicate rate, and saves snapshots to data/dailySnapshots/.
  • Refactors

    • PriorityEngine now uses AdaptiveWeights instead of hardcoded values.
    • Absolute project-root data paths for Render; .gitignore excludes data/dailySnapshots/ and data/modelWeights.json; Render requirements include scikit-learn and numpy to fix runtime imports.

Written for commit 4604a37. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Dynamic, live-configurable weight management for issue classification.
    • Daily civic intelligence refinement producing a 0–100 index with persisted snapshots and a scheduled job.
    • Trend detection: top keywords, category spikes, and geographic clustering.
    • Automatic tuning of duplicate-detection threshold based on observed data.
  • Chores

    • Added persistent model weights dataset and updated ignore rules.
    • Added numeric/ML libraries to backend requirements.

- Added `AdaptiveWeights` for dynamic severity/category configuration with hot-reloading.
- Added `TrendAnalyzer` to detect keywords, category spikes, and geographic clusters.
- Added `CivicIntelligenceEngine` to orchestrate daily analysis and generate snapshots.
- Added duplicate threshold adjustment logic based on daily duplicate rates.
- Updated `PriorityEngine` to use `AdaptiveWeights`.
- Added scheduler script `backend/scheduler/daily_refinement_job.py`.
- Added `backend/data/dailySnapshots/` directory.
Copilot AI review requested due to automatic review settings February 17, 2026 18:50
@google-labs-jules (Contributor)

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.


netlify bot commented Feb 17, 2026

Deploy Preview for fixmybharat canceled.

Name Link
🔨 Latest commit 4604a37
🔍 Latest deploy log https://app.netlify.com/projects/fixmybharat/deploys/6994c0021636aa0008f1f233

@github-actions

🙏 Thank you for your contribution, @RohanExploit!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.


coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

Adds an adaptive weight manager and JSON weight dataset, a CivicIntelligenceEngine that runs a daily refinement pipeline (trend analysis, weight optimization, duplicate threshold tuning, civic index calc, snapshotting), a TrendAnalyzer, scheduler entrypoint, and refactors PriorityEngine to read dynamic weights.

Changes

Cohort / File(s) Summary
Adaptive Weight Management
backend/adaptive_weights.py, backend/data/modelWeights.json
New AdaptiveWeights singleton with in-code defaults, JSON-backed persistence at backend/data/modelWeights.json, live-reload checks, accessors for severity_keywords, urgency_patterns, categories, duplicate_search_radius, and mutation methods to update and persist weights.
Civic Intelligence & Scheduler
backend/civic_intelligence.py, backend/scheduler/daily_refinement_job.py
New CivicIntelligenceEngine singleton with refine_daily(db) that loads recent Issues/Grievances, loads last snapshot, runs trend_analyzer, optimizes adaptive weights, adjusts duplicate-search radius, computes a 0–100 civic index, and saves daily JSON snapshots; scheduler script invokes this flow.
Trend Analysis
backend/trend_analyzer.py
New TrendAnalyzer singleton that extracts top keywords, counts categories, detects spikes against historical averages, and identifies geographic clusters (uses cluster_issues_dbscan from backend.spatial_utils).
Priority Engine Refactor
backend/priority_engine.py
Removed hard-coded keyword/pattern/category constants; added properties that delegate to adaptive_weights (severity_keywords, urgency_patterns via get_urgency_patterns_tuples(), and categories).
Data & Config
backend/data/modelWeights.json, .gitignore, backend/requirements-render.txt
Added JSON weights dataset; updated .gitignore to ignore data/dailySnapshots/ and data/modelWeights.json; added scikit-learn and numpy to backend requirements.

Sequence Diagram

sequenceDiagram
    participant Scheduler
    participant Engine as CivicIntelligenceEngine
    participant DB as Database
    participant Analyzer as TrendAnalyzer
    participant Weights as AdaptiveWeights
    participant Priority as PriorityEngine

    Scheduler->>Engine: refine_daily(db)
    Engine->>DB: fetch issues & grievances (last 24h)
    Engine->>Engine: _load_last_snapshot()
    Engine->>Analyzer: analyze(issues, historical_counts)
    Analyzer-->>Engine: trends & clusters
    Engine->>Weights: _optimize_weights(grievances)
    Weights-->>Engine: weight_updates
    Engine->>Weights: _adjust_duplicate_threshold(issues)
    Engine->>Priority: compute severity/urgency (reads adaptive weights)
    Engine->>Engine: _calculate_index(issues, grievances)
    Engine->>Engine: _save_snapshot(snapshot)
    Engine-->>Scheduler: return index_score & metadata

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested Labels

ECWoC26-L3

Poem

🐰 Hopping through JSON, defaults in tow,

I nibble the keywords so insights may grow.
Trends bloom at dawn, snapshots snug and bright,
Weights learn each night and tune by morning light.
A little rabbit cheers: refine and take flight! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.72% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Daily Civic Intelligence Refinement Engine' clearly and specifically summarizes the main feature added in the changeset.
Description check ✅ Passed The PR description is comprehensive and addresses multiple key aspects: provides detailed context of new components, explains their purpose, describes the refactoring of PriorityEngine, and includes deployment considerations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch civic-intelligence-engine-10394596499515001806

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (8)
backend/civic_intelligence.py (2)

195-199: Use logger.exception for richer tracebacks in error paths.

Per the Ruff TRY400 hint, logger.error(...) on line 199 won't include the full traceback by default. logger.exception(...) automatically attaches the traceback within an except block. Same applies to _load_last_snapshot (line 214).

Proposed fix
         except Exception as e:
-            logger.error(f"Failed to save snapshot: {e}")
+            logger.exception(f"Failed to save snapshot: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` around lines 195 - 199, Replace the error
logging calls in the snapshot persistence code to capture tracebacks: inside the
except blocks in the functions handling snapshots (the write block that
currently logs "Failed to save snapshot" and the corresponding
_load_last_snapshot error path), call logger.exception(...) instead of
logger.error(...) so the exception traceback is included while preserving the
existing message text.

128-130: Calling private method _extract_top_keywords from outside TrendAnalyzer.

trend_analyzer._extract_top_keywords(...) reaches into a private method. Consider making this a public method (rename to extract_top_keywords) or adding a dedicated public API on TrendAnalyzer for keyword extraction from an arbitrary subset of issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` around lines 128 - 130, The code calls the
private method trend_analyzer._extract_top_keywords(critical_issues, top_n=10);
change this to use a public API on TrendAnalyzer by either renaming the method
to extract_top_keywords or adding a new public method (e.g.,
TrendAnalyzer.extract_top_keywords(issues, top_n)) that accepts an arbitrary
list like critical_issues and returns top_keywords; update the call site to use
the new public method and remove any direct references to _extract_top_keywords
to respect encapsulation.
backend/data/modelWeights.json (1)

2-218: Duplicate keywords across severity tiers create dead entries.

Several keywords appear in multiple severity tiers — e.g., "smoke" in both critical (Line 51) and high (Line 66), "attack" in both critical (Line 19) and high (Line 81), "dirty" in medium and low, "leaning" in high and low. Because _calculate_severity in PriorityEngine short-circuits at the highest matching tier, duplicates in lower tiers will never influence scoring. They add maintenance overhead and can mislead anyone editing the weights.

Consider deduplicating across tiers so each keyword appears only at its intended severity level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/data/modelWeights.json` around lines 2 - 218, The severity_keywords
mapping contains duplicate entries across tiers which never get used because
PriorityEngine._calculate_severity short-circuits at the highest matching tier;
remove duplicates so each keyword appears only once at its intended severity
level (edit the "severity_keywords" object in modelWeights.json), audit for
repeated tokens like "smoke", "attack", "dirty", "leaning", and ensure each term
is assigned to the correct tier to avoid misleading maintainers and dead entries
used by PriorityEngine._calculate_severity.
backend/adaptive_weights.py (3)

118-132: _check_reload is not thread-safe — concurrent reads and writes can interleave.

The hot-reload mechanism (_check_reload -> load_weights) can fire during a web request while the scheduler is simultaneously calling save_weights or update_severity_keyword. There is no locking around the read-modify-write sequences. Under the GIL this won't segfault, but a PriorityEngine.analyze() call could see partially updated state (e.g., new severity keywords but old urgency patterns from a stale load).

If the API server uses threads (e.g., gunicorn with --threads), consider adding a threading.Lock around load/save/property access.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 118 - 132, The
_check_reload/load_weights path is not thread-safe: add a threading.Lock (e.g.,
self._rw_lock) to adaptive_weights.py and use it to serialize access to shared
state—acquire the lock around load_weights(), save_weights(),
update_severity_keyword(), and any property getters or methods used by
PriorityEngine.analyze() that read internal structures; ensure _check_reload
acquires the lock before checking mtime and performing load to prevent
interleaved read/write, and release the lock promptly to avoid blocking.
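
For reference, a minimal sketch of the locking pattern suggested above (class and method names are illustrative, not the module's actual code):

import json
import os
import threading

class ThreadSafeWeights:
    """Minimal sketch: one lock serializing load/save/read of a JSON-backed config."""

    def __init__(self, data_path: str = "weights.json"):
        self._lock = threading.RLock()  # reentrant, so save() can run inside update()
        self.data_path = data_path
        self._data = {"duplicate_search_radius": 50.0}

    def load(self):
        with self._lock:
            if os.path.exists(self.data_path):
                with open(self.data_path) as f:
                    self._data = json.load(f)

    def save(self):
        with self._lock:
            with open(self.data_path, "w") as f:
                json.dump(self._data, f, indent=4)

    @property
    def duplicate_search_radius(self) -> float:
        with self._lock:  # readers see either the old or the new state, never a mix
            return self._data["duplicate_search_radius"]

    def update(self, key, value):
        with self._lock:  # read-modify-write is atomic with respect to other threads
            self._data[key] = value
            self.save()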

131-132: Use logger.exception instead of logger.error in except blocks.

Per Ruff TRY400, logger.error(f"...: {e}") loses the full traceback. logger.exception(...) automatically includes it, which is valuable for diagnosing file I/O failures in production.

Proposed fix (example for line 132)
             except Exception as e:
-                logger.error(f"Error checking weights file modification: {e}")
+                logger.exception(f"Error checking weights file modification: {e}")

Apply the same pattern at lines 148 and 172.

Also applies to: 147-148, 171-172

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 131 - 132, Replace the logger.error
calls inside the three except Exception as e blocks with logger.exception so the
full traceback is recorded; locate the three try/except handlers in
backend/adaptive_weights.py that currently do `except Exception as e:
logger.error(f"Error ...: {e}")` and change them to `logger.exception("Error
...")` (keep the original descriptive message) so the logger variable used in
this module captures the stack trace for those file I/O/weight-check failures.

174-197: Properties return mutable internal state — callers can accidentally corrupt weights.

severity_keywords, urgency_patterns, and categories properties all return direct references to internal lists/dicts. Any caller doing e.g. adaptive_weights.severity_keywords["critical"].append("test") would silently mutate the weights without triggering save_weights(), creating a divergence between in-memory and on-disk state.

Consider returning copies from the property getters, or documenting that callers must treat the returned data as read-only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 174 - 197, The property getters
severity_keywords, urgency_patterns, and categories currently return direct
references to internal mutable state (_severity_keywords, _urgency_patterns,
_categories) so callers can mutate in-memory weights without triggering
save_weights; change each getter to return a defensive copy (deep copy for
nested lists/dicts) after calling _check_reload() so external mutations don't
affect internal state, keeping the existing _check_reload() call; no change to
duplicate_search_radius behavior is needed aside from leaving its setter that
calls save_weights() as-is.
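
A small sketch of the defensive-copy variant suggested above (illustrative names; the real getters would also call _check_reload() first):

import copy

class WeightsView:
    def __init__(self):
        self._severity_keywords = {"critical": ["fire"], "high": ["accident"]}

    @property
    def severity_keywords(self):
        # Callers get an independent structure; appending to it cannot
        # desynchronize the in-memory weights from the JSON on disk.
        return copy.deepcopy(self._severity_keywords)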
backend/trend_analyzer.py (2)

22-22: Use explicit Optional for the default-None parameter.

Per PEP 484 (and flagged by Ruff RUF013), Dict[str, float] = None is an implicit Optional. Use the explicit form.

Proposed fix
-    def analyze(self, issues: List[Issue], historical_category_counts: Dict[str, float] = None) -> Dict[str, Any]:
+    def analyze(self, issues: List[Issue], historical_category_counts: Optional[Dict[str, float]] = None) -> Dict[str, Any]:

And add Optional to the import on line 2:

-from typing import List, Dict, Any, Tuple
+from typing import List, Dict, Any, Tuple, Optional
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/trend_analyzer.py` at line 22, Update the analyze method signature to
use an explicit Optional for the default-None parameter: change the type of
historical_category_counts from Dict[str, float] = None to Optional[Dict[str,
float]] = None in the analyze method definition, and add Optional to the type
imports at the top of the file so the symbol Optional is available for use.

46-52: Keyword extraction is limited to Latin [a-z] characters only.

The regex r'\b[a-z]{3,}\b' will silently drop all non-Latin tokens (Hindi, Tamil, etc.). Given a civic platform likely serving diverse populations, this could miss the majority of keywords. Worth noting as a known limitation and addressing in a follow-up if multilingual support is expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/trend_analyzer.py` around lines 46 - 52, The regex in
_extract_top_keywords currently uses r'\b[a-z]{3,}\b' which only matches Latin
a–z and drops non-Latin tokens; change the extraction to use a Unicode-aware
pattern (e.g., match word characters or Unicode letters) and pass re.UNICODE,
and normalize case with .casefold() instead of .lower() so tokens like
Hindi/Tamil are retained and compared consistently; keep filtering via
STOP_WORDS and Counter(filtered_words).most_common(top_n), and consider that
STOP_WORDS may need localization in a future follow-up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/adaptive_weights.py`:
- Around line 105-116: The constructor (__init__) currently uses shallow copies
(DEFAULT_SEVERITY_KEYWORDS.copy(), DEFAULT_URGENCY_PATTERNS.copy(),
DEFAULT_CATEGORIES.copy()) which leaves nested mutable objects shared; replace
these with deep copies so mutations to self._severity_keywords,
self._urgency_patterns, and self._categories don’t mutate the module-level
DEFAULT_* constants — import deepcopy from the copy module and initialize those
three attributes with deepcopy(DEFAULT_SEVERITY_KEYWORDS),
deepcopy(DEFAULT_URGENCY_PATTERNS), and deepcopy(DEFAULT_CATEGORIES)
respectively, leaving the rest of __init__ (last_modified, last_check_time,
check_interval, load_weights()) unchanged.

In `@backend/civic_intelligence.py`:
- Line 17: SNAPSHOT_DIR and the AdaptiveWeights data_path are currently relative
and can break when the process runs from a different CWD; update them to use an
absolute path by resolving relative paths against the module file location
(using __file__) or by reading a configurable environment variable (e.g.,
DATA_DIR) with a fallback to the anchored path; change the SNAPSHOT_DIR
definition and the AdaptiveWeights constructor/attribute that sets data_path to
compute os.path.join(os.path.dirname(__file__), ...) or
os.environ.get("DATA_DIR", ...anchored path...) so all file reads/writes use a
deterministic absolute path.
- Around line 43-47: The code currently passes the previous day's raw
category_counts (from previous_snapshot returned by _load_last_snapshot()) into
trend_analyzer.analyze(), which causes _detect_spikes to compare today against a
single day; instead, change the flow to load the last N snapshots (e.g., 7) and
compute per-category rolling averages before calling
trend_analyzer.analyze(issues_24h, historical_averages). Concretely: replace the
single-call to self._load_last_snapshot() with a call that retrieves the last N
snapshots (implement or reuse a helper like _load_last_n_snapshots(n) or extend
_load_last_snapshot to accept N), compute the mean category_counts across those
snapshots into historical_averages, and pass that dictionary to
trend_analyzer.analyze so _detect_spikes receives true historical averages (make
N configurable, default 7).
- Around line 86-88: The code in _adjust_duplicate_threshold uses i.status ==
'duplicate' which never matches because Issue.status only contains OPEN,
VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED; update the duplicate detection to use
the actual field used for marking duplicates (e.g., if you have an
Issue.resolution or Issue.is_duplicate flag, change the check to i.resolution ==
'DUPLICATE' or i.is_duplicate) or, if you intend to make duplicate a status, add
'DUPLICATE' to the Issue.status enum/schema and add the necessary migration and
tests; update the reference in _adjust_duplicate_threshold (and any other places
using 'duplicate') to the chosen canonical representation so duplicates are
counted correctly.

In `@backend/scheduler/daily_refinement_job.py`:
- Around line 29-30: The except block currently logs the error with
logger.error(f"Daily refinement job failed: {e}", exc_info=True) but swallows
the exception, causing the process to exit with code 0; modify the handler to
terminate with a non-zero exit (e.g., call sys.exit(1)) or re-raise the
exception after logging so external schedulers detect failure—update the except
Exception as e block in daily_refinement_job.py (the block using logger.error
and the literal "Daily refinement job failed") to either call sys.exit(1) after
logging or simply raise the caught exception.

In `@backend/trend_analyzer.py`:
- Line 79: The filter that builds located_issues currently uses a falsy check
(i.latitude and i.longitude) which will drop valid 0.0 coordinates; update the
comprehension to explicitly check for non-None values (e.g., i.latitude is not
None and i.longitude is not None) so zero coordinates are retained—modify the
located_issues assignment in trend_analyzer.py to use the explicit None checks
against the Issue objects in the issues list.

---

Nitpick comments:
In `@backend/adaptive_weights.py`:
- Around line 118-132: The _check_reload/load_weights path is not thread-safe:
add a threading.Lock (e.g., self._rw_lock) to adaptive_weights.py and use it to
serialize access to shared state—acquire the lock around load_weights(),
save_weights(), update_severity_keyword(), and any property getters or methods
used by PriorityEngine.analyze() that read internal structures; ensure
_check_reload acquires the lock before checking mtime and performing load to
prevent interleaved read/write, and release the lock promptly to avoid blocking.
- Around line 131-132: Replace the logger.error calls inside the three except
Exception as e blocks with logger.exception so the full traceback is recorded;
locate the three try/except handlers in backend/adaptive_weights.py that
currently do `except Exception as e: logger.error(f"Error ...: {e}")` and change
them to `logger.exception("Error ...")` (keep the original descriptive message)
so the logger variable used in this module captures the stack trace for those
file I/O/weight-check failures.
- Around line 174-197: The property getters severity_keywords, urgency_patterns,
and categories currently return direct references to internal mutable state
(_severity_keywords, _urgency_patterns, _categories) so callers can mutate
in-memory weights without triggering save_weights; change each getter to return
a defensive copy (deep copy for nested lists/dicts) after calling
_check_reload() so external mutations don't affect internal state, keeping the
existing _check_reload() call; no change to duplicate_search_radius behavior is
needed aside from leaving its setter that calls save_weights() as-is.

In `@backend/civic_intelligence.py`:
- Around line 195-199: Replace the error logging calls in the snapshot
persistence code to capture tracebacks: inside the except blocks in the
functions handling snapshots (the write block that currently logs "Failed to
save snapshot" and the corresponding _load_last_snapshot error path), call
logger.exception(...) instead of logger.error(...) so the exception traceback is
included while preserving the existing message text.
- Around line 128-130: The code calls the private method
trend_analyzer._extract_top_keywords(critical_issues, top_n=10); change this to
use a public API on TrendAnalyzer by either renaming the method to
extract_top_keywords or adding a new public method (e.g.,
TrendAnalyzer.extract_top_keywords(issues, top_n)) that accepts an arbitrary
list like critical_issues and returns top_keywords; update the call site to use
the new public method and remove any direct references to _extract_top_keywords
to respect encapsulation.

In `@backend/data/modelWeights.json`:
- Around line 2-218: The severity_keywords mapping contains duplicate entries
across tiers which never get used because PriorityEngine._calculate_severity
short-circuits at the highest matching tier; remove duplicates so each keyword
appears only once at its intended severity level (edit the "severity_keywords"
object in modelWeights.json), audit for repeated tokens like "smoke", "attack",
"dirty", "leaning", and ensure each term is assigned to the correct tier to
avoid misleading maintainers and dead entries used by
PriorityEngine._calculate_severity.

In `@backend/trend_analyzer.py`:
- Line 22: Update the analyze method signature to use an explicit Optional for
the default-None parameter: change the type of historical_category_counts from
Dict[str, float] = None to Optional[Dict[str, float]] = None in the analyze
method definition, and add Optional to the type imports at the top of the file
so the symbol Optional is available for use.
- Around line 46-52: The regex in _extract_top_keywords currently uses
r'\b[a-z]{3,}\b' which only matches Latin a–z and drops non-Latin tokens; change
the extraction to use a Unicode-aware pattern (e.g., match word characters or
Unicode letters) and pass re.UNICODE, and normalize case with .casefold()
instead of .lower() so tokens like Hindi/Tamil are retained and compared
consistently; keep filtering via STOP_WORDS and
Counter(filtered_words).most_common(top_n), and consider that STOP_WORDS may
need localization in a future follow-up.

Comment on lines 105 to 116
def __init__(self, data_path: str = "backend/data/modelWeights.json"):
    self.data_path = data_path
    self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
    self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
    self._categories = DEFAULT_CATEGORIES.copy()
    self._duplicate_search_radius = 50.0

    self.last_modified = 0.0
    self.last_check_time = 0.0
    self.check_interval = 5.0 # Check every 5 seconds at most

    self.load_weights()

⚠️ Potential issue | 🔴 Critical

Shallow .copy() on defaults shares nested mutable objects — mutations corrupt the module-level constants.

DEFAULT_SEVERITY_KEYWORDS.copy() (line 107) creates a shallow copy: the outer dict is new, but the inner lists (e.g., DEFAULT_SEVERITY_KEYWORDS["critical"]) are shared references. When update_severity_keyword appends to self._severity_keywords[level], it mutates the original default list in-place. This corrupts DEFAULT_SEVERITY_KEYWORDS for the lifetime of the process.

The same applies to DEFAULT_URGENCY_PATTERNS (line 108) and DEFAULT_CATEGORIES (line 109).

In practice the JSON file usually exists and load_weights replaces these references, but the bug surfaces when the file is missing (first run / tests).

Proposed fix — use deep copies
+import copy
+
 class AdaptiveWeights:
     ...
     def __init__(self, data_path: str = "backend/data/modelWeights.json"):
         self.data_path = data_path
-        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
-        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
-        self._categories = DEFAULT_CATEGORIES.copy()
+        self._severity_keywords = copy.deepcopy(DEFAULT_SEVERITY_KEYWORDS)
+        self._urgency_patterns = copy.deepcopy(DEFAULT_URGENCY_PATTERNS)
+        self._categories = copy.deepcopy(DEFAULT_CATEGORIES)
         self._duplicate_search_radius = 50.0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __init__(self, data_path: str = "backend/data/modelWeights.json"):
    self.data_path = data_path
    self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
    self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
    self._categories = DEFAULT_CATEGORIES.copy()
    self._duplicate_search_radius = 50.0
    self.last_modified = 0.0
    self.last_check_time = 0.0
    self.check_interval = 5.0 # Check every 5 seconds at most
    self.load_weights()
def __init__(self, data_path: str = "backend/data/modelWeights.json"):
    self.data_path = data_path
    self._severity_keywords = copy.deepcopy(DEFAULT_SEVERITY_KEYWORDS)
    self._urgency_patterns = copy.deepcopy(DEFAULT_URGENCY_PATTERNS)
    self._categories = copy.deepcopy(DEFAULT_CATEGORIES)
    self._duplicate_search_radius = 50.0
    self.last_modified = 0.0
    self.last_check_time = 0.0
    self.check_interval = 5.0 # Check every 5 seconds at most
    self.load_weights()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 105 - 116, The constructor
(__init__) currently uses shallow copies (DEFAULT_SEVERITY_KEYWORDS.copy(),
DEFAULT_URGENCY_PATTERNS.copy(), DEFAULT_CATEGORIES.copy()) which leaves nested
mutable objects shared; replace these with deep copies so mutations to
self._severity_keywords, self._urgency_patterns, and self._categories don’t
mutate the module-level DEFAULT_* constants — import deepcopy from the copy
module and initialize those three attributes with
deepcopy(DEFAULT_SEVERITY_KEYWORDS), deepcopy(DEFAULT_URGENCY_PATTERNS), and
deepcopy(DEFAULT_CATEGORIES) respectively, leaving the rest of __init__
(last_modified, last_check_time, check_interval, load_weights()) unchanged.

Comment on lines +43 to +47
# 2. Trend Analysis
previous_snapshot = self._load_last_snapshot()
historical_counts = previous_snapshot.get("trends", {}).get("category_counts", {}) if previous_snapshot else {}

trends = trend_analyzer.analyze(issues_24h, historical_counts)

⚠️ Potential issue | 🟡 Minor

Spike detection compares against a single previous day, not a true historical average.

historical_counts on line 45 is pulled from the previous day's snapshot category_counts, which are raw counts for that one day. However, _detect_spikes treats its second argument as historical_averages and applies a > 1.5× threshold. Comparing today against a single prior day is noisy — one anomalous day followed by a normal day would register as a "spike."

Consider computing a rolling average over the last N snapshots (e.g., 7 days) for more meaningful spike detection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` around lines 43 - 47, The code currently
passes the previous day's raw category_counts (from previous_snapshot returned
by _load_last_snapshot()) into trend_analyzer.analyze(), which causes
_detect_spikes to compare today against a single day; instead, change the flow
to load the last N snapshots (e.g., 7) and compute per-category rolling averages
before calling trend_analyzer.analyze(issues_24h, historical_averages).
Concretely: replace the single-call to self._load_last_snapshot() with a call
that retrieves the last N snapshots (implement or reuse a helper like
_load_last_n_snapshots(n) or extend _load_last_snapshot to accept N), compute
the mean category_counts across those snapshots into historical_averages, and
pass that dictionary to trend_analyzer.analyze so _detect_spikes receives true
historical averages (make N configurable, default 7).

Comment on lines +86 to +88
# Assuming 'duplicate' status indicates a duplicate issue
duplicates = len([i for i in issues if i.status == 'duplicate'])
rate = duplicates / total

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the models file and examine Issue class
fd -t f "models.py" backend/ && cat backend/models.py

Repository: RohanExploit/VishwaGuru

Length of output: 8668


🏁 Script executed:

# Search for Issue class definition and status field
rg -n "class Issue" backend/ -A 20

Repository: RohanExploit/VishwaGuru

Length of output: 9974


🏁 Script executed:

# Check for enum definitions like GrievanceStatus
rg -n "GrievanceStatus\|class.*Status.*Enum" backend/

Repository: RohanExploit/VishwaGuru

Length of output: 49


🏁 Script executed:

# Read the civic_intelligence.py file focusing on the _adjust_duplicate_threshold function
cat -n backend/civic_intelligence.py | sed -n '75,100p'

Repository: RohanExploit/VishwaGuru

Length of output: 1213


🏁 Script executed:

# Search for any references to 'duplicate' in the codebase
rg -n "duplicate" backend/ -i

Repository: RohanExploit/VishwaGuru

Length of output: 2253


Fix status comparison — 'duplicate' is not a valid Issue status.

i.status == 'duplicate' will silently never match because the Issue.status field is a plain string column, but 'duplicate' is not a valid status value in the system. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes _adjust_duplicate_threshold a permanent no-op—the duplicates count will always be 0, and the threshold adjustment logic will never trigger.

Either add 'duplicate' as a valid status option or rethink how duplicate detection integrates with status tracking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` around lines 86 - 88, The code in
_adjust_duplicate_threshold uses i.status == 'duplicate' which never matches
because Issue.status only contains OPEN, VERIFIED, ASSIGNED, IN_PROGRESS,
RESOLVED; update the duplicate detection to use the actual field used for
marking duplicates (e.g., if you have an Issue.resolution or Issue.is_duplicate
flag, change the check to i.resolution == 'DUPLICATE' or i.is_duplicate) or, if
you intend to make duplicate a status, add 'DUPLICATE' to the Issue.status
enum/schema and add the necessary migration and tests; update the reference in
_adjust_duplicate_threshold (and any other places using 'duplicate') to the
chosen canonical representation so duplicates are counted correctly.

Comment on lines +29 to +30
except Exception as e:
    logger.error(f"Daily refinement job failed: {e}", exc_info=True)

⚠️ Potential issue | 🟠 Major

Swallowed exception means the process exits with code 0 on failure.

The except block logs the error but doesn't re-raise or call sys.exit(1). Any external scheduler (cron, systemd timer, Kubernetes CronJob) will see a successful exit and won't trigger failure alerts.

Proposed fix
     except Exception as e:
         logger.error(f"Daily refinement job failed: {e}", exc_info=True)
+        raise
     finally:
         db.close()

Alternatively, if you want a clean exit code without a traceback on stderr, use sys.exit(1) after logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scheduler/daily_refinement_job.py` around lines 29 - 30, The except
block currently logs the error with logger.error(f"Daily refinement job failed:
{e}", exc_info=True) but swallows the exception, causing the process to exit
with code 0; modify the handler to terminate with a non-zero exit (e.g., call
sys.exit(1)) or re-raise the exception after logging so external schedulers
detect failure—update the except Exception as e block in daily_refinement_job.py
(the block using logger.error and the literal "Daily refinement job failed") to
either call sys.exit(1) after logging or simply raise the caught exception.

Uses spatial utilities to find clusters of issues.
"""
# Filter issues with location
located_issues = [i for i in issues if i.latitude and i.longitude]

⚠️ Potential issue | 🟡 Minor

Falsy check on coordinates will exclude valid 0.0 values.

i.latitude and i.longitude evaluates to False when either coordinate is 0.0 (equator or prime meridian), which are valid geographic positions. The Issue model defines these as optional nullable floats, so the correct check is is not None.

Proposed fix
-        located_issues = [i for i in issues if i.latitude and i.longitude]
+        located_issues = [i for i in issues if i.latitude is not None and i.longitude is not None]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
located_issues = [i for i in issues if i.latitude and i.longitude]
located_issues = [i for i in issues if i.latitude is not None and i.longitude is not None]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/trend_analyzer.py` at line 79, The filter that builds located_issues
currently uses a falsy check (i.latitude and i.longitude) which will drop valid
0.0 coordinates; update the comprehension to explicitly check for non-None
values (e.g., i.latitude is not None and i.longitude is not None) so zero
coordinates are retained—modify the located_issues assignment in
trend_analyzer.py to use the explicit None checks against the Issue objects in
the issues list.

Contributor

Copilot AI left a comment


Pull request overview

This PR implements a daily civic intelligence refinement engine that enables the VishwaGuru system to self-improve by analyzing patterns in civic issues and automatically adjusting classification weights and duplicate detection thresholds. The system introduces dynamic, hot-reloadable configuration to replace hardcoded priority engine weights.

Changes:

  • Introduced AdaptiveWeights singleton for hot-reloading severity keywords, urgency patterns, categories, and duplicate detection radius from JSON
  • Created TrendAnalyzer for extracting keywords, detecting category spikes, and identifying geographic clusters using DBSCAN
  • Implemented CivicIntelligenceEngine to orchestrate daily refinement including Civic Intelligence Index calculation, automatic weight updates based on critical issues, and duplicate threshold adjustment
  • Modified PriorityEngine to use dynamic weights from AdaptiveWeights instead of hardcoded dictionaries
  • Added scheduler script for running daily refinement job

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
backend/adaptive_weights.py Implements hot-reloadable configuration management with file-watching and JSON persistence
backend/civic_intelligence.py Core daily refinement logic including trend analysis, weight optimization, and index calculation
backend/trend_analyzer.py Analyzes issues for keyword trends, category spikes, and geographic clusters
backend/priority_engine.py Refactored to use dynamic weights from AdaptiveWeights singleton via properties
backend/scheduler/daily_refinement_job.py Standalone script to execute daily refinement with database session management
backend/data/modelWeights.json JSON configuration storing all dynamic weights and thresholds


Comment on lines 98 to 239
class AdaptiveWeights:
    """
    Manages dynamic weights for the PriorityEngine.
    Loads from/saves to a JSON file.
    Automatically reloads if the file changes.
    """

    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
        self.data_path = data_path
        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
        self._categories = DEFAULT_CATEGORIES.copy()
        self._duplicate_search_radius = 50.0

        self.last_modified = 0.0
        self.last_check_time = 0.0
        self.check_interval = 5.0 # Check every 5 seconds at most

        self.load_weights()

    def _check_reload(self):
        """Checks if file modified and reloads if needed."""
        now = time.time()
        if now - self.last_check_time < self.check_interval:
            return

        self.last_check_time = now
        if os.path.exists(self.data_path):
            try:
                mtime = os.path.getmtime(self.data_path)
                if mtime > self.last_modified:
                    logger.info(f"Weights file changed. Reloading from {self.data_path}")
                    self.load_weights()
            except Exception as e:
                logger.error(f"Error checking weights file modification: {e}")

    def load_weights(self):
        """Loads weights from the JSON file if it exists."""
        if os.path.exists(self.data_path):
            try:
                with open(self.data_path, 'r') as f:
                    data = json.load(f)
                self._severity_keywords = data.get("severity_keywords", self._severity_keywords)
                self._urgency_patterns = data.get("urgency_patterns", self._urgency_patterns)
                self._categories = data.get("categories", self._categories)
                self._duplicate_search_radius = data.get("duplicate_search_radius", self._duplicate_search_radius)

                self.last_modified = os.path.getmtime(self.data_path)
                logger.info(f"Loaded weights from {self.data_path}")
            except Exception as e:
                logger.error(f"Failed to load weights from {self.data_path}: {e}")
        else:
            logger.info(f"No existing weights file found at {self.data_path}. Using defaults.")
            self.save_weights()

    def save_weights(self):
        """Saves current weights to the JSON file."""
        data = {
            "severity_keywords": self._severity_keywords,
            "urgency_patterns": self._urgency_patterns,
            "categories": self._categories,
            "duplicate_search_radius": self._duplicate_search_radius
        }
        try:
            # Ensure directory exists
            os.makedirs(os.path.dirname(self.data_path), exist_ok=True)

            with open(self.data_path, 'w') as f:
                json.dump(data, f, indent=4)

            # Update last_modified so we don't reload our own write immediately (or we do, which is fine)
            self.last_modified = os.path.getmtime(self.data_path)
            logger.info(f"Saved weights to {self.data_path}")
        except Exception as e:
            logger.error(f"Failed to save weights to {self.data_path}: {e}")

    @property
    def severity_keywords(self) -> Dict[str, List[str]]:
        self._check_reload()
        return self._severity_keywords

    @property
    def urgency_patterns(self) -> List[List[Any]]:
        self._check_reload()
        return self._urgency_patterns

    @property
    def categories(self) -> Dict[str, List[str]]:
        self._check_reload()
        return self._categories

    @property
    def duplicate_search_radius(self) -> float:
        self._check_reload()
        return self._duplicate_search_radius

    @duplicate_search_radius.setter
    def duplicate_search_radius(self, value: float):
        self._duplicate_search_radius = value
        self.save_weights()

    def update_severity_keyword(self, severity_level: str, keyword: str, add: bool = True):
        self._check_reload() # Ensure we have latest before updating
        if severity_level not in self._severity_keywords:
            return

        keywords = self._severity_keywords[severity_level]
        if add:
            if keyword not in keywords:
                keywords.append(keyword)
        else:
            if keyword in keywords:
                keywords.remove(keyword)

        self.save_weights()

    def update_category_keyword(self, category: str, keyword: str, add: bool = True):
        self._check_reload()
        if category not in self._categories:
            if add:
                self._categories[category] = [keyword]
            else:
                return
        else:
            keywords = self._categories[category]
            if add:
                if keyword not in keywords:
                    keywords.append(keyword)
            else:
                if keyword in keywords:
                    keywords.remove(keyword)

        self.save_weights()

    def get_urgency_patterns_tuples(self) -> List[Tuple[str, int]]:
        """Returns urgency patterns as tuples (required by PriorityEngine)."""
        # Call property to trigger reload check
        patterns = self.urgency_patterns
        return [(p[0], p[1]) for p in patterns]

# Singleton instance
adaptive_weights = AdaptiveWeights()

Copilot AI Feb 17, 2026


Missing test coverage for the new adaptive_weights module. Given that other modules like priority_engine, spatial_utils, and various detectors have comprehensive test coverage in backend/tests/, the adaptive_weights module should have tests covering hot-reloading, file I/O, weight updates, and error conditions.
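
For example, tests along those lines might look like this (a sketch; it assumes the constructor and methods shown in this PR's diff, and skips the 5-second hot-reload timing):

import json

from backend.adaptive_weights import AdaptiveWeights

def test_missing_file_falls_back_to_defaults_and_persists(tmp_path):
    path = tmp_path / "modelWeights.json"
    weights = AdaptiveWeights(data_path=str(path))
    assert path.exists()  # load_weights() is expected to write defaults when the file is absent
    assert "critical" in weights.severity_keywords

def test_update_severity_keyword_round_trips_to_disk(tmp_path):
    path = tmp_path / "modelWeights.json"
    weights = AdaptiveWeights(data_path=str(path))
    weights.update_severity_keyword("critical", "sinkhole", add=True)
    on_disk = json.loads(path.read_text())
    assert "sinkhole" in on_disk["severity_keywords"]["critical"]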

def _load_last_snapshot(self) -> Dict[str, Any]:
    """Loads the most recent snapshot."""
    if not os.path.exists(SNAPSHOT_DIR):
        return {}

Copilot AI Feb 17, 2026


The snapshot directory is hardcoded as "backend/data/dailySnapshots" but there's no guarantee this directory exists. While _save_snapshot creates it on demand (line 192-193), _load_last_snapshot simply checks os.path.exists and returns an empty dict if it doesn't exist. This is inconsistent and could lead to unexpected behavior. Consider creating the directory on initialization or documenting that it needs to be created externally.

Suggested change
        return {}
        try:
            os.makedirs(SNAPSHOT_DIR, exist_ok=True)
        except OSError as e:
            logger.error(f"Failed to create snapshot directory '{SNAPSHOT_DIR}': {e}")
            return {}

Comment on lines +9 to +10
# backend/scheduler/daily_refinement_job.py -> backend/scheduler -> backend -> root
repo_root = current_file.parent.parent.parent

Copilot AI Feb 17, 2026


The path manipulation assumes a specific directory structure (backend/scheduler/daily_refinement_job.py -> backend/scheduler -> backend -> root). If the script is symlinked or the directory structure changes, this will break. Consider using a more robust approach like finding the backend package directory or using environment variables to specify the root path.

Suggested change
# backend/scheduler/daily_refinement_job.py -> backend/scheduler -> backend -> root
repo_root = current_file.parent.parent.parent
# Try to infer the repository root dynamically by locating the `backend` package.
repo_root = None
for parent in current_file.parents:
    backend_pkg = parent / "backend" / "__init__.py"
    if backend_pkg.is_file():
        repo_root = parent
        break
# Fallback to environment variable if `backend` package was not found.
if repo_root is None:
    env_root = os.getenv("PROJECT_ROOT")
    if env_root:
        repo_root = Path(env_root).resolve()
# Final fallback to preserve legacy behavior (assumes fixed directory structure).
if repo_root is None:
    repo_root = current_file.parent.parent.parent

Comment on lines +78 to +106
def _adjust_duplicate_threshold(self, issues: List[Issue]):
    """
    Dynamically adjusts duplicate detection threshold based on duplicate rate.
    """
    total = len(issues)
    if total < 10:
        return

    # Assuming 'duplicate' status indicates a duplicate issue
    duplicates = len([i for i in issues if i.status == 'duplicate'])
    rate = duplicates / total

    current_radius = adaptive_weights.duplicate_search_radius

    # Simple heuristic:
    # If too many duplicates (> 30%), we might be over-matching (radius too big).
    # If too few (< 5%), we might be under-matching (radius too small).
    # This is a naive control loop but satisfies the requirement.

    if rate > 0.3:
        new_radius = max(10.0, current_radius * 0.9)
        if new_radius != current_radius:
            adaptive_weights.duplicate_search_radius = new_radius
            logger.info(f"High duplicate rate ({rate:.2f}). Decreased duplicate search radius to {new_radius:.1f}m")
    elif rate < 0.05:
        new_radius = min(100.0, current_radius * 1.1)
        if new_radius != current_radius:
            adaptive_weights.duplicate_search_radius = new_radius
            logger.info(f"Low duplicate rate ({rate:.2f}). Increased duplicate search radius to {new_radius:.1f}m")

Copilot AI Feb 17, 2026


The duplicate threshold adjustment logic modifies the radius but doesn't provide any mechanism to track its effectiveness over time or prevent oscillation. If the rate alternates around the thresholds, the radius could oscillate between increasing and decreasing on successive days. Consider implementing a moving average or dampening factor to stabilize the adjustments.
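
One way to dampen that oscillation, sketched with illustrative names and an assumed smoothing factor:

from typing import Optional

def smoothed_duplicate_rate(today_rate: float,
                            previous_smoothed: Optional[float],
                            alpha: float = 0.3) -> float:
    # Exponential smoothing: compare the smoothed rate (not the raw daily rate)
    # against the 30% / 5% thresholds so a single noisy day cannot flip the radius.
    if previous_smoothed is None:
        return today_rate
    return alpha * today_rate + (1 - alpha) * previous_smoothed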

Comment on lines +118 to +133
def _check_reload(self):
    """Checks if file modified and reloads if needed."""
    now = time.time()
    if now - self.last_check_time < self.check_interval:
        return

    self.last_check_time = now
    if os.path.exists(self.data_path):
        try:
            mtime = os.path.getmtime(self.data_path)
            if mtime > self.last_modified:
                logger.info(f"Weights file changed. Reloading from {self.data_path}")
                self.load_weights()
        except Exception as e:
            logger.error(f"Error checking weights file modification: {e}")


Copilot AI Feb 17, 2026


The hot-reload mechanism checks file modification every 5 seconds via property accessors, but there's a race condition: if save_weights() is called by the daily job while an API request is reading the properties, the file could be partially written when read. Additionally, the os.path.getmtime() check happens before opening the file, creating a TOCTTOU (Time-Of-Check-Time-Of-Use) vulnerability where the file could be modified between the mtime check and the actual read.
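
A common mitigation for the partial-write half of this, sketched here with an illustrative helper name (not part of this PR), is an atomic write via a temp file and os.replace:

import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=4)
        os.replace(tmp_path, path)  # atomic rename: readers see the old or new file, never a partial one
    except Exception:
        os.unlink(tmp_path)
        raise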

Comment on lines +86 to +87
# Assuming 'duplicate' status indicates a duplicate issue
duplicates = len([i for i in issues if i.status == 'duplicate'])

Copilot AI Feb 17, 2026


The code assumes 'duplicate' is a valid status value for issues, but according to IssueStatus enum in schemas.py (lines 19-24), only 'open', 'verified', 'assigned', 'in_progress', and 'resolved' are valid statuses. The Issue model uses a String column for status (not an enum), but the assumption that 'duplicate' exists appears incorrect based on the established schema conventions. This will result in a duplicate rate of 0% always, causing the threshold adjustment logic to never function correctly.

Suggested change
# Assuming 'duplicate' status indicates a duplicate issue
duplicates = len([i for i in issues if i.status == 'duplicate'])
# Determine which statuses represent duplicates, if configured.
duplicate_statuses = getattr(Issue, "DUPLICATE_STATUSES", None)
if not duplicate_statuses:
    # No explicit duplicate status configured; skip adjustment to avoid
    # relying on unsupported status values.
    logger.debug("No duplicate statuses configured for Issue; skipping duplicate threshold adjustment.")
    return
duplicates = len([i for i in issues if i.status in duplicate_statuses])

Comment on lines +25 to +32
db: Session = SessionLocal()
try:
    result = civic_intelligence_engine.refine_daily(db)
    logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
except Exception as e:
    logger.error(f"Daily refinement job failed: {e}", exc_info=True)
finally:
    db.close()

Copilot AI Feb 17, 2026


The database session is not properly closed in case of exception. While there's a finally block that calls db.close(), if an exception occurs during civic_intelligence_engine.refine_daily(db), the session may be left in an inconsistent state. Consider using a context manager (with SessionLocal() as db:) to ensure proper cleanup.

Suggested change
db: Session = SessionLocal()
try:
    result = civic_intelligence_engine.refine_daily(db)
    logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
except Exception as e:
    logger.error(f"Daily refinement job failed: {e}", exc_info=True)
finally:
    db.close()
with SessionLocal() as db:
    try:
        result = civic_intelligence_engine.refine_daily(db)
        logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
    except Exception as e:
        logger.error(f"Daily refinement job failed: {e}", exc_info=True)


# Use DBSCAN via spatial_utils
# Default eps=30m might be too small for "geographic clustering" of trends.
# Let's use larger radius like 500m to find hotspots.

Copilot AI Feb 17, 2026


The DBSCAN clustering uses eps_meters=500.0 which is approximately converted to degrees (line 173 in spatial_utils.py uses eps_meters / 111000). However, the comment on line 86 says this is a "larger radius like 500m to find hotspots" but doesn't account for latitude variation where longitude degree distances vary with cos(latitude). Consider documenting this limitation or using the haversine metric consistently which DBSCAN already supports (line 177 in spatial_utils.py).

Suggested change
# Let's use larger radius like 500m to find hotspots.
# Use a larger nominal radius (around 500m) to find hotspots. Note: spatial_utils
# converts eps_meters to degrees with a simple 1°≈111km approximation, so the
# effective ground distance, especially in longitude, will vary with latitude.
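
For comparison, a sketch of the haversine-metric alternative (function name and defaults are illustrative, not the project's spatial_utils API):

import numpy as np
from sklearn.cluster import DBSCAN

def cluster_points_haversine(lat_lon_deg: np.ndarray, eps_meters: float = 500.0,
                             min_samples: int = 3) -> np.ndarray:
    earth_radius_m = 6371000.0
    coords_rad = np.radians(lat_lon_deg)   # haversine metric expects [lat, lon] in radians
    eps_rad = eps_meters / earth_radius_m  # metres -> radians of arc, valid at any latitude
    labels = DBSCAN(eps=eps_rad, min_samples=min_samples,
                    metric="haversine").fit_predict(coords_rad)
    return labels  # -1 marks noise; other values are cluster ids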

Comment on lines 12 to +14
def __init__(self):
    # Keyword dictionaries for Severity Classification
    self.severity_keywords = {
        "critical": [
            "fire", "explosion", "blood", "death", "collapse", "gas leak",
            "electric shock", "spark", "electrocution", "drowning",
            "flood", "landslide", "earthquake", "cyclone", "hurricane",
            "attack", "assault", "rabid", "deadly", "fatal", "emergency",
            "blocked road", "ambulance", "hospital", "school", "child",
            "exposed wire", "transformer", "chemical", "toxic", "poison",
            "weapon", "gun", "bomb", "terror", "riot", "stampede",
            "structural failure", "pillar", "bridge", "flyover",
            "open manhole", "live wire", "gas smell", "open electrical box",
            "burning", "flame", "smoke", "crack", "fissure"
        ],
        "high": [
            "accident", "injury", "broken", "bleeding", "hazard", "risk",
            "dangerous", "unsafe", "threat", "pollution", "smoke",
            "sewage", "overflow", "contamination", "infection", "disease",
            "mosquito", "dengue", "malaria", "typhoid", "cholera",
            "rat", "snake", "stray dog", "bite", "attack", "cattle",
            "theft", "robbery", "burglary", "harassment", "abuse",
            "illegal", "crime", "violation", "bribe", "corruption",
            "traffic jam", "congestion", "gridlock", "delay",
            "no water", "power cut", "blackout", "load shedding",
            "pothole", "manhole", "open drain", "water logging",
            "dead", "animal", "fish", "stuck",
            "not working", "signal", "traffic light", "fallen tree",
            "water leakage", "leakage", "burst", "pipe burst", "damage",
            "leaning", "tilted", "unstable", "waterlogging"
        ],
        "medium": [
            "garbage", "trash", "waste", "litter", "rubbish", "dustbin",
            "smell", "odor", "stink", "foul", "dirty", "unclean",
            "messy", "ugly", "eyesore", "bad", "poor",
            "leak", "drip", "seepage", "moisture", "damp",
            "noise", "loud", "sound", "music", "party", "barking",
            "encroachment", "hawker", "vendor", "stall", "shop",
            "parking", "parked", "vehicle", "car", "bike", "scooter",
            "construction", "debris", "material", "sand", "cement",
            "graffiti", "poster", "banner", "hoarding", "advertisement",
            "slippery", "muddy", "path", "pavement", "sidewalk",
            "crowd", "gathering", "tap", "wasting", "running water",
            "speed breaker", "hump", "bump"
        ],
        "low": [
            "light", "lamp", "bulb", "flicker", "dim", "dark",
            "sign", "board", "paint", "color", "faded",
            "bench", "chair", "seat", "grass", "plant", "tree",
            "leaf", "branch", "garden", "park", "playground",
            "cosmetic", "look", "appearance", "aesthetic",
            "old", "rusty", "dirty", "leaning"
        ]
    }
    # We now access weights dynamically from the singleton adaptive_weights instance
    pass

Copilot AI Feb 17, 2026


The empty `__init__` method with only a pass statement and a comment is unnecessary. If the PriorityEngine no longer needs initialization logic, the `__init__` method can be removed entirely; Python will provide a default `__init__` automatically.

Comment on lines +154 to +161
# Factor 1: Engagement (More issues is good? Or bad? Prompt says "Civic Intelligence Index"
# usually implies the SYSTEM's intelligence or CITY's performance.
# "Top Emerging Concern: Water Supply" implies monitoring.
# Let's assume High Index = Healthy System/City.

# High resolution rate -> +
resolved_count = sum(1 for g in grievances if g.status == GrievanceStatus.RESOLVED)
total_grievances = len(grievances)

Copilot AI Feb 17, 2026


The comment on line 154 says "More issues is good? Or bad?" indicating uncertainty about the metric design. The final implementation doesn't factor in the total number of issues at all - it only considers resolution rate and severity distribution. This means the index could be the same whether there are 10 issues or 10,000 issues in a day, which may not accurately reflect civic health. Consider clarifying the metric definition or adding issue volume as a factor.
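If issue volume should feed into the index, one hedged possibility (not what this PR implements) is to blend a volume-pressure term with the resolution-rate and severity factors; all constants below, including expected_daily_volume, are placeholders:

def civic_index(total_issues, resolved_count, critical_count,
                expected_daily_volume=100):
    """Illustrative 0-100 score; the weights and the volume cap are placeholders."""
    if total_issues == 0:
        return 100.0
    resolution_rate = resolved_count / total_issues   # higher is better
    critical_share = critical_count / total_issues    # lower is better
    volume_pressure = min(total_issues / expected_daily_volume, 1.0)
    score = (
        50.0 * resolution_rate
        + 30.0 * (1.0 - critical_share)
        + 20.0 * (1.0 - volume_pressure)
    )
    return round(score, 2)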

- Updated `backend/adaptive_weights.py` and `backend/civic_intelligence.py` to use absolute paths for `data/` directory relative to the project root.
- This ensures compatibility with Render's file system layout and persistent disk mounts at `./data`.
- Updated `.gitignore` to exclude `data/dailySnapshots/` and `data/modelWeights.json` as they are runtime generated data.
- Verified scheduler script execution with database initialization.
- Cleaned up local test artifacts.

@cubic-dev-ai cubic-dev-ai bot left a comment


6 issues found across 6 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/trend_analyzer.py">

<violation number="1" location="backend/trend_analyzer.py:79">
P2: Bug: Truthiness check on coordinates incorrectly filters out `0.0` values. `0.0` is falsy in Python but is a valid coordinate. Use `is not None` to match the pattern used in `spatial_utils.py`.</violation>
</file>

<file name="backend/adaptive_weights.py">

<violation number="1" location="backend/adaptive_weights.py:107">
P1: Shallow copy of nested dicts/lists causes mutation of module-level defaults. `dict.copy()` only copies the top-level dict; the inner lists (e.g., `DEFAULT_SEVERITY_KEYWORDS["critical"]`) are still shared references. When `update_severity_keyword` or `update_category_keyword` calls `keywords.append(...)` or `keywords.remove(...)`, it mutates the global defaults, corrupting them for the rest of the process. Use `copy.deepcopy()` instead.</violation>
</file>

<file name="backend/data/modelWeights.json">

<violation number="1" location="backend/data/modelWeights.json:51">
P1: Duplicate keywords across severity levels: `"smoke"` and `"attack"` appear in both `critical` and `high` lists. Since `priority_engine.py` checks `critical` first, the `high` entries are dead — these keywords will always resolve to critical severity. If that's intended, remove the duplicates from `high` to avoid confusion; if not (e.g., "smoke" from air pollution should be high, not critical), the lists need to be disambiguated.</violation>

<violation number="2" location="backend/data/modelWeights.json:217">
P2: Duplicate keywords across severity levels: `"leaning"` appears in both `high` and `low`, and `"dirty"` appears in both `medium` and `low`. The entries in the `low` list are unreachable dead data since higher severity levels are checked first. Remove the duplicates from `low` to keep the configuration clean and avoid confusion.</violation>
</file>

<file name="backend/civic_intelligence.py">

<violation number="1" location="backend/civic_intelligence.py:87">
P1: Status comparison `i.status == 'duplicate'` will never match because 'duplicate' is not a valid Issue status value. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes the `_adjust_duplicate_threshold` function a permanent no-op since `duplicates` count will always be 0.</violation>
</file>

<file name="backend/scheduler/daily_refinement_job.py">

<violation number="1" location="backend/scheduler/daily_refinement_job.py:30">
P2: Scheduled job script swallows exceptions and always exits with code 0, making failures invisible to external schedulers. Consider re-raising or calling `sys.exit(1)` in the `except` block so that cron, systemd, or other orchestrators can detect and alert on failures.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


def __init__(self, data_path: str = "backend/data/modelWeights.json"):
self.data_path = data_path
self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()

@cubic-dev-ai cubic-dev-ai bot Feb 17, 2026


P1: Shallow copy of nested dicts/lists causes mutation of module-level defaults. dict.copy() only copies the top-level dict; the inner lists (e.g., DEFAULT_SEVERITY_KEYWORDS["critical"]) are still shared references. When update_severity_keyword or update_category_keyword calls keywords.append(...) or keywords.remove(...), it mutates the global defaults, corrupting them for the rest of the process. Use copy.deepcopy() instead.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/adaptive_weights.py, line 107:

<comment>Shallow copy of nested dicts/lists causes mutation of module-level defaults. `dict.copy()` only copies the top-level dict; the inner lists (e.g., `DEFAULT_SEVERITY_KEYWORDS["critical"]`) are still shared references. When `update_severity_keyword` or `update_category_keyword` calls `keywords.append(...)` or `keywords.remove(...)`, it mutates the global defaults, corrupting them for the rest of the process. Use `copy.deepcopy()` instead.</comment>

<file context>
@@ -0,0 +1,239 @@
+
+    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
+        self.data_path = data_path
+        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
+        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
+        self._categories = DEFAULT_CATEGORIES.copy()
</file context>
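A self-contained demonstration of the failure mode, using placeholder data rather than the real keyword lists:

import copy

DEFAULTS = {"critical": ["fire", "flood"]}

shallow = DEFAULTS.copy()
shallow["critical"].append("riot")      # also mutates DEFAULTS["critical"]
print(DEFAULTS["critical"])             # ['fire', 'flood', 'riot']

DEFAULTS = {"critical": ["fire", "flood"]}
deep = copy.deepcopy(DEFAULTS)
deep["critical"].append("riot")         # DEFAULTS stays untouched
print(DEFAULTS["critical"])             # ['fire', 'flood']

Switching the three assignments in `__init__` to `copy.deepcopy(...)` keeps later `append`/`remove` calls from leaking into the module-level defaults.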

"open electrical box",
"burning",
"flame",
"smoke",

@cubic-dev-ai cubic-dev-ai bot Feb 17, 2026


P1: Duplicate keywords across severity levels: "smoke" and "attack" appear in both critical and high lists. Since priority_engine.py checks critical first, the high entries are dead — these keywords will always resolve to critical severity. If that's intended, remove the duplicates from high to avoid confusion; if not (e.g., "smoke" from air pollution should be high, not critical), the lists need to be disambiguated.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/data/modelWeights.json, line 51:

<comment>Duplicate keywords across severity levels: `"smoke"` and `"attack"` appear in both `critical` and `high` lists. Since `priority_engine.py` checks `critical` first, the `high` entries are dead — these keywords will always resolve to critical severity. If that's intended, remove the duplicates from `high` to avoid confusion; if not (e.g., "smoke" from air pollution should be high, not critical), the lists need to be disambiguated.</comment>

<file context>
@@ -0,0 +1,453 @@
+            "open electrical box",
+            "burning",
+            "flame",
+            "smoke",
+            "crack",
+            "fissure"
</file context>
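One way to keep this from recurring is a small audit over the weights file; a sketch that assumes the `severity_keywords` object shown in this PR's `modelWeights.json`:

import json
from itertools import combinations

def find_cross_level_duplicates(path="backend/data/modelWeights.json"):
    """Return keywords that appear in more than one severity level."""
    with open(path) as f:
        severity_keywords = json.load(f)["severity_keywords"]
    overlaps = {}
    for (lvl_a, kws_a), (lvl_b, kws_b) in combinations(severity_keywords.items(), 2):
        shared = sorted(set(kws_a) & set(kws_b))
        if shared:
            overlaps[f"{lvl_a}/{lvl_b}"] = shared  # e.g. "critical/high": ["attack", "smoke"]
    return overlaps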

return

# Assuming 'duplicate' status indicates a duplicate issue
duplicates = len([i for i in issues if i.status == 'duplicate'])

@cubic-dev-ai cubic-dev-ai bot Feb 17, 2026


P1: Status comparison i.status == 'duplicate' will never match because 'duplicate' is not a valid Issue status value. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes the _adjust_duplicate_threshold function a permanent no-op since duplicates count will always be 0.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/civic_intelligence.py, line 87:

<comment>Status comparison `i.status == 'duplicate'` will never match because 'duplicate' is not a valid Issue status value. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes the `_adjust_duplicate_threshold` function a permanent no-op since `duplicates` count will always be 0.</comment>

<file context>
@@ -0,0 +1,217 @@
+            return
+
+        # Assuming 'duplicate' status indicates a duplicate issue
+        duplicates = len([i for i in issues if i.status == 'duplicate'])
+        rate = duplicates / total
+
</file context>
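A hedged sketch of counting duplicates without relying on a non-existent status value; both attribute names below are hypothetical and would need to match whatever the Issue model actually records for duplicate tracking:

def count_duplicates(issues):
    """Count issues flagged as duplicates without assuming a 'duplicate' status."""
    count = 0
    for issue in issues:
        # 'duplicate_of' and 'is_duplicate' are placeholder field names.
        if getattr(issue, "duplicate_of", None) is not None:
            count += 1
        elif getattr(issue, "is_duplicate", False):
            count += 1
    return count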

Uses spatial utilities to find clusters of issues.
"""
# Filter issues with location
located_issues = [i for i in issues if i.latitude and i.longitude]

@cubic-dev-ai cubic-dev-ai bot Feb 17, 2026


P2: Bug: Truthiness check on coordinates incorrectly filters out 0.0 values. 0.0 is falsy in Python but is a valid coordinate. Use is not None to match the pattern used in spatial_utils.py.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/trend_analyzer.py, line 79:

<comment>Bug: Truthiness check on coordinates incorrectly filters out `0.0` values. `0.0` is falsy in Python but is a valid coordinate. Use `is not None` to match the pattern used in `spatial_utils.py`.</comment>

<file context>
@@ -0,0 +1,114 @@
+        Uses spatial utilities to find clusters of issues.
+        """
+        # Filter issues with location
+        located_issues = [i for i in issues if i.latitude and i.longitude]
+
+        if not located_issues:
</file context>
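The fix is a plain `is not None` check, mirroring `spatial_utils.py`; a minimal sketch using the names from the snippet above:

located_issues = [
    i for i in issues
    if i.latitude is not None and i.longitude is not None
]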

"old",
"rusty",
"dirty",
"leaning"

@cubic-dev-ai cubic-dev-ai bot Feb 17, 2026


P2: Duplicate keywords across severity levels: "leaning" appears in both high and low, and "dirty" appears in both medium and low. The entries in the low list are unreachable dead data since higher severity levels are checked first. Remove the duplicates from low to keep the configuration clean and avoid confusion.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/data/modelWeights.json, line 217:

<comment>Duplicate keywords across severity levels: `"leaning"` appears in both `high` and `low`, and `"dirty"` appears in both `medium` and `low`. The entries in the `low` list are unreachable dead data since higher severity levels are checked first. Remove the duplicates from `low` to keep the configuration clean and avoid confusion.</comment>

<file context>
@@ -0,0 +1,453 @@
+            "old",
+            "rusty",
+            "dirty",
+            "leaning"
+        ]
+    },
</file context>

result = civic_intelligence_engine.refine_daily(db)
logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
except Exception as e:
logger.error(f"Daily refinement job failed: {e}", exc_info=True)

@cubic-dev-ai cubic-dev-ai bot Feb 17, 2026


P2: Scheduled job script swallows exceptions and always exits with code 0, making failures invisible to external schedulers. Consider re-raising or calling sys.exit(1) in the except block so that cron, systemd, or other orchestrators can detect and alert on failures.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/scheduler/daily_refinement_job.py, line 30:

<comment>Scheduled job script swallows exceptions and always exits with code 0, making failures invisible to external schedulers. Consider re-raising or calling `sys.exit(1)` in the `except` block so that cron, systemd, or other orchestrators can detect and alert on failures.</comment>

<file context>
@@ -0,0 +1,35 @@
+        result = civic_intelligence_engine.refine_daily(db)
+        logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
+    except Exception as e:
+        logger.error(f"Daily refinement job failed: {e}", exc_info=True)
+    finally:
+        db.close()
</file context>
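A hedged sketch of an entry point that propagates failure to the orchestrator; SessionLocal, civic_intelligence_engine, and logger are assumed to be imported as in the existing script:

import sys

def main() -> int:
    # SessionLocal, civic_intelligence_engine and logger come from the
    # existing scheduler script; their imports are omitted here.
    with SessionLocal() as db:
        try:
            result = civic_intelligence_engine.refine_daily(db)
            logger.info(
                f"Daily refinement job completed successfully. "
                f"Index Score: {result.get('index_score')}"
            )
            return 0
        except Exception:
            logger.exception("Daily refinement job failed")
            return 1

if __name__ == "__main__":
    sys.exit(main())  # non-zero exit lets cron/systemd detect and alert on failures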


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
backend/adaptive_weights.py (2)

180-198: Properties expose mutable internal state — callers can silently corrupt weights without persisting.

All four property accessors return direct references to the internal dicts/lists. Any caller that modifies the returned object (e.g., adaptive_weights.severity_keywords["critical"].append(...)) bypasses save_weights(), leaving the in-memory and on-disk states diverged. Consider returning deep copies from the properties, or documenting that callers must use the mutation methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 180 - 198, The four property
getters (severity_keywords, urgency_patterns, categories,
duplicate_search_radius) expose internal mutable state; update them to return
deep copies for the dict/list types (use copy.deepcopy on
self._severity_keywords, self._urgency_patterns, self._categories) so callers
cannot mutate in-memory state without going through the existing mutation/save
path (e.g., save_weights()); keep duplicate_search_radius as is (primitive) or
return a copied primitive, and ensure callers are documented to use provided
mutation methods that call save_weights().

137-138: Use logger.exception() instead of logger.error() in except blocks to preserve tracebacks.

In all three exception handlers (lines 138, 154, 178), logger.error(...) swallows the traceback. logger.exception(...) automatically attaches it, which is valuable for debugging file I/O failures in production.

Also applies to: 153-154, 177-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 137 - 138, Replace the three places
in backend/adaptive_weights.py where you call logger.error(...) inside except
blocks with logger.exception(...) so the traceback is preserved; specifically
update the except handlers that currently do logger.error(f"Error checking
weights file modification: {e}") and the two other file I/O/weights-related
except blocks to use logger.exception with the same contextual message (keep the
descriptive text but remove manual inclusion of the exception object since
logger.exception already records it).
backend/civic_intelligence.py (3)

199-203: Use logger.exception to preserve the traceback on snapshot save failure.

         except Exception as e:
-            logger.error(f"Failed to save snapshot: {e}")
+            logger.exception(f"Failed to save snapshot: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` around lines 199 - 203, The except block that
catches errors while writing the snapshot currently calls logger.error and loses
the traceback; change the error logging to logger.exception to preserve the full
traceback and context (in the except Exception as e: block that handles
open(filename, 'w') and json.dump(snapshot, f)), keeping the existing message
text (e.g., "Failed to save snapshot: {e}") so the filename/snapshot variables
remain referenced and the stack trace is recorded.

193-193: Use os.path.join instead of f-string path concatenation.

-        filename = f"{SNAPSHOT_DIR}/{snapshot['date']}.json"
+        filename = os.path.join(SNAPSHOT_DIR, f"{snapshot['date']}.json")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` at line 193, Replace the f-string path
concatenation when building filename with a platform-safe join: use os.path.join
to combine SNAPSHOT_DIR and snapshot['date'] + '.json' instead of
f"{SNAPSHOT_DIR}/{snapshot['date']}.json"; ensure the module imports os if not
already present and update the assignment to filename to use
os.path.join(SNAPSHOT_DIR, snapshot['date'] + '.json') so SNAPSHOT_DIR and
snapshot['date'] are combined correctly across OSes.

139-144: Auto-learning keywords are only ever added, never pruned — risk of keyword inflation over time.

The daily refinement adds keywords to the "critical" severity level but never removes stale ones. Over many cycles, the critical keyword list will grow unboundedly, diluting its signal. Consider adding a decay or pruning mechanism.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` around lines 139 - 144, The loop only ever
adds keywords (top_keywords -> adaptive_weights.update_severity_keyword(...,
add=True)) so critical keywords will inflate; implement a pruning/decay step
that removes stale keywords from current_critical_keywords before or after
adding: track recent frequency or last_seen for each keyword (e.g., via
adaptive_weights store or a metadata map), and call
adaptive_weights.update_severity_keyword("critical", keyword, add=False) for any
keyword whose count/recency falls below a configured threshold or age (e.g., not
seen in N days or frequency < M). Update or extend adaptive_weights to expose
removal and/or decay APIs and run the pruning logic adjacent to the existing
top_keywords loop (use the same symbols: top_keywords,
current_critical_keywords, adaptive_weights.update_severity_keyword).
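A hedged sketch of one possible pruning step; the keyword_last_seen metadata and the 14-day window are assumptions, not part of this PR:

from datetime import datetime, timedelta

def stale_critical_keywords(critical_keywords, keyword_last_seen,
                            max_age_days=14, now=None):
    """Return auto-learned keywords not observed within max_age_days.

    keyword_last_seen maps keyword -> datetime of its last occurrence;
    this metadata is hypothetical and would need to be persisted
    alongside the weights file.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for kw in critical_keywords:
        last_seen = keyword_last_seen.get(kw)
        if last_seen is not None and last_seen < cutoff:
            stale.append(kw)
    return stale

Each stale keyword could then be dropped with adaptive_weights.update_severity_keyword("critical", kw, add=False), the same API the refinement loop already uses for additions.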
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/adaptive_weights.py`:
- Around line 238-242: get_urgency_patterns_tuples can raise IndexError when a
pattern tuple has fewer than 2 elements; update it (or the urgency_patterns
loader) to validate structure before accessing p[1]. Specifically, in
get_urgency_patterns_tuples check each pattern from self.urgency_patterns for at
least two elements and correct types (e.g., str and int), skip or coerce
malformed entries (and optionally log or collect warnings) so the returned list
only contains safe (pattern, weight) tuples; reference the
get_urgency_patterns_tuples method and the urgency_patterns property when
implementing the guard/validation.
- Around line 124-157: AdaptiveWeights has race conditions during concurrent
HTTP handling; add a threading.Lock instance (e.g., self._lock) to the
AdaptiveWeights singleton and use it to guard all file I/O and state mutations:
acquire the lock at the start of _check_reload, load_weights, and save_weights
(and any other methods that read/write self._severity_keywords,
self._urgency_patterns, self._categories, self._duplicate_search_radius, or
self.last_modified), perform file existence checks, reads/writes and updates
while holding the lock, and release the lock afterwards to prevent TOCTOU and
partial read/write races.

In `@backend/civic_intelligence.py`:
- Line 134: The code is calling the private method
trend_analyzer._extract_top_keywords, breaking encapsulation; update the
TrendAnalyzer class to expose a public API (e.g., add a method named
extract_top_keywords on TrendAnalyzer that wraps the existing
_extract_top_keywords logic or rename the method to extract_top_keywords) and
then replace the call in this module to use
trend_analyzer.extract_top_keywords(critical_issues, top_n=10) so the consumer
uses the public method instead of the private one.

---

Duplicate comments:
In `@backend/adaptive_weights.py`:
- Around line 111-122: The __init__ currently uses shallow .copy() for
DEFAULT_SEVERITY_KEYWORDS, DEFAULT_URGENCY_PATTERNS, and DEFAULT_CATEGORIES
which leaves inner lists/dicts shared; change these allocations in
AdaptiveWeights.__init__ to use copy.deepcopy on those DEFAULT_* constants so
_severity_keywords, _urgency_patterns, and _categories get independent deep
copies and in-place mutations (e.g., update_severity_keyword) won't mutate
module-level defaults.

In `@backend/civic_intelligence.py`:
- Around line 47-51: historical_counts is currently pulled from only the last
snapshot via _load_last_snapshot and passed to trend_analyzer.analyze (which
calls _detect_spikes) but that treats a single day as the historical average;
change the code to load the last N snapshots (e.g., via a new or extended method
on the same snapshot loader), compute per-category average counts across those
snapshots (handle missing categories and divide by the number of snapshots
used), and pass that averaged dict as historical_counts to
trend_analyzer.analyze so _detect_spikes compares against a multi-day baseline
rather than a single prior day.
- Around line 90-92: The duplicate detection is wrong because Issue.status never
equals 'duplicate' so duplicates is always zero; update the comprehension in
_adjust_duplicate_threshold to detect duplicates using the actual model fields
(e.g., count issues where getattr(i, 'duplicate_of', None) is not None or
getattr(i, 'is_duplicate', False) or i.metadata.get('is_duplicate', False) as a
fallback) instead of i.status == 'duplicate', then recompute duplicates and rate
accordingly so the threshold adjustment can trigger.

---

Nitpick comments:
In `@backend/adaptive_weights.py`:
- Around line 180-198: The four property getters (severity_keywords,
urgency_patterns, categories, duplicate_search_radius) expose internal mutable
state; update them to return deep copies for the dict/list types (use
copy.deepcopy on self._severity_keywords, self._urgency_patterns,
self._categories) so callers cannot mutate in-memory state without going through
the existing mutation/save path (e.g., save_weights()); keep
duplicate_search_radius as is (primitive) or return a copied primitive, and
ensure callers are documented to use provided mutation methods that call
save_weights().
- Around line 137-138: Replace the three places in backend/adaptive_weights.py
where you call logger.error(...) inside except blocks with logger.exception(...)
so the traceback is preserved; specifically update the except handlers that
currently do logger.error(f"Error checking weights file modification: {e}") and
the two other file I/O/weights-related except blocks to use logger.exception
with the same contextual message (keep the descriptive text but remove manual
inclusion of the exception object since logger.exception already records it).

In `@backend/civic_intelligence.py`:
- Around line 199-203: The except block that catches errors while writing the
snapshot currently calls logger.error and loses the traceback; change the error
logging to logger.exception to preserve the full traceback and context (in the
except Exception as e: block that handles open(filename, 'w') and
json.dump(snapshot, f)), keeping the existing message text (e.g., "Failed to
save snapshot: {e}") so the filename/snapshot variables remain referenced and
the stack trace is recorded.
- Line 193: Replace the f-string path concatenation when building filename with
a platform-safe join: use os.path.join to combine SNAPSHOT_DIR and
snapshot['date'] + '.json' instead of f"{SNAPSHOT_DIR}/{snapshot['date']}.json";
ensure the module imports os if not already present and update the assignment to
filename to use os.path.join(SNAPSHOT_DIR, snapshot['date'] + '.json') so
SNAPSHOT_DIR and snapshot['date'] are combined correctly across OSes.
- Around line 139-144: The loop only ever adds keywords (top_keywords ->
adaptive_weights.update_severity_keyword(..., add=True)) so critical keywords
will inflate; implement a pruning/decay step that removes stale keywords from
current_critical_keywords before or after adding: track recent frequency or
last_seen for each keyword (e.g., via adaptive_weights store or a metadata map),
and call adaptive_weights.update_severity_keyword("critical", keyword,
add=False) for any keyword whose count/recency falls below a configured
threshold or age (e.g., not seen in N days or frequency < M). Update or extend
adaptive_weights to expose removal and/or decay APIs and run the pruning logic
adjacent to the existing top_keywords loop (use the same symbols: top_keywords,
current_critical_keywords, adaptive_weights.update_severity_keyword).

Comment on lines +124 to +157
def _check_reload(self):
"""Checks if file modified and reloads if needed."""
now = time.time()
if now - self.last_check_time < self.check_interval:
return

self.last_check_time = now
if os.path.exists(self.data_path):
try:
mtime = os.path.getmtime(self.data_path)
if mtime > self.last_modified:
logger.info(f"Weights file changed. Reloading from {self.data_path}")
self.load_weights()
except Exception as e:
logger.error(f"Error checking weights file modification: {e}")

def load_weights(self):
"""Loads weights from the JSON file if it exists."""
if os.path.exists(self.data_path):
try:
with open(self.data_path, 'r') as f:
data = json.load(f)
self._severity_keywords = data.get("severity_keywords", self._severity_keywords)
self._urgency_patterns = data.get("urgency_patterns", self._urgency_patterns)
self._categories = data.get("categories", self._categories)
self._duplicate_search_radius = data.get("duplicate_search_radius", self._duplicate_search_radius)

self.last_modified = os.path.getmtime(self.data_path)
logger.info(f"Loaded weights from {self.data_path}")
except Exception as e:
logger.error(f"Failed to load weights from {self.data_path}: {e}")
else:
logger.info(f"No existing weights file found at {self.data_path}. Using defaults.")
self.save_weights()


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for scheduler, threading, and APScheduler usage
rg -n "daily_refinement|scheduler|APScheduler|Thread|BackgroundScheduler|SchedulerJob" --type=py -C 3

Repository: RohanExploit/VishwaGuru

Length of output: 14894


🏁 Script executed:

# Find where adaptive_weights is instantiated and used
rg -n "adaptive_weights|AdaptiveWeights" --type=py -C 2

Repository: RohanExploit/VishwaGuru

Length of output: 4907


🏁 Script executed:

# Check for any existing threading imports or locks in the codebase
rg -n "threading|Lock|RLock|Semaphore" --type=py

Repository: RohanExploit/VishwaGuru

Length of output: 2284


🏁 Script executed:

# Look at the main entry points and how the app is structured
fd -e "py" -E "__pycache__" | xargs grep -l "app\|main\|run\|serve" | head -20

Repository: RohanExploit/VishwaGuru

Length of output: 558


🏁 Script executed:

# Check the scheduler job implementation
cat -n backend/scheduler/daily_refinement_job.py

Repository: RohanExploit/VishwaGuru

Length of output: 1389


🏁 Script executed:

# Check main entry point to understand app structure
cat -n backend/__main__.py

Repository: RohanExploit/VishwaGuru

Length of output: 1287


🏁 Script executed:

# Check if there's any APScheduler or similar background task integration
rg -n "APScheduler|schedule\|cron\|background" --type=py backend/

Repository: RohanExploit/VishwaGuru

Length of output: 49


🏁 Script executed:

# Verify adaptive_weights is accessed from request handlers
rg -n "adaptive_weights\|priority_engine" backend/routers/ backend/main.py --type=py -B 2 -A 2

Repository: RohanExploit/VishwaGuru

Length of output: 49


🏁 Script executed:

# Check if there are any existing lock patterns in adaptive_weights
cat -n backend/adaptive_weights.py | head -120

Repository: RohanExploit/VishwaGuru

Length of output: 7179


Add thread safety to AdaptiveWeights singleton for concurrent HTTP request handling.

FastAPI uses a thread pool to handle concurrent HTTP requests in the same process. Multiple request handlers call priority_engine → adaptive_weights concurrently, creating race conditions in _check_reload(), load_weights(), and save_weights(). Concurrent file I/O without locking can cause partial reads (TOCTOU races), lost writes, or inconsistent state. This pattern is already applied throughout the codebase (pothole_detection.py, garbage_detection.py, local_ml_service.py all use threading.Lock). Add a threading.Lock around the read/write paths in AdaptiveWeights.
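A minimal sketch of the suggested guard, trimmed to the severity-keyword field only; an RLock is used so a method that already holds the lock (such as _check_reload) could safely delegate to load_weights:

import json
import os
import threading

class AdaptiveWeights:
    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
        self.data_path = data_path
        self._lock = threading.RLock()   # guards file I/O and in-memory state
        self._severity_keywords = {}
        self.last_modified = 0.0

    def load_weights(self):
        with self._lock:                 # one reader/writer at a time
            if not os.path.exists(self.data_path):
                return
            with open(self.data_path, "r") as f:
                data = json.load(f)
            self._severity_keywords = data.get("severity_keywords", self._severity_keywords)
            self.last_modified = os.path.getmtime(self.data_path)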

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 137-137: Do not catch blind exception: Exception

(BLE001)


[warning] 138-138: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 153-153: Do not catch blind exception: Exception

(BLE001)


[warning] 154-154: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 124 - 157, AdaptiveWeights has race
conditions during concurrent HTTP handling; add a threading.Lock instance (e.g.,
self._lock) to the AdaptiveWeights singleton and use it to guard all file I/O
and state mutations: acquire the lock at the start of _check_reload,
load_weights, and save_weights (and any other methods that read/write
self._severity_keywords, self._urgency_patterns, self._categories,
self._duplicate_search_radius, or self.last_modified), perform file existence
checks, reads/writes and updates while holding the lock, and release the lock
afterwards to prevent TOCTOU and partial read/write races.

Comment on lines +238 to +242
def get_urgency_patterns_tuples(self) -> List[Tuple[str, int]]:
"""Returns urgency patterns as tuples (required by PriorityEngine)."""
# Call property to trigger reload check
patterns = self.urgency_patterns
return [(p[0], p[1]) for p in patterns]


⚠️ Potential issue | 🟡 Minor

get_urgency_patterns_tuples will raise IndexError on malformed pattern entries.

If the JSON file contains a pattern with fewer than 2 elements (e.g., ["some_regex"]), p[1] throws an IndexError. Consider adding a guard or validating pattern structure on load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/adaptive_weights.py` around lines 238 - 242,
get_urgency_patterns_tuples can raise IndexError when a pattern tuple has fewer
than 2 elements; update it (or the urgency_patterns loader) to validate
structure before accessing p[1]. Specifically, in get_urgency_patterns_tuples
check each pattern from self.urgency_patterns for at least two elements and
correct types (e.g., str and int), skip or coerce malformed entries (and
optionally log or collect warnings) so the returned list only contains safe
(pattern, weight) tuples; reference the get_urgency_patterns_tuples method and
the urgency_patterns property when implementing the guard/validation.
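A defensive sketch of the tuple conversion, assuming pattern entries are loaded from JSON as ["regex", weight] lists; malformed entries are skipped and logged instead of raising:

import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)

def to_pattern_tuples(patterns) -> List[Tuple[str, int]]:
    """Convert loaded urgency patterns to (pattern, weight) tuples, skipping bad entries."""
    safe: List[Tuple[str, int]] = []
    for p in patterns:
        if (
            isinstance(p, (list, tuple))
            and len(p) >= 2
            and isinstance(p[0], str)
            and isinstance(p[1], (int, float))
        ):
            safe.append((p[0], int(p[1])))
        else:
            logger.warning("Skipping malformed urgency pattern: %r", p)
    return safe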


# Analyze text for common keywords in these critical issues
# We can reuse TrendAnalyzer logic but specific to this subset
top_keywords = trend_analyzer._extract_top_keywords(critical_issues, top_n=10)


🛠️ Refactor suggestion | 🟠 Major

Accessing private method trend_analyzer._extract_top_keywords() breaks encapsulation.

_extract_top_keywords is a private method on TrendAnalyzer. Promote it to a public method (drop the underscore) or expose a dedicated public API for keyword extraction, so this module isn't coupled to an implementation detail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/civic_intelligence.py` at line 134, The code is calling the private
method trend_analyzer._extract_top_keywords, breaking encapsulation; update the
TrendAnalyzer class to expose a public API (e.g., add a method named
extract_top_keywords on TrendAnalyzer that wraps the existing
_extract_top_keywords logic or rename the method to extract_top_keywords) and
then replace the call in this module to use
trend_analyzer.extract_top_keywords(critical_issues, top_n=10) so the consumer
uses the public method instead of the private one.
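A minimal sketch of the public wrapper, assuming the existing private helper keeps its current signature:

class TrendAnalyzer:
    def extract_top_keywords(self, issues, top_n: int = 10):
        """Public entry point; delegates to the existing private helper."""
        return self._extract_top_keywords(issues, top_n=top_n)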

- Added `scikit-learn` and `numpy` to `backend/requirements-render.txt`.
- These dependencies are required by `backend/trend_analyzer.py` and `backend/spatial_utils.py` for clustering and trend detection functionality.
- This resolves import errors during the Render build/runtime process.
@github-actions

🔍 Quality Reminder

Thanks for the updates! Please ensure:
- Your changes don't break existing functionality
- All tests still pass
- Code quality standards are maintained

*The maintainers will verify that the overall project flow remains intact.*

- Added `scikit-learn` and `numpy` to `backend/requirements-render.txt`.
- These dependencies are required by `backend/trend_analyzer.py` and `backend/spatial_utils.py` for clustering and trend detection functionality.
- This resolves import errors during the Render build/runtime process.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
backend/requirements-render.txt (1)

18-19: Consider pinning scikit-learn and numpy versions for reproducibility.

While these are large dependencies that can introduce breaking changes, the entire requirements-render.txt file follows an unpinned approach. If your deployment process requires reproducible builds, consider pinning all dependencies. If you maintain a separate lock file (e.g., pip-compile or poetry.lock), this is already handled. For reference, latest stable versions are scikit-learn 1.8.0 and numpy 2.4.0 (both supporting Python 3.11–3.14).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/requirements-render.txt` around lines 18 - 19, The dependencies
scikit-learn and numpy are unpinned which harms reproducibility; update the
requirements to pin explicit versions for these packages (for example, pin
scikit-learn to 1.8.0 and numpy to 2.4.0 or use conservative range constraints)
or ensure a proper lockfile (pip-compile/poetry.lock) is produced and committed
so builds remain deterministic; target the entries referencing scikit-learn and
numpy in the requirements list and apply the chosen pinning strategy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/requirements-render.txt`:
- Around line 18-19: The dependencies scikit-learn and numpy are unpinned which
harms reproducibility; update the requirements to pin explicit versions for
these packages (for example, pin scikit-learn to 1.8.0 and numpy to 2.4.0 or use
conservative range constraints) or ensure a proper lockfile
(pip-compile/poetry.lock) is produced and committed so builds remain
deterministic; target the entries referencing scikit-learn and numpy in the
requirements list and apply the chosen pinning strategy.
