Daily Civic Intelligence Refinement Engine #413
Conversation
- Added `AdaptiveWeights` for dynamic severity/category configuration with hot-reloading.
- Added `TrendAnalyzer` to detect keywords, category spikes, and geographic clusters.
- Added `CivicIntelligenceEngine` to orchestrate daily analysis and generate snapshots.
- Added duplicate threshold adjustment logic based on daily duplicate rates.
- Updated `PriorityEngine` to use `AdaptiveWeights`.
- Added scheduler script `backend/scheduler/daily_refinement_job.py`.
- Added `backend/data/dailySnapshots/` directory.
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
✅ Deploy Preview for fixmybharat canceled.
🙏 Thank you for your contribution, @RohanExploit!

PR Details:
Quality Checklist:
Review Process:
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.
📝 Walkthrough

Adds an adaptive weight manager and JSON weight dataset, a CivicIntelligenceEngine that runs a daily refinement pipeline (trend analysis, weight optimization, duplicate threshold tuning, civic index calc, snapshotting), a TrendAnalyzer, scheduler entrypoint, and refactors PriorityEngine to read dynamic weights.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Scheduler
    participant Engine as CivicIntelligenceEngine
    participant DB as Database
    participant Analyzer as TrendAnalyzer
    participant Weights as AdaptiveWeights
    participant Priority as PriorityEngine
    Scheduler->>Engine: refine_daily(db)
    Engine->>DB: fetch issues & grievances (last 24h)
    Engine->>Engine: _load_last_snapshot()
    Engine->>Analyzer: analyze(issues, historical_counts)
    Analyzer-->>Engine: trends & clusters
    Engine->>Weights: _optimize_weights(grievances)
    Weights-->>Engine: weight_updates
    Engine->>Weights: _adjust_duplicate_threshold(issues)
    Engine->>Priority: compute severity/urgency (reads adaptive weights)
    Engine->>Engine: _calculate_index(issues, grievances)
    Engine->>Engine: _save_snapshot(snapshot)
    Engine-->>Scheduler: return index_score & metadata
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested Labels
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 6
🧹 Nitpick comments (8)
backend/civic_intelligence.py (2)
195-199: Use `logger.exception` for richer tracebacks in error paths.

Per the Ruff TRY400 hint, `logger.error(...)` on line 199 won't include the full traceback by default. `logger.exception(...)` automatically attaches the traceback within an `except` block. Same applies to `_load_last_snapshot` (line 214).

Proposed fix:

```diff
 except Exception as e:
-    logger.error(f"Failed to save snapshot: {e}")
+    logger.exception(f"Failed to save snapshot: {e}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/civic_intelligence.py` around lines 195 - 199, Replace the error logging calls in the snapshot persistence code to capture tracebacks: inside the except blocks in the functions handling snapshots (the write block that currently logs "Failed to save snapshot" and the corresponding _load_last_snapshot error path), call logger.exception(...) instead of logger.error(...) so the exception traceback is included while preserving the existing message text.
128-130: Calling private method `_extract_top_keywords` from outside `TrendAnalyzer`.

`trend_analyzer._extract_top_keywords(...)` reaches into a private method. Consider making this a public method (rename to `extract_top_keywords`) or adding a dedicated public API on `TrendAnalyzer` for keyword extraction from an arbitrary subset of issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/civic_intelligence.py` around lines 128 - 130, The code calls the private method trend_analyzer._extract_top_keywords(critical_issues, top_n=10); change this to use a public API on TrendAnalyzer by either renaming the method to extract_top_keywords or adding a new public method (e.g., TrendAnalyzer.extract_top_keywords(issues, top_n)) that accepts an arbitrary list like critical_issues and returns top_keywords; update the call site to use the new public method and remove any direct references to _extract_top_keywords to respect encapsulation.

backend/data/modelWeights.json (1)
2-218: Duplicate keywords across severity tiers create dead entries.

Several keywords appear in multiple severity tiers — e.g., `"smoke"` in both `critical` (line 51) and `high` (line 66), `"attack"` in both `critical` (line 19) and `high` (line 81), `"dirty"` in `medium` and `low`, `"leaning"` in `high` and `low`. Because `_calculate_severity` in PriorityEngine short-circuits at the highest matching tier, duplicates in lower tiers will never influence scoring. They add maintenance overhead and can mislead anyone editing the weights.

Consider deduplicating across tiers so each keyword appears only at its intended severity level.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/data/modelWeights.json` around lines 2 - 218, The severity_keywords mapping contains duplicate entries across tiers which never get used because PriorityEngine._calculate_severity short-circuits at the highest matching tier; remove duplicates so each keyword appears only once at its intended severity level (edit the "severity_keywords" object in modelWeights.json), audit for repeated tokens like "smoke", "attack", "dirty", "leaning", and ensure each term is assigned to the correct tier to avoid misleading maintainers and dead entries used by PriorityEngine._calculate_severity.
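A quick one-off check can list the cross-tier overlaps before deduplicating by hand. This is a minimal sketch that assumes the `severity_keywords` layout shown in this PR; adjust the path if the data directory differs.

```python
import json
from itertools import combinations

# Path assumed from this PR's layout; adjust if the data directory moves.
with open("backend/data/modelWeights.json") as f:
    severity_keywords = json.load(f)["severity_keywords"]

# Print every keyword that appears in more than one severity tier.
for (tier_a, words_a), (tier_b, words_b) in combinations(severity_keywords.items(), 2):
    overlap = sorted(set(words_a) & set(words_b))
    if overlap:
        print(f"{tier_a} / {tier_b}: {overlap}")
```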
backend/adaptive_weights.py (3)

118-132: `_check_reload` is not thread-safe — concurrent reads and writes can interleave.

The hot-reload mechanism (`_check_reload` → `load_weights`) can fire during a web request while the scheduler is simultaneously calling `save_weights` or `update_severity_keyword`. There is no locking around the read-modify-write sequences. Under the GIL this won't segfault, but a `PriorityEngine.analyze()` call could see partially updated state (e.g., new severity keywords but old urgency patterns from a stale load).

If the API server uses threads (e.g., gunicorn with `--threads`), consider adding a `threading.Lock` around load/save/property access.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/adaptive_weights.py` around lines 118 - 132, The _check_reload/load_weights path is not thread-safe: add a threading.Lock (e.g., self._rw_lock) to adaptive_weights.py and use it to serialize access to shared state—acquire the lock around load_weights(), save_weights(), update_severity_keyword(), and any property getters or methods used by PriorityEngine.analyze() that read internal structures; ensure _check_reload acquires the lock before checking mtime and performing load to prevent interleaved read/write, and release the lock promptly to avoid blocking.
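A minimal sketch of the locking idea, with the method bodies elided (this is illustrative only, not the PR's implementation). A re-entrant `RLock` is used because `load_weights()` in this PR can fall back to `save_weights()`, which would self-deadlock on a plain `Lock`.

```python
import threading

class AdaptiveWeights:
    """Sketch of lock-protected hot-reloading; bodies elided, see the PR for the real logic."""

    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
        # Re-entrant, since load_weights() may call save_weights() on first run.
        self._lock = threading.RLock()
        self.data_path = data_path
        # ... defaults, timestamps and load_weights() as in the PR ...

    def _check_reload(self):
        with self._lock:
            # The mtime comparison and load_weights() now run atomically with
            # respect to save_weights()/update_severity_keyword() on other threads.
            ...

    def save_weights(self):
        with self._lock:
            ...  # existing JSON dump body, unchanged
```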
131-132: Use `logger.exception` instead of `logger.error` in `except` blocks.

Per Ruff TRY400, `logger.error(f"...: {e}")` loses the full traceback. `logger.exception(...)` automatically includes it, which is valuable for diagnosing file I/O failures in production.

Proposed fix (example for line 132):

```diff
 except Exception as e:
-    logger.error(f"Error checking weights file modification: {e}")
+    logger.exception(f"Error checking weights file modification: {e}")
```

Apply the same pattern at lines 148 and 172.
Also applies to: 147-148, 171-172
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/adaptive_weights.py` around lines 131 - 132, Replace the logger.error calls inside the three except Exception as e blocks with logger.exception so the full traceback is recorded; locate the three try/except handlers in backend/adaptive_weights.py that currently do `except Exception as e: logger.error(f"Error ...: {e}")` and change them to `logger.exception("Error ...")` (keep the original descriptive message) so the logger variable used in this module captures the stack trace for those file I/O/weight-check failures.
174-197: Properties return mutable internal state — callers can accidentally corrupt weights.

The `severity_keywords`, `urgency_patterns`, and `categories` properties all return direct references to internal lists/dicts. Any caller doing e.g. `adaptive_weights.severity_keywords["critical"].append("test")` would silently mutate the weights without triggering `save_weights()`, creating a divergence between in-memory and on-disk state.

Consider returning copies from the property getters, or documenting that callers must treat the returned data as read-only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/adaptive_weights.py` around lines 174 - 197, The property getters severity_keywords, urgency_patterns, and categories currently return direct references to internal mutable state (_severity_keywords, _urgency_patterns, _categories) so callers can mutate in-memory weights without triggering save_weights; change each getter to return a defensive copy (deep copy for nested lists/dicts) after calling _check_reload() so external mutations don't affect internal state, keeping the existing _check_reload() call; no change to duplicate_search_radius behavior is needed aside from leaving its setter that calls save_weights() as-is.
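A self-contained sketch of the defensive-copy variant (toy data, not the PR's defaults; `_check_reload` is stubbed out):

```python
import copy

class AdaptiveWeights:
    def __init__(self):
        self._severity_keywords = {"critical": ["fire"], "high": ["accident"]}

    def _check_reload(self):
        pass  # stand-in for the mtime check in the PR

    @property
    def severity_keywords(self):
        self._check_reload()
        # Hand back a deep copy so callers cannot silently mutate in-memory
        # weights without going through update_severity_keyword()/save_weights().
        return copy.deepcopy(self._severity_keywords)


weights = AdaptiveWeights()
weights.severity_keywords["critical"].append("test")   # mutates only the returned copy
assert "test" not in weights.severity_keywords["critical"]
```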
backend/trend_analyzer.py (2)

22-22: Use explicit `Optional` for the default-`None` parameter.

Per PEP 484 (and flagged by Ruff RUF013), `Dict[str, float] = None` is an implicit `Optional`. Use the explicit form.

Proposed fix:

```diff
-    def analyze(self, issues: List[Issue], historical_category_counts: Dict[str, float] = None) -> Dict[str, Any]:
+    def analyze(self, issues: List[Issue], historical_category_counts: Optional[Dict[str, float]] = None) -> Dict[str, Any]:
```

And add `Optional` to the import on line 2:

```diff
-from typing import List, Dict, Any, Tuple
+from typing import List, Dict, Any, Tuple, Optional
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/trend_analyzer.py` at line 22, Update the analyze method signature to use an explicit Optional for the default-None parameter: change the type of historical_category_counts from Dict[str, float] = None to Optional[Dict[str, float]] = None in the analyze method definition, and add Optional to the type imports at the top of the file so the symbol Optional is available for use.
46-52: Keyword extraction is limited to Latin `[a-z]` characters only.

The regex `r'\b[a-z]{3,}\b'` will silently drop all non-Latin tokens (Hindi, Tamil, etc.). Given a civic platform likely serving diverse populations, this could miss the majority of keywords. Worth noting as a known limitation and addressing in a follow-up if multilingual support is expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/trend_analyzer.py` around lines 46 - 52, The regex in _extract_top_keywords currently uses r'\b[a-z]{3,}\b' which only matches Latin a–z and drops non-Latin tokens; change the extraction to use a Unicode-aware pattern (e.g., match word characters or Unicode letters) and pass re.UNICODE, and normalize case with .casefold() instead of .lower() so tokens like Hindi/Tamil are retained and compared consistently; keep filtering via STOP_WORDS and Counter(filtered_words).most_common(top_n), and consider that STOP_WORDS may need localization in a future follow-up.
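One script-agnostic alternative, sketched under the assumption that whitespace tokenization is acceptable for the incoming descriptions (the stop-word set below is a placeholder; the real one lives in `trend_analyzer.py`). A `\p{L}`-style pattern via the third-party `regex` module would be another route.

```python
import string
from collections import Counter

STOP_WORDS = {"the", "and", "near"}  # placeholder; the real set lives in trend_analyzer.py

def extract_top_keywords(texts, top_n=10):
    words = []
    for text in texts:
        for token in text.casefold().split():
            token = token.strip(string.punctuation)
            # Keep any token of 3+ characters regardless of script.
            if len(token) >= 3 and not token.isdigit():
                words.append(token)
    filtered = [w for w in words if w not in STOP_WORDS]
    return Counter(filtered).most_common(top_n)

print(extract_top_keywords(["Water leakage near the school.", "सड़क पर पानी भर गया"]))
```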
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/adaptive_weights.py`:
- Around line 105-116: The constructor (__init__) currently uses shallow copies
(DEFAULT_SEVERITY_KEYWORDS.copy(), DEFAULT_URGENCY_PATTERNS.copy(),
DEFAULT_CATEGORIES.copy()) which leaves nested mutable objects shared; replace
these with deep copies so mutations to self._severity_keywords,
self._urgency_patterns, and self._categories don’t mutate the module-level
DEFAULT_* constants — import deepcopy from the copy module and initialize those
three attributes with deepcopy(DEFAULT_SEVERITY_KEYWORDS),
deepcopy(DEFAULT_URGENCY_PATTERNS), and deepcopy(DEFAULT_CATEGORIES)
respectively, leaving the rest of __init__ (last_modified, last_check_time,
check_interval, load_weights()) unchanged.
In `@backend/civic_intelligence.py`:
- Line 17: SNAPSHOT_DIR and the AdaptiveWeights data_path are currently relative
and can break when the process runs from a different CWD; update them to use an
absolute path by resolving relative paths against the module file location
(using __file__) or by reading a configurable environment variable (e.g.,
DATA_DIR) with a fallback to the anchored path; change the SNAPSHOT_DIR
definition and the AdaptiveWeights constructor/attribute that sets data_path to
compute os.path.join(os.path.dirname(__file__), ...) or
os.environ.get("DATA_DIR", ...anchored path...) so all file reads/writes use a
deterministic absolute path.
- Around line 43-47: The code currently passes the previous day's raw
category_counts (from previous_snapshot returned by _load_last_snapshot()) into
trend_analyzer.analyze(), which causes _detect_spikes to compare today against a
single day; instead, change the flow to load the last N snapshots (e.g., 7) and
compute per-category rolling averages before calling
trend_analyzer.analyze(issues_24h, historical_averages). Concretely: replace the
single-call to self._load_last_snapshot() with a call that retrieves the last N
snapshots (implement or reuse a helper like _load_last_n_snapshots(n) or extend
_load_last_snapshot to accept N), compute the mean category_counts across those
snapshots into historical_averages, and pass that dictionary to
trend_analyzer.analyze so _detect_spikes receives true historical averages (make
N configurable, default 7).
- Around line 86-88: The code in _adjust_duplicate_threshold uses i.status ==
'duplicate' which never matches because Issue.status only contains OPEN,
VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED; update the duplicate detection to use
the actual field used for marking duplicates (e.g., if you have an
Issue.resolution or Issue.is_duplicate flag, change the check to i.resolution ==
'DUPLICATE' or i.is_duplicate) or, if you intend to make duplicate a status, add
'DUPLICATE' to the Issue.status enum/schema and add the necessary migration and
tests; update the reference in _adjust_duplicate_threshold (and any other places
using 'duplicate') to the chosen canonical representation so duplicates are
counted correctly.
In `@backend/scheduler/daily_refinement_job.py`:
- Around line 29-30: The except block currently logs the error with
logger.error(f"Daily refinement job failed: {e}", exc_info=True) but swallows
the exception, causing the process to exit with code 0; modify the handler to
terminate with a non-zero exit (e.g., call sys.exit(1)) or re-raise the
exception after logging so external schedulers detect failure—update the except
Exception as e block in daily_refinement_job.py (the block using logger.error
and the literal "Daily refinement job failed") to either call sys.exit(1) after
logging or simply raise the caught exception.
In `@backend/trend_analyzer.py`:
- Line 79: The filter that builds located_issues currently uses a falsy check
(i.latitude and i.longitude) which will drop valid 0.0 coordinates; update the
comprehension to explicitly check for non-None values (e.g., i.latitude is not
None and i.longitude is not None) so zero coordinates are retained—modify the
located_issues assignment in trend_analyzer.py to use the explicit None checks
against the Issue objects in the issues list.
---
Nitpick comments:
In `@backend/adaptive_weights.py`:
- Around line 118-132: The _check_reload/load_weights path is not thread-safe:
add a threading.Lock (e.g., self._rw_lock) to adaptive_weights.py and use it to
serialize access to shared state—acquire the lock around load_weights(),
save_weights(), update_severity_keyword(), and any property getters or methods
used by PriorityEngine.analyze() that read internal structures; ensure
_check_reload acquires the lock before checking mtime and performing load to
prevent interleaved read/write, and release the lock promptly to avoid blocking.
- Around line 131-132: Replace the logger.error calls inside the three except
Exception as e blocks with logger.exception so the full traceback is recorded;
locate the three try/except handlers in backend/adaptive_weights.py that
currently do `except Exception as e: logger.error(f"Error ...: {e}")` and change
them to `logger.exception("Error ...")` (keep the original descriptive message)
so the logger variable used in this module captures the stack trace for those
file I/O/weight-check failures.
- Around line 174-197: The property getters severity_keywords, urgency_patterns,
and categories currently return direct references to internal mutable state
(_severity_keywords, _urgency_patterns, _categories) so callers can mutate
in-memory weights without triggering save_weights; change each getter to return
a defensive copy (deep copy for nested lists/dicts) after calling
_check_reload() so external mutations don't affect internal state, keeping the
existing _check_reload() call; no change to duplicate_search_radius behavior is
needed aside from leaving its setter that calls save_weights() as-is.
In `@backend/civic_intelligence.py`:
- Around line 195-199: Replace the error logging calls in the snapshot
persistence code to capture tracebacks: inside the except blocks in the
functions handling snapshots (the write block that currently logs "Failed to
save snapshot" and the corresponding _load_last_snapshot error path), call
logger.exception(...) instead of logger.error(...) so the exception traceback is
included while preserving the existing message text.
- Around line 128-130: The code calls the private method
trend_analyzer._extract_top_keywords(critical_issues, top_n=10); change this to
use a public API on TrendAnalyzer by either renaming the method to
extract_top_keywords or adding a new public method (e.g.,
TrendAnalyzer.extract_top_keywords(issues, top_n)) that accepts an arbitrary
list like critical_issues and returns top_keywords; update the call site to use
the new public method and remove any direct references to _extract_top_keywords
to respect encapsulation.
In `@backend/data/modelWeights.json`:
- Around line 2-218: The severity_keywords mapping contains duplicate entries
across tiers which never get used because PriorityEngine._calculate_severity
short-circuits at the highest matching tier; remove duplicates so each keyword
appears only once at its intended severity level (edit the "severity_keywords"
object in modelWeights.json), audit for repeated tokens like "smoke", "attack",
"dirty", "leaning", and ensure each term is assigned to the correct tier to
avoid misleading maintainers and dead entries used by
PriorityEngine._calculate_severity.
In `@backend/trend_analyzer.py`:
- Line 22: Update the analyze method signature to use an explicit Optional for
the default-None parameter: change the type of historical_category_counts from
Dict[str, float] = None to Optional[Dict[str, float]] = None in the analyze
method definition, and add Optional to the type imports at the top of the file
so the symbol Optional is available for use.
- Around line 46-52: The regex in _extract_top_keywords currently uses
r'\b[a-z]{3,}\b' which only matches Latin a–z and drops non-Latin tokens; change
the extraction to use a Unicode-aware pattern (e.g., match word characters or
Unicode letters) and pass re.UNICODE, and normalize case with .casefold()
instead of .lower() so tokens like Hindi/Tamil are retained and compared
consistently; keep filtering via STOP_WORDS and
Counter(filtered_words).most_common(top_n), and consider that STOP_WORDS may
need localization in a future follow-up.
backend/adaptive_weights.py
Outdated
```python
    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
        self.data_path = data_path
        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
        self._categories = DEFAULT_CATEGORIES.copy()
        self._duplicate_search_radius = 50.0

        self.last_modified = 0.0
        self.last_check_time = 0.0
        self.check_interval = 5.0  # Check every 5 seconds at most

        self.load_weights()
```
Shallow .copy() on defaults shares nested mutable objects — mutations corrupt the module-level constants.
DEFAULT_SEVERITY_KEYWORDS.copy() (line 107) creates a shallow copy: the outer dict is new, but the inner lists (e.g., DEFAULT_SEVERITY_KEYWORDS["critical"]) are shared references. When update_severity_keyword appends to self._severity_keywords[level], it mutates the original default list in-place. This corrupts DEFAULT_SEVERITY_KEYWORDS for the lifetime of the process.
The same applies to DEFAULT_URGENCY_PATTERNS (line 108) and DEFAULT_CATEGORIES (line 109).
In practice the JSON file usually exists and load_weights replaces these references, but the bug surfaces when the file is missing (first run / tests).
Proposed fix — use deep copies:

```diff
+import copy
+
 class AdaptiveWeights:
     ...
     def __init__(self, data_path: str = "backend/data/modelWeights.json"):
         self.data_path = data_path
-        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
-        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
-        self._categories = DEFAULT_CATEGORIES.copy()
+        self._severity_keywords = copy.deepcopy(DEFAULT_SEVERITY_KEYWORDS)
+        self._urgency_patterns = copy.deepcopy(DEFAULT_URGENCY_PATTERNS)
+        self._categories = copy.deepcopy(DEFAULT_CATEGORIES)
         self._duplicate_search_radius = 50.0
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
     def __init__(self, data_path: str = "backend/data/modelWeights.json"):
         self.data_path = data_path
-        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
-        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
-        self._categories = DEFAULT_CATEGORIES.copy()
+        self._severity_keywords = copy.deepcopy(DEFAULT_SEVERITY_KEYWORDS)
+        self._urgency_patterns = copy.deepcopy(DEFAULT_URGENCY_PATTERNS)
+        self._categories = copy.deepcopy(DEFAULT_CATEGORIES)
         self._duplicate_search_radius = 50.0
         self.last_modified = 0.0
         self.last_check_time = 0.0
         self.check_interval = 5.0  # Check every 5 seconds at most
         self.load_weights()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/adaptive_weights.py` around lines 105 - 116, The constructor
(__init__) currently uses shallow copies (DEFAULT_SEVERITY_KEYWORDS.copy(),
DEFAULT_URGENCY_PATTERNS.copy(), DEFAULT_CATEGORIES.copy()) which leaves nested
mutable objects shared; replace these with deep copies so mutations to
self._severity_keywords, self._urgency_patterns, and self._categories don’t
mutate the module-level DEFAULT_* constants — import deepcopy from the copy
module and initialize those three attributes with
deepcopy(DEFAULT_SEVERITY_KEYWORDS), deepcopy(DEFAULT_URGENCY_PATTERNS), and
deepcopy(DEFAULT_CATEGORIES) respectively, leaving the rest of __init__
(last_modified, last_check_time, check_interval, load_weights()) unchanged.
```python
        # 2. Trend Analysis
        previous_snapshot = self._load_last_snapshot()
        historical_counts = previous_snapshot.get("trends", {}).get("category_counts", {}) if previous_snapshot else {}

        trends = trend_analyzer.analyze(issues_24h, historical_counts)
```
Spike detection compares against a single previous day, not a true historical average.
historical_counts on line 45 is pulled from the previous day's snapshot category_counts, which are raw counts for that one day. However, _detect_spikes treats its second argument as historical_averages and applies a > 1.5× threshold. Comparing today against a single prior day is noisy — one anomalous day followed by a normal day would register as a "spike."
Consider computing a rolling average over the last N snapshots (e.g., 7 days) for more meaningful spike detection.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/civic_intelligence.py` around lines 43 - 47, The code currently
passes the previous day's raw category_counts (from previous_snapshot returned
by _load_last_snapshot()) into trend_analyzer.analyze(), which causes
_detect_spikes to compare today against a single day; instead, change the flow
to load the last N snapshots (e.g., 7) and compute per-category rolling averages
before calling trend_analyzer.analyze(issues_24h, historical_averages).
Concretely: replace the single-call to self._load_last_snapshot() with a call
that retrieves the last N snapshots (implement or reuse a helper like
_load_last_n_snapshots(n) or extend _load_last_snapshot to accept N), compute
the mean category_counts across those snapshots into historical_averages, and
pass that dictionary to trend_analyzer.analyze so _detect_spikes receives true
historical averages (make N configurable, default 7).
```python
        # Assuming 'duplicate' status indicates a duplicate issue
        duplicates = len([i for i in issues if i.status == 'duplicate'])
        rate = duplicates / total
```
🧩 Analysis chain

🏁 Script executed:

```shell
# Find the models file and examine Issue class
fd -t f "models.py" backend/ && cat backend/models.py
```

Repository: RohanExploit/VishwaGuru
Length of output: 8668

🏁 Script executed:

```shell
# Search for Issue class definition and status field
rg -n "class Issue" backend/ -A 20
```

Repository: RohanExploit/VishwaGuru
Length of output: 9974

🏁 Script executed:

```shell
# Check for enum definitions like GrievanceStatus
rg -n "GrievanceStatus\|class.*Status.*Enum" backend/
```

Repository: RohanExploit/VishwaGuru
Length of output: 49

🏁 Script executed:

```shell
# Read the civic_intelligence.py file focusing on the _adjust_duplicate_threshold function
cat -n backend/civic_intelligence.py | sed -n '75,100p'
```

Repository: RohanExploit/VishwaGuru
Length of output: 1213

🏁 Script executed:

```shell
# Search for any references to 'duplicate' in the codebase
rg -n "duplicate" backend/ -i
```

Repository: RohanExploit/VishwaGuru
Length of output: 2253
Fix status comparison — 'duplicate' is not a valid Issue status.
i.status == 'duplicate' will silently never match because the Issue.status field is a plain string column, but 'duplicate' is not a valid status value in the system. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes _adjust_duplicate_threshold a permanent no-op—the duplicates count will always be 0, and the threshold adjustment logic will never trigger.
Either add 'duplicate' as a valid status option or rethink how duplicate detection integrates with status tracking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/civic_intelligence.py` around lines 86 - 88, The code in
_adjust_duplicate_threshold uses i.status == 'duplicate' which never matches
because Issue.status only contains OPEN, VERIFIED, ASSIGNED, IN_PROGRESS,
RESOLVED; update the duplicate detection to use the actual field used for
marking duplicates (e.g., if you have an Issue.resolution or Issue.is_duplicate
flag, change the check to i.resolution == 'DUPLICATE' or i.is_duplicate) or, if
you intend to make duplicate a status, add 'DUPLICATE' to the Issue.status
enum/schema and add the necessary migration and tests; update the reference in
_adjust_duplicate_threshold (and any other places using 'duplicate') to the
chosen canonical representation so duplicates are counted correctly.
```python
    except Exception as e:
        logger.error(f"Daily refinement job failed: {e}", exc_info=True)
```
Swallowed exception means the process exits with code 0 on failure.
The except block logs the error but doesn't re-raise or call sys.exit(1). Any external scheduler (cron, systemd timer, Kubernetes CronJob) will see a successful exit and won't trigger failure alerts.
Proposed fix:

```diff
     except Exception as e:
         logger.error(f"Daily refinement job failed: {e}", exc_info=True)
+        raise
     finally:
         db.close()
```

Alternatively, if you want a clean exit code without a traceback on stderr, use sys.exit(1) after logging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/scheduler/daily_refinement_job.py` around lines 29 - 30, The except
block currently logs the error with logger.error(f"Daily refinement job failed:
{e}", exc_info=True) but swallows the exception, causing the process to exit
with code 0; modify the handler to terminate with a non-zero exit (e.g., call
sys.exit(1)) or re-raise the exception after logging so external schedulers
detect failure—update the except Exception as e block in daily_refinement_job.py
(the block using logger.error and the literal "Daily refinement job failed") to
either call sys.exit(1) after logging or simply raise the caught exception.
```python
        Uses spatial utilities to find clusters of issues.
        """
        # Filter issues with location
        located_issues = [i for i in issues if i.latitude and i.longitude]
```
Falsy check on coordinates will exclude valid 0.0 values.
i.latitude and i.longitude evaluates to False when either coordinate is 0.0 (equator or prime meridian), which are valid geographic positions. The Issue model defines these as optional nullable floats, so the correct check is is not None.
Proposed fix:

```diff
-        located_issues = [i for i in issues if i.latitude and i.longitude]
+        located_issues = [i for i in issues if i.latitude is not None and i.longitude is not None]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        located_issues = [i for i in issues if i.latitude and i.longitude]
+        located_issues = [i for i in issues if i.latitude is not None and i.longitude is not None]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/trend_analyzer.py` at line 79, The filter that builds located_issues
currently uses a falsy check (i.latitude and i.longitude) which will drop valid
0.0 coordinates; update the comprehension to explicitly check for non-None
values (e.g., i.latitude is not None and i.longitude is not None) so zero
coordinates are retained—modify the located_issues assignment in
trend_analyzer.py to use the explicit None checks against the Issue objects in
the issues list.
Pull request overview
This PR implements a daily civic intelligence refinement engine that enables the VishwaGuru system to self-improve by analyzing patterns in civic issues and automatically adjusting classification weights and duplicate detection thresholds. The system introduces dynamic, hot-reloadable configuration to replace hardcoded priority engine weights.
Changes:
- Introduced AdaptiveWeights singleton for hot-reloading severity keywords, urgency patterns, categories, and duplicate detection radius from JSON
- Created TrendAnalyzer for extracting keywords, detecting category spikes, and identifying geographic clusters using DBSCAN
- Implemented CivicIntelligenceEngine to orchestrate daily refinement including Civic Intelligence Index calculation, automatic weight updates based on critical issues, and duplicate threshold adjustment
- Modified PriorityEngine to use dynamic weights from AdaptiveWeights instead of hardcoded dictionaries
- Added scheduler script for running daily refinement job
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/adaptive_weights.py | Implements hot-reloadable configuration management with file-watching and JSON persistence |
| backend/civic_intelligence.py | Core daily refinement logic including trend analysis, weight optimization, and index calculation |
| backend/trend_analyzer.py | Analyzes issues for keyword trends, category spikes, and geographic clusters |
| backend/priority_engine.py | Refactored to use dynamic weights from AdaptiveWeights singleton via properties |
| backend/scheduler/daily_refinement_job.py | Standalone script to execute daily refinement with database session management |
| backend/data/modelWeights.json | JSON configuration storing all dynamic weights and thresholds |
```python
class AdaptiveWeights:
    """
    Manages dynamic weights for the PriorityEngine.
    Loads from/saves to a JSON file.
    Automatically reloads if the file changes.
    """

    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
        self.data_path = data_path
        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
        self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
        self._categories = DEFAULT_CATEGORIES.copy()
        self._duplicate_search_radius = 50.0

        self.last_modified = 0.0
        self.last_check_time = 0.0
        self.check_interval = 5.0  # Check every 5 seconds at most

        self.load_weights()

    def _check_reload(self):
        """Checks if file modified and reloads if needed."""
        now = time.time()
        if now - self.last_check_time < self.check_interval:
            return

        self.last_check_time = now
        if os.path.exists(self.data_path):
            try:
                mtime = os.path.getmtime(self.data_path)
                if mtime > self.last_modified:
                    logger.info(f"Weights file changed. Reloading from {self.data_path}")
                    self.load_weights()
            except Exception as e:
                logger.error(f"Error checking weights file modification: {e}")

    def load_weights(self):
        """Loads weights from the JSON file if it exists."""
        if os.path.exists(self.data_path):
            try:
                with open(self.data_path, 'r') as f:
                    data = json.load(f)
                self._severity_keywords = data.get("severity_keywords", self._severity_keywords)
                self._urgency_patterns = data.get("urgency_patterns", self._urgency_patterns)
                self._categories = data.get("categories", self._categories)
                self._duplicate_search_radius = data.get("duplicate_search_radius", self._duplicate_search_radius)

                self.last_modified = os.path.getmtime(self.data_path)
                logger.info(f"Loaded weights from {self.data_path}")
            except Exception as e:
                logger.error(f"Failed to load weights from {self.data_path}: {e}")
        else:
            logger.info(f"No existing weights file found at {self.data_path}. Using defaults.")
            self.save_weights()

    def save_weights(self):
        """Saves current weights to the JSON file."""
        data = {
            "severity_keywords": self._severity_keywords,
            "urgency_patterns": self._urgency_patterns,
            "categories": self._categories,
            "duplicate_search_radius": self._duplicate_search_radius
        }
        try:
            # Ensure directory exists
            os.makedirs(os.path.dirname(self.data_path), exist_ok=True)

            with open(self.data_path, 'w') as f:
                json.dump(data, f, indent=4)

            # Update last_modified so we don't reload our own write immediately (or we do, which is fine)
            self.last_modified = os.path.getmtime(self.data_path)
            logger.info(f"Saved weights to {self.data_path}")
        except Exception as e:
            logger.error(f"Failed to save weights to {self.data_path}: {e}")

    @property
    def severity_keywords(self) -> Dict[str, List[str]]:
        self._check_reload()
        return self._severity_keywords

    @property
    def urgency_patterns(self) -> List[List[Any]]:
        self._check_reload()
        return self._urgency_patterns

    @property
    def categories(self) -> Dict[str, List[str]]:
        self._check_reload()
        return self._categories

    @property
    def duplicate_search_radius(self) -> float:
        self._check_reload()
        return self._duplicate_search_radius

    @duplicate_search_radius.setter
    def duplicate_search_radius(self, value: float):
        self._duplicate_search_radius = value
        self.save_weights()

    def update_severity_keyword(self, severity_level: str, keyword: str, add: bool = True):
        self._check_reload()  # Ensure we have latest before updating
        if severity_level not in self._severity_keywords:
            return

        keywords = self._severity_keywords[severity_level]
        if add:
            if keyword not in keywords:
                keywords.append(keyword)
        else:
            if keyword in keywords:
                keywords.remove(keyword)

        self.save_weights()

    def update_category_keyword(self, category: str, keyword: str, add: bool = True):
        self._check_reload()
        if category not in self._categories:
            if add:
                self._categories[category] = [keyword]
            else:
                return
        else:
            keywords = self._categories[category]
            if add:
                if keyword not in keywords:
                    keywords.append(keyword)
            else:
                if keyword in keywords:
                    keywords.remove(keyword)

        self.save_weights()

    def get_urgency_patterns_tuples(self) -> List[Tuple[str, int]]:
        """Returns urgency patterns as tuples (required by PriorityEngine)."""
        # Call property to trigger reload check
        patterns = self.urgency_patterns
        return [(p[0], p[1]) for p in patterns]


# Singleton instance
adaptive_weights = AdaptiveWeights()
```
Missing test coverage for the new adaptive_weights module. Given that other modules like priority_engine, spatial_utils, and various detectors have comprehensive test coverage in backend/tests/, the adaptive_weights module should have tests covering hot-reloading, file I/O, weight updates, and error conditions.
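A starting point for such tests, sketched with pytest's `tmp_path` so nothing touches the real `backend/data/`. The assertions assume the constructor and hot-reload semantics shown in this PR (defaults persisted on first run, mtime-based reload), so they may need adjusting against the actual module.

```python
import json
from backend.adaptive_weights import AdaptiveWeights

def test_defaults_written_when_file_missing(tmp_path):
    path = tmp_path / "modelWeights.json"
    weights = AdaptiveWeights(data_path=str(path))
    # Constructor should persist defaults when no file exists yet.
    assert path.exists()
    assert "critical" in weights.severity_keywords

def test_keyword_update_round_trips(tmp_path):
    path = tmp_path / "modelWeights.json"
    weights = AdaptiveWeights(data_path=str(path))
    weights.update_severity_keyword("critical", "sinkhole", add=True)
    on_disk = json.loads(path.read_text())
    assert "sinkhole" in on_disk["severity_keywords"]["critical"]

def test_external_edit_is_hot_reloaded(tmp_path):
    path = tmp_path / "modelWeights.json"
    weights = AdaptiveWeights(data_path=str(path))
    weights.check_interval = 0.0   # don't wait 5 s in tests
    data = json.loads(path.read_text())
    data["duplicate_search_radius"] = 75.0
    path.write_text(json.dumps(data))
    weights.last_modified = 0.0    # force the mtime comparison to trigger a reload
    assert weights.duplicate_search_radius == 75.0
```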
```python
    def _load_last_snapshot(self) -> Dict[str, Any]:
        """Loads the most recent snapshot."""
        if not os.path.exists(SNAPSHOT_DIR):
            return {}
```
The snapshot directory is hardcoded as "backend/data/dailySnapshots" but there's no guarantee this directory exists. While _save_snapshot creates it on demand (line 192-193), _load_last_snapshot simply checks os.path.exists and returns an empty dict if it doesn't exist. This is inconsistent and could lead to unexpected behavior. Consider creating the directory on initialization or documenting that it needs to be created externally.
```diff
-            return {}
+        try:
+            os.makedirs(SNAPSHOT_DIR, exist_ok=True)
+        except OSError as e:
+            logger.error(f"Failed to create snapshot directory '{SNAPSHOT_DIR}': {e}")
+            return {}
```
```python
    # backend/scheduler/daily_refinement_job.py -> backend/scheduler -> backend -> root
    repo_root = current_file.parent.parent.parent
```
The path manipulation assumes a specific directory structure (backend/scheduler/daily_refinement_job.py -> backend/scheduler -> backend -> root). If the script is symlinked or the directory structure changes, this will break. Consider using a more robust approach like finding the backend package directory or using environment variables to specify the root path.
```diff
-    # backend/scheduler/daily_refinement_job.py -> backend/scheduler -> backend -> root
-    repo_root = current_file.parent.parent.parent
+    # Try to infer the repository root dynamically by locating the `backend` package.
+    repo_root = None
+    for parent in current_file.parents:
+        backend_pkg = parent / "backend" / "__init__.py"
+        if backend_pkg.is_file():
+            repo_root = parent
+            break
+    # Fallback to environment variable if `backend` package was not found.
+    if repo_root is None:
+        env_root = os.getenv("PROJECT_ROOT")
+        if env_root:
+            repo_root = Path(env_root).resolve()
+    # Final fallback to preserve legacy behavior (assumes fixed directory structure).
+    if repo_root is None:
+        repo_root = current_file.parent.parent.parent
```
```python
    def _adjust_duplicate_threshold(self, issues: List[Issue]):
        """
        Dynamically adjusts duplicate detection threshold based on duplicate rate.
        """
        total = len(issues)
        if total < 10:
            return

        # Assuming 'duplicate' status indicates a duplicate issue
        duplicates = len([i for i in issues if i.status == 'duplicate'])
        rate = duplicates / total

        current_radius = adaptive_weights.duplicate_search_radius

        # Simple heuristic:
        # If too many duplicates (> 30%), we might be over-matching (radius too big).
        # If too few (< 5%), we might be under-matching (radius too small).
        # This is a naive control loop but satisfies the requirement.

        if rate > 0.3:
            new_radius = max(10.0, current_radius * 0.9)
            if new_radius != current_radius:
                adaptive_weights.duplicate_search_radius = new_radius
                logger.info(f"High duplicate rate ({rate:.2f}). Decreased duplicate search radius to {new_radius:.1f}m")
        elif rate < 0.05:
            new_radius = min(100.0, current_radius * 1.1)
            if new_radius != current_radius:
                adaptive_weights.duplicate_search_radius = new_radius
                logger.info(f"Low duplicate rate ({rate:.2f}). Increased duplicate search radius to {new_radius:.1f}m")
```
The duplicate threshold adjustment logic modifies the radius but doesn't provide any mechanism to track its effectiveness over time or prevent oscillation. If the rate alternates around the thresholds, the radius could oscillate between increasing and decreasing on successive days. Consider implementing a moving average or dampening factor to stabilize the adjustments.
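One way to dampen the loop is to adjust against an exponential moving average of the daily rate, with a dead band, instead of the raw value. The sketch below assumes the smoothed rate can be persisted alongside the other weights; the `previous_smoothed_rate` field is hypothetical.

```python
def adjust_radius(current_radius: float, previous_smoothed_rate: float,
                  todays_rate: float, alpha: float = 0.3) -> tuple[float, float]:
    """Return (new_radius, new_smoothed_rate); the EMA avoids day-to-day oscillation."""
    smoothed = alpha * todays_rate + (1 - alpha) * previous_smoothed_rate
    if smoothed > 0.3:
        new_radius = max(10.0, current_radius * 0.9)
    elif smoothed < 0.05:
        new_radius = min(100.0, current_radius * 1.1)
    else:
        new_radius = current_radius  # dead band: no change while the smoothed rate is in range
    return new_radius, smoothed
```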
```python
    def _check_reload(self):
        """Checks if file modified and reloads if needed."""
        now = time.time()
        if now - self.last_check_time < self.check_interval:
            return

        self.last_check_time = now
        if os.path.exists(self.data_path):
            try:
                mtime = os.path.getmtime(self.data_path)
                if mtime > self.last_modified:
                    logger.info(f"Weights file changed. Reloading from {self.data_path}")
                    self.load_weights()
            except Exception as e:
                logger.error(f"Error checking weights file modification: {e}")
```
The hot-reload mechanism checks file modification every 5 seconds via property accessors, but there's a race condition: if save_weights() is called by the daily job while an API request is reading the properties, the file could be partially written when read. Additionally, the os.path.getmtime() check happens before opening the file, creating a TOCTTOU (Time-Of-Check-Time-Of-Use) vulnerability where the file could be modified between the mtime check and the actual read.
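Writing to a temporary file in the same directory and swapping it in with `os.replace` avoids readers ever seeing a half-written JSON file, since the rename is atomic on POSIX and replaces the destination in one step on Windows. A sketch of how a `save_weights`-style writer could do this (standalone helper, not the PR's code):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to path without exposing a partially written file to readers."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=4)
        os.replace(tmp_path, path)  # atomic rename; readers see the old or new file, never a mix
    except BaseException:
        os.unlink(tmp_path)
        raise
```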
```python
        # Assuming 'duplicate' status indicates a duplicate issue
        duplicates = len([i for i in issues if i.status == 'duplicate'])
```
The code assumes 'duplicate' is a valid status value for issues, but according to IssueStatus enum in schemas.py (lines 19-24), only 'open', 'verified', 'assigned', 'in_progress', and 'resolved' are valid statuses. The Issue model uses a String column for status (not an enum), but the assumption that 'duplicate' exists appears incorrect based on the established schema conventions. This will result in a duplicate rate of 0% always, causing the threshold adjustment logic to never function correctly.
```diff
-        # Assuming 'duplicate' status indicates a duplicate issue
-        duplicates = len([i for i in issues if i.status == 'duplicate'])
+        # Determine which statuses represent duplicates, if configured.
+        duplicate_statuses = getattr(Issue, "DUPLICATE_STATUSES", None)
+        if not duplicate_statuses:
+            # No explicit duplicate status configured; skip adjustment to avoid
+            # relying on unsupported status values.
+            logger.debug("No duplicate statuses configured for Issue; skipping duplicate threshold adjustment.")
+            return
+        duplicates = len([i for i in issues if i.status in duplicate_statuses])
```
```python
    db: Session = SessionLocal()
    try:
        result = civic_intelligence_engine.refine_daily(db)
        logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
    except Exception as e:
        logger.error(f"Daily refinement job failed: {e}", exc_info=True)
    finally:
        db.close()
```
The database session is not properly closed in case of exception. While there's a finally block that calls db.close(), if an exception occurs during civic_intelligence_engine.refine_daily(db), the session may be left in an inconsistent state. Consider using a context manager (with SessionLocal() as db:) to ensure proper cleanup.
```diff
-    db: Session = SessionLocal()
-    try:
-        result = civic_intelligence_engine.refine_daily(db)
-        logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
-    except Exception as e:
-        logger.error(f"Daily refinement job failed: {e}", exc_info=True)
-    finally:
-        db.close()
+    with SessionLocal() as db:
+        try:
+            result = civic_intelligence_engine.refine_daily(db)
+            logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
+        except Exception as e:
+            logger.error(f"Daily refinement job failed: {e}", exc_info=True)
```
```python
        # Use DBSCAN via spatial_utils
        # Default eps=30m might be too small for "geographic clustering" of trends.
        # Let's use larger radius like 500m to find hotspots.
```
The DBSCAN clustering uses eps_meters=500.0 which is approximately converted to degrees (line 173 in spatial_utils.py uses eps_meters / 111000). However, the comment on line 86 says this is a "larger radius like 500m to find hotspots" but doesn't account for latitude variation where longitude degree distances vary with cos(latitude). Consider documenting this limitation or using the haversine metric consistently which DBSCAN already supports (line 177 in spatial_utils.py).
```diff
-        # Let's use larger radius like 500m to find hotspots.
+        # Use a larger nominal radius (around 500m) to find hotspots. Note: spatial_utils
+        # converts eps_meters to degrees with a simple 1°≈111km approximation, so the
+        # effective ground distance, especially in longitude, will vary with latitude.
```
```diff
     def __init__(self):
-        # Keyword dictionaries for Severity Classification
-        self.severity_keywords = {
-            "critical": [
-                "fire", "explosion", "blood", "death", "collapse", "gas leak",
-                "electric shock", "spark", "electrocution", "drowning",
-                "flood", "landslide", "earthquake", "cyclone", "hurricane",
-                "attack", "assault", "rabid", "deadly", "fatal", "emergency",
-                "blocked road", "ambulance", "hospital", "school", "child",
-                "exposed wire", "transformer", "chemical", "toxic", "poison",
-                "weapon", "gun", "bomb", "terror", "riot", "stampede",
-                "structural failure", "pillar", "bridge", "flyover",
-                "open manhole", "live wire", "gas smell", "open electrical box",
-                "burning", "flame", "smoke", "crack", "fissure"
-            ],
-            "high": [
-                "accident", "injury", "broken", "bleeding", "hazard", "risk",
-                "dangerous", "unsafe", "threat", "pollution", "smoke",
-                "sewage", "overflow", "contamination", "infection", "disease",
-                "mosquito", "dengue", "malaria", "typhoid", "cholera",
-                "rat", "snake", "stray dog", "bite", "attack", "cattle",
-                "theft", "robbery", "burglary", "harassment", "abuse",
-                "illegal", "crime", "violation", "bribe", "corruption",
-                "traffic jam", "congestion", "gridlock", "delay",
-                "no water", "power cut", "blackout", "load shedding",
-                "pothole", "manhole", "open drain", "water logging",
-                "dead", "animal", "fish", "stuck",
-                "not working", "signal", "traffic light", "fallen tree",
-                "water leakage", "leakage", "burst", "pipe burst", "damage",
-                "leaning", "tilted", "unstable", "waterlogging"
-            ],
-            "medium": [
-                "garbage", "trash", "waste", "litter", "rubbish", "dustbin",
-                "smell", "odor", "stink", "foul", "dirty", "unclean",
-                "messy", "ugly", "eyesore", "bad", "poor",
-                "leak", "drip", "seepage", "moisture", "damp",
-                "noise", "loud", "sound", "music", "party", "barking",
-                "encroachment", "hawker", "vendor", "stall", "shop",
-                "parking", "parked", "vehicle", "car", "bike", "scooter",
-                "construction", "debris", "material", "sand", "cement",
-                "graffiti", "poster", "banner", "hoarding", "advertisement",
-                "slippery", "muddy", "path", "pavement", "sidewalk",
-                "crowd", "gathering", "tap", "wasting", "running water",
-                "speed breaker", "hump", "bump"
-            ],
-            "low": [
-                "light", "lamp", "bulb", "flicker", "dim", "dark",
-                "sign", "board", "paint", "color", "faded",
-                "bench", "chair", "seat", "grass", "plant", "tree",
-                "leaf", "branch", "garden", "park", "playground",
-                "cosmetic", "look", "appearance", "aesthetic",
-                "old", "rusty", "dirty", "leaning"
-            ]
-        }
+        # We now access weights dynamically from the singleton adaptive_weights instance
+        pass
```
The empty init method with only a pass statement and a comment is unnecessary. If the PriorityEngine no longer needs initialization logic, the init method can be removed entirely. Python will provide a default init automatically.
```python
        # Factor 1: Engagement (More issues is good? Or bad? Prompt says "Civic Intelligence Index"
        # usually implies the SYSTEM's intelligence or CITY's performance.
        # "Top Emerging Concern: Water Supply" implies monitoring.
        # Let's assume High Index = Healthy System/City.

        # High resolution rate -> +
        resolved_count = sum(1 for g in grievances if g.status == GrievanceStatus.RESOLVED)
        total_grievances = len(grievances)
```
The comment on line 154 says "More issues is good? Or bad?" indicating uncertainty about the metric design. The final implementation doesn't factor in the total number of issues at all - it only considers resolution rate and severity distribution. This means the index could be the same whether there are 10 issues or 10,000 issues in a day, which may not accurately reflect civic health. Consider clarifying the metric definition or adding issue volume as a factor.
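If issue volume should influence the index, one option is to normalize today's count against a recent baseline and blend it with the existing resolution-rate and severity factors. The weights and baseline in the sketch below are purely illustrative, not the PR's actual formula.

```python
def civic_index(resolution_rate: float, critical_share: float,
                issues_today: int, baseline_daily_issues: float) -> float:
    """Toy index on a 0-100 scale: higher is healthier. All weights are illustrative."""
    # Volume factor: 1.0 at or below the baseline, shrinking as today's load grows.
    volume_factor = min(1.0, baseline_daily_issues / issues_today) if issues_today else 1.0
    score = (0.5 * resolution_rate           # share of grievances resolved
             + 0.3 * (1.0 - critical_share)  # fewer critical issues is healthier
             + 0.2 * volume_factor)          # penalize unusually high issue volume
    return round(100.0 * score, 1)

print(civic_index(resolution_rate=0.6, critical_share=0.1,
                  issues_today=120, baseline_daily_issues=80))  # -> 70.3
```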
- Updated `backend/adaptive_weights.py` and `backend/civic_intelligence.py` to use absolute paths for the `data/` directory relative to the project root.
- This ensures compatibility with Render's file system layout and persistent disk mounts at `./data`.
- Updated `.gitignore` to exclude `data/dailySnapshots/` and `data/modelWeights.json` as they are runtime-generated data.
- Verified scheduler script execution with database initialization.
- Cleaned up local test artifacts.
6 issues found across 6 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/trend_analyzer.py">
<violation number="1" location="backend/trend_analyzer.py:79">
P2: Bug: Truthiness check on coordinates incorrectly filters out `0.0` values. `0.0` is falsy in Python but is a valid coordinate. Use `is not None` to match the pattern used in `spatial_utils.py`.</violation>
</file>
<file name="backend/adaptive_weights.py">
<violation number="1" location="backend/adaptive_weights.py:107">
P1: Shallow copy of nested dicts/lists causes mutation of module-level defaults. `dict.copy()` only copies the top-level dict; the inner lists (e.g., `DEFAULT_SEVERITY_KEYWORDS["critical"]`) are still shared references. When `update_severity_keyword` or `update_category_keyword` calls `keywords.append(...)` or `keywords.remove(...)`, it mutates the global defaults, corrupting them for the rest of the process. Use `copy.deepcopy()` instead.</violation>
</file>
<file name="backend/data/modelWeights.json">
<violation number="1" location="backend/data/modelWeights.json:51">
P1: Duplicate keywords across severity levels: `"smoke"` and `"attack"` appear in both `critical` and `high` lists. Since `priority_engine.py` checks `critical` first, the `high` entries are dead — these keywords will always resolve to critical severity. If that's intended, remove the duplicates from `high` to avoid confusion; if not (e.g., "smoke" from air pollution should be high, not critical), the lists need to be disambiguated.</violation>
<violation number="2" location="backend/data/modelWeights.json:217">
P2: Duplicate keywords across severity levels: `"leaning"` appears in both `high` and `low`, and `"dirty"` appears in both `medium` and `low`. The entries in the `low` list are unreachable dead data since higher severity levels are checked first. Remove the duplicates from `low` to keep the configuration clean and avoid confusion.</violation>
</file>
<file name="backend/civic_intelligence.py">
<violation number="1" location="backend/civic_intelligence.py:87">
P1: Status comparison `i.status == 'duplicate'` will never match because 'duplicate' is not a valid Issue status value. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes the `_adjust_duplicate_threshold` function a permanent no-op since `duplicates` count will always be 0.</violation>
</file>
<file name="backend/scheduler/daily_refinement_job.py">
<violation number="1" location="backend/scheduler/daily_refinement_job.py:30">
P2: Scheduled job script swallows exceptions and always exits with code 0, making failures invisible to external schedulers. Consider re-raising or calling `sys.exit(1)` in the `except` block so that cron, systemd, or other orchestrators can detect and alert on failures.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```python
    def __init__(self, data_path: str = "backend/data/modelWeights.json"):
        self.data_path = data_path
        self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
```
P1: Shallow copy of nested dicts/lists causes mutation of module-level defaults. dict.copy() only copies the top-level dict; the inner lists (e.g., DEFAULT_SEVERITY_KEYWORDS["critical"]) are still shared references. When update_severity_keyword or update_category_keyword calls keywords.append(...) or keywords.remove(...), it mutates the global defaults, corrupting them for the rest of the process. Use copy.deepcopy() instead.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/adaptive_weights.py, line 107:
<comment>Shallow copy of nested dicts/lists causes mutation of module-level defaults. `dict.copy()` only copies the top-level dict; the inner lists (e.g., `DEFAULT_SEVERITY_KEYWORDS["critical"]`) are still shared references. When `update_severity_keyword` or `update_category_keyword` calls `keywords.append(...)` or `keywords.remove(...)`, it mutates the global defaults, corrupting them for the rest of the process. Use `copy.deepcopy()` instead.</comment>
<file context>
@@ -0,0 +1,239 @@
+
+ def __init__(self, data_path: str = "backend/data/modelWeights.json"):
+ self.data_path = data_path
+ self._severity_keywords = DEFAULT_SEVERITY_KEYWORDS.copy()
+ self._urgency_patterns = DEFAULT_URGENCY_PATTERNS.copy()
+ self._categories = DEFAULT_CATEGORIES.copy()
</file context>
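As a rough illustration of the suggested fix, the sketch below shows how `copy.deepcopy` keeps the module-level defaults untouched; the constant and class here are simplified stand-ins, not the project's actual code:

```python
import copy

# Stand-in for the module-level default (the real one lives in adaptive_weights.py)
DEFAULT_SEVERITY_KEYWORDS = {"critical": ["fire", "collapse"], "high": ["leak"]}

class AdaptiveWeightsSketch:
    def __init__(self):
        # deepcopy duplicates the inner lists too, so later append/remove
        # calls cannot mutate the shared module-level default
        self._severity_keywords = copy.deepcopy(DEFAULT_SEVERITY_KEYWORDS)

    def update_severity_keyword(self, level, keyword, add=True):
        keywords = self._severity_keywords.setdefault(level, [])
        if add and keyword not in keywords:
            keywords.append(keyword)
        elif not add and keyword in keywords:
            keywords.remove(keyword)

w = AdaptiveWeightsSketch()
w.update_severity_keyword("critical", "gas leak")
assert "gas leak" not in DEFAULT_SEVERITY_KEYWORDS["critical"]  # defaults stay untouched
```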
| "open electrical box", | ||
| "burning", | ||
| "flame", | ||
| "smoke", |
P1: Duplicate keywords across severity levels: "smoke" and "attack" appear in both critical and high lists. Since priority_engine.py checks critical first, the high entries are dead — these keywords will always resolve to critical severity. If that's intended, remove the duplicates from high to avoid confusion; if not (e.g., "smoke" from air pollution should be high, not critical), the lists need to be disambiguated.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/data/modelWeights.json, line 51:
<comment>Duplicate keywords across severity levels: `"smoke"` and `"attack"` appear in both `critical` and `high` lists. Since `priority_engine.py` checks `critical` first, the `high` entries are dead — these keywords will always resolve to critical severity. If that's intended, remove the duplicates from `high` to avoid confusion; if not (e.g., "smoke" from air pollution should be high, not critical), the lists need to be disambiguated.</comment>
<file context>
@@ -0,0 +1,453 @@
+ "open electrical box",
+ "burning",
+ "flame",
+ "smoke",
+ "crack",
+ "fissure"
</file context>
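This class of problem is easy to catch with a small consistency check over the weights file; the path and `severity_keywords` key below follow this thread's discussion but should be treated as assumptions:

```python
import json
from collections import defaultdict

# Path and structure assumed from the discussion above
with open("backend/data/modelWeights.json") as f:
    severity_keywords = json.load(f)["severity_keywords"]

seen = defaultdict(list)
for level, keywords in severity_keywords.items():
    for kw in keywords:
        seen[kw].append(level)

duplicates = {kw: levels for kw, levels in seen.items() if len(levels) > 1}
if duplicates:
    # e.g. {'smoke': ['critical', 'high'], 'attack': ['critical', 'high']}
    print("Keywords listed under multiple severity levels:", duplicates)
```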
return

# Assuming 'duplicate' status indicates a duplicate issue
duplicates = len([i for i in issues if i.status == 'duplicate'])
P1: Status comparison i.status == 'duplicate' will never match because 'duplicate' is not a valid Issue status value. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes the _adjust_duplicate_threshold function a permanent no-op since duplicates count will always be 0.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/civic_intelligence.py, line 87:
<comment>Status comparison `i.status == 'duplicate'` will never match because 'duplicate' is not a valid Issue status value. Valid statuses are: OPEN, VERIFIED, ASSIGNED, IN_PROGRESS, RESOLVED. This makes the `_adjust_duplicate_threshold` function a permanent no-op since `duplicates` count will always be 0.</comment>
<file context>
@@ -0,0 +1,217 @@
+ return
+
+ # Assuming 'duplicate' status indicates a duplicate issue
+ duplicates = len([i for i in issues if i.status == 'duplicate'])
+ rate = duplicates / total
+
</file context>
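If the Issue model marks duplicates through a link field rather than a status, the rate calculation could look like this sketch; `duplicate_of` is a hypothetical attribute used only for illustration and would need to be checked against the actual model:

```python
def duplicate_rate(issues):
    """Share of issues flagged as duplicates of another issue."""
    total = len(issues)
    if total == 0:
        return 0.0
    # 'duplicate_of' is assumed here; substitute the real model field
    duplicates = sum(1 for i in issues if getattr(i, "duplicate_of", None) is not None)
    return duplicates / total
```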
Uses spatial utilities to find clusters of issues.
"""
# Filter issues with location
located_issues = [i for i in issues if i.latitude and i.longitude]
P2: Bug: Truthiness check on coordinates incorrectly filters out 0.0 values. 0.0 is falsy in Python but is a valid coordinate. Use is not None to match the pattern used in spatial_utils.py.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/trend_analyzer.py, line 79:
<comment>Bug: Truthiness check on coordinates incorrectly filters out `0.0` values. `0.0` is falsy in Python but is a valid coordinate. Use `is not None` to match the pattern used in `spatial_utils.py`.</comment>
<file context>
@@ -0,0 +1,114 @@
+ Uses spatial utilities to find clusters of issues.
+ """
+ # Filter issues with location
+ located_issues = [i for i in issues if i.latitude and i.longitude]
+
+ if not located_issues:
</file context>
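A minimal illustration of why the truthiness check misbehaves at coordinate 0.0, with a toy `Issue` class standing in for the real model:

```python
class Issue:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

issues = [Issue(0.0, 6.73), Issue(None, None), Issue(28.61, 77.21)]

# Truthiness drops the issue sitting exactly on the equator (latitude 0.0)
truthy = [i for i in issues if i.latitude and i.longitude]  # 1 issue

# Explicit None check keeps every issue that actually has coordinates
located = [i for i in issues if i.latitude is not None and i.longitude is not None]  # 2 issues
```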
| "old", | ||
| "rusty", | ||
| "dirty", | ||
| "leaning" |
P2: Duplicate keywords across severity levels: "leaning" appears in both high and low, and "dirty" appears in both medium and low. The entries in the low list are unreachable dead data since higher severity levels are checked first. Remove the duplicates from low to keep the configuration clean and avoid confusion.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/data/modelWeights.json, line 217:
<comment>Duplicate keywords across severity levels: `"leaning"` appears in both `high` and `low`, and `"dirty"` appears in both `medium` and `low`. The entries in the `low` list are unreachable dead data since higher severity levels are checked first. Remove the duplicates from `low` to keep the configuration clean and avoid confusion.</comment>
<file context>
@@ -0,0 +1,453 @@
+ "old",
+ "rusty",
+ "dirty",
+ "leaning"
+ ]
+ },
</file context>
    result = civic_intelligence_engine.refine_daily(db)
    logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
except Exception as e:
    logger.error(f"Daily refinement job failed: {e}", exc_info=True)
P2: Scheduled job script swallows exceptions and always exits with code 0, making failures invisible to external schedulers. Consider re-raising or calling sys.exit(1) in the except block so that cron, systemd, or other orchestrators can detect and alert on failures.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/scheduler/daily_refinement_job.py, line 30:
<comment>Scheduled job script swallows exceptions and always exits with code 0, making failures invisible to external schedulers. Consider re-raising or calling `sys.exit(1)` in the `except` block so that cron, systemd, or other orchestrators can detect and alert on failures.</comment>
<file context>
@@ -0,0 +1,35 @@
+ result = civic_intelligence_engine.refine_daily(db)
+ logger.info(f"Daily refinement job completed successfully. Index Score: {result.get('index_score')}")
+ except Exception as e:
+ logger.error(f"Daily refinement job failed: {e}", exc_info=True)
+ finally:
+ db.close()
</file context>
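A sketch of the suggested change, keeping the current try/except/finally shape; the engine call and session handling are placeholders, not the project's actual code:

```python
import logging
import sys

logger = logging.getLogger(__name__)

def run_job():
    db = object()  # placeholder for the real database session
    try:
        # placeholder for civic_intelligence_engine.refine_daily(db)
        result = {"index_score": 0.0}
        logger.info("Daily refinement job completed. Index Score: %s", result.get("index_score"))
    except Exception:
        logger.exception("Daily refinement job failed")
        sys.exit(1)  # non-zero exit lets cron/systemd detect and alert on failure
    finally:
        pass  # db.close() in the real script

if __name__ == "__main__":
    run_job()
```

Because `sys.exit` raises `SystemExit`, the `finally` block still runs before the process terminates, so cleanup is not skipped.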
Actionable comments posted: 3
🧹 Nitpick comments (5)
backend/adaptive_weights.py (2)
180-198: Properties expose mutable internal state — callers can silently corrupt weights without persisting.
All four property accessors return direct references to the internal dicts/lists. Any caller that modifies the returned object (e.g., adaptive_weights.severity_keywords["critical"].append(...)) bypasses save_weights(), leaving the in-memory and on-disk states diverged. Consider returning deep copies from the properties, or documenting that callers must use the mutation methods.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/adaptive_weights.py` around lines 180 - 198, The four property getters (severity_keywords, urgency_patterns, categories, duplicate_search_radius) expose internal mutable state; update them to return deep copies for the dict/list types (use copy.deepcopy on self._severity_keywords, self._urgency_patterns, self._categories) so callers cannot mutate in-memory state without going through the existing mutation/save path (e.g., save_weights()); keep duplicate_search_radius as is (primitive) or return a copied primitive, and ensure callers are documented to use provided mutation methods that call save_weights().
137-138: Use logger.exception() instead of logger.error() in except blocks to preserve tracebacks.
In all three exception handlers (lines 138, 154, 178), logger.error(...) swallows the traceback. logger.exception(...) automatically attaches it, which is valuable for debugging file I/O failures in production.
Also applies to: 153-154, 177-178
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/adaptive_weights.py` around lines 137 - 138, Replace the three places in backend/adaptive_weights.py where you call logger.error(...) inside except blocks with logger.exception(...) so the traceback is preserved; specifically update the except handlers that currently do logger.error(f"Error checking weights file modification: {e}") and the two other file I/O/weights-related except blocks to use logger.exception with the same contextual message (keep the descriptive text but remove manual inclusion of the exception object since logger.exception already records it).
backend/civic_intelligence.py (3)
199-203: Uselogger.exceptionto preserve the traceback on snapshot save failure.except Exception as e: - logger.error(f"Failed to save snapshot: {e}") + logger.exception(f"Failed to save snapshot: {e}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/civic_intelligence.py` around lines 199 - 203, The except block that catches errors while writing the snapshot currently calls logger.error and loses the traceback; change the error logging to logger.exception to preserve the full traceback and context (in the except Exception as e: block that handles open(filename, 'w') and json.dump(snapshot, f)), keeping the existing message text (e.g., "Failed to save snapshot: {e}") so the filename/snapshot variables remain referenced and the stack trace is recorded.
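For reference, the difference is only in which logging call the except block uses; a minimal, self-contained illustration:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("weights")

try:
    json.loads("{not valid json")
except Exception:
    # logger.error("...") would record only this message;
    # logger.exception also attaches the full traceback automatically.
    logger.exception("Failed to load weights")
```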
193-193: Use os.path.join instead of f-string path concatenation.
-    filename = f"{SNAPSHOT_DIR}/{snapshot['date']}.json"
+    filename = os.path.join(SNAPSHOT_DIR, f"{snapshot['date']}.json")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/civic_intelligence.py` at line 193, Replace the f-string path concatenation when building filename with a platform-safe join: use os.path.join to combine SNAPSHOT_DIR and snapshot['date'] + '.json' instead of f"{SNAPSHOT_DIR}/{snapshot['date']}.json"; ensure the module imports os if not already present and update the assignment to filename to use os.path.join(SNAPSHOT_DIR, snapshot['date'] + '.json') so SNAPSHOT_DIR and snapshot['date'] are combined correctly across OSes.
139-144: Auto-learning keywords are only ever added, never pruned — risk of keyword inflation over time.
The daily refinement adds keywords to the "critical" severity level but never removes stale ones. Over many cycles, the critical keyword list will grow unboundedly, diluting its signal. Consider adding a decay or pruning mechanism.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/civic_intelligence.py` around lines 139 - 144, The loop only ever adds keywords (top_keywords -> adaptive_weights.update_severity_keyword(..., add=True)) so critical keywords will inflate; implement a pruning/decay step that removes stale keywords from current_critical_keywords before or after adding: track recent frequency or last_seen for each keyword (e.g., via adaptive_weights store or a metadata map), and call adaptive_weights.update_severity_keyword("critical", keyword, add=False) for any keyword whose count/recency falls below a configured threshold or age (e.g., not seen in N days or frequency < M). Update or extend adaptive_weights to expose removal and/or decay APIs and run the pruning logic adjacent to the existing top_keywords loop (use the same symbols: top_keywords, current_critical_keywords, adaptive_weights.update_severity_keyword).
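One possible decay mechanism along these lines tracks when each learned keyword was last seen and drops stale ones during the daily run; everything below (the metadata map, the 14-day window) is an illustrative assumption rather than existing project code:

```python
from datetime import date, timedelta

def prune_stale_keywords(learned_last_seen, today=None, max_age_days=14):
    """Keep only learned keywords seen within the last max_age_days.

    learned_last_seen maps keyword -> date it last appeared in a critical
    issue; maintaining that map alongside the weights file is an assumption.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    return {kw: seen for kw, seen in learned_last_seen.items() if seen >= cutoff}

history = {
    "gas smell": date.today(),
    "flooded underpass": date.today() - timedelta(days=20),
}
fresh = prune_stale_keywords(history)
# 'flooded underpass' drops out; stale keywords would then be removed via
# adaptive_weights.update_severity_keyword("critical", kw, add=False)
```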
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/adaptive_weights.py`:
- Around line 238-242: get_urgency_patterns_tuples can raise IndexError when a
pattern tuple has fewer than 2 elements; update it (or the urgency_patterns
loader) to validate structure before accessing p[1]. Specifically, in
get_urgency_patterns_tuples check each pattern from self.urgency_patterns for at
least two elements and correct types (e.g., str and int), skip or coerce
malformed entries (and optionally log or collect warnings) so the returned list
only contains safe (pattern, weight) tuples; reference the
get_urgency_patterns_tuples method and the urgency_patterns property when
implementing the guard/validation.
- Around line 124-157: AdaptiveWeights has race conditions during concurrent
HTTP handling; add a threading.Lock instance (e.g., self._lock) to the
AdaptiveWeights singleton and use it to guard all file I/O and state mutations:
acquire the lock at the start of _check_reload, load_weights, and save_weights
(and any other methods that read/write self._severity_keywords,
self._urgency_patterns, self._categories, self._duplicate_search_radius, or
self.last_modified), perform file existence checks, reads/writes and updates
while holding the lock, and release the lock afterwards to prevent TOCTOU and
partial read/write races.
In `@backend/civic_intelligence.py`:
- Line 134: The code is calling the private method
trend_analyzer._extract_top_keywords, breaking encapsulation; update the
TrendAnalyzer class to expose a public API (e.g., add a method named
extract_top_keywords on TrendAnalyzer that wraps the existing
_extract_top_keywords logic or rename the method to extract_top_keywords) and
then replace the call in this module to use
trend_analyzer.extract_top_keywords(critical_issues, top_n=10) so the consumer
uses the public method instead of the private one.
---
Duplicate comments:
In `@backend/adaptive_weights.py`:
- Around line 111-122: The __init__ currently uses shallow .copy() for
DEFAULT_SEVERITY_KEYWORDS, DEFAULT_URGENCY_PATTERNS, and DEFAULT_CATEGORIES
which leaves inner lists/dicts shared; change these allocations in
AdaptiveWeights.__init__ to use copy.deepcopy on those DEFAULT_* constants so
_severity_keywords, _urgency_patterns, and _categories get independent deep
copies and in-place mutations (e.g., update_severity_keyword) won't mutate
module-level defaults.
In `@backend/civic_intelligence.py`:
- Around line 47-51: historical_counts is currently pulled from only the last
snapshot via _load_last_snapshot and passed to trend_analyzer.analyze (which
calls _detect_spikes) but that treats a single day as the historical average;
change the code to load the last N snapshots (e.g., via a new or extended method
on the same snapshot loader), compute per-category average counts across those
snapshots (handle missing categories and divide by the number of snapshots
used), and pass that averaged dict as historical_counts to
trend_analyzer.analyze so _detect_spikes compares against a multi-day baseline
rather than a single prior day.
- Around line 90-92: The duplicate detection is wrong because Issue.status never
equals 'duplicate' so duplicates is always zero; update the comprehension in
_adjust_duplicate_threshold to detect duplicates using the actual model fields
(e.g., count issues where getattr(i, 'duplicate_of', None) is not None or
getattr(i, 'is_duplicate', False) or i.metadata.get('is_duplicate', False) as a
fallback) instead of i.status == 'duplicate', then recompute duplicates and rate
accordingly so the threshold adjustment can trigger.
---
Nitpick comments:
In `@backend/adaptive_weights.py`:
- Around line 180-198: The four property getters (severity_keywords,
urgency_patterns, categories, duplicate_search_radius) expose internal mutable
state; update them to return deep copies for the dict/list types (use
copy.deepcopy on self._severity_keywords, self._urgency_patterns,
self._categories) so callers cannot mutate in-memory state without going through
the existing mutation/save path (e.g., save_weights()); keep
duplicate_search_radius as is (primitive) or return a copied primitive, and
ensure callers are documented to use provided mutation methods that call
save_weights().
- Around line 137-138: Replace the three places in backend/adaptive_weights.py
where you call logger.error(...) inside except blocks with logger.exception(...)
so the traceback is preserved; specifically update the except handlers that
currently do logger.error(f"Error checking weights file modification: {e}") and
the two other file I/O/weights-related except blocks to use logger.exception
with the same contextual message (keep the descriptive text but remove manual
inclusion of the exception object since logger.exception already records it).
In `@backend/civic_intelligence.py`:
- Around line 199-203: The except block that catches errors while writing the
snapshot currently calls logger.error and loses the traceback; change the error
logging to logger.exception to preserve the full traceback and context (in the
except Exception as e: block that handles open(filename, 'w') and
json.dump(snapshot, f)), keeping the existing message text (e.g., "Failed to
save snapshot: {e}") so the filename/snapshot variables remain referenced and
the stack trace is recorded.
- Line 193: Replace the f-string path concatenation when building filename with
a platform-safe join: use os.path.join to combine SNAPSHOT_DIR and
snapshot['date'] + '.json' instead of f"{SNAPSHOT_DIR}/{snapshot['date']}.json";
ensure the module imports os if not already present and update the assignment to
filename to use os.path.join(SNAPSHOT_DIR, snapshot['date'] + '.json') so
SNAPSHOT_DIR and snapshot['date'] are combined correctly across OSes.
- Around line 139-144: The loop only ever adds keywords (top_keywords ->
adaptive_weights.update_severity_keyword(..., add=True)) so critical keywords
will inflate; implement a pruning/decay step that removes stale keywords from
current_critical_keywords before or after adding: track recent frequency or
last_seen for each keyword (e.g., via adaptive_weights store or a metadata map),
and call adaptive_weights.update_severity_keyword("critical", keyword,
add=False) for any keyword whose count/recency falls below a configured
threshold or age (e.g., not seen in N days or frequency < M). Update or extend
adaptive_weights to expose removal and/or decay APIs and run the pruning logic
adjacent to the existing top_keywords loop (use the same symbols: top_keywords,
current_critical_keywords, adaptive_weights.update_severity_keyword).
def _check_reload(self):
    """Checks if file modified and reloads if needed."""
    now = time.time()
    if now - self.last_check_time < self.check_interval:
        return

    self.last_check_time = now
    if os.path.exists(self.data_path):
        try:
            mtime = os.path.getmtime(self.data_path)
            if mtime > self.last_modified:
                logger.info(f"Weights file changed. Reloading from {self.data_path}")
                self.load_weights()
        except Exception as e:
            logger.error(f"Error checking weights file modification: {e}")

def load_weights(self):
    """Loads weights from the JSON file if it exists."""
    if os.path.exists(self.data_path):
        try:
            with open(self.data_path, 'r') as f:
                data = json.load(f)
                self._severity_keywords = data.get("severity_keywords", self._severity_keywords)
                self._urgency_patterns = data.get("urgency_patterns", self._urgency_patterns)
                self._categories = data.get("categories", self._categories)
                self._duplicate_search_radius = data.get("duplicate_search_radius", self._duplicate_search_radius)

            self.last_modified = os.path.getmtime(self.data_path)
            logger.info(f"Loaded weights from {self.data_path}")
        except Exception as e:
            logger.error(f"Failed to load weights from {self.data_path}: {e}")
    else:
        logger.info(f"No existing weights file found at {self.data_path}. Using defaults.")
        self.save_weights()
🧩 Analysis chain
🏁 Script executed:
# Search for scheduler, threading, and APScheduler usage
rg -n "daily_refinement|scheduler|APScheduler|Thread|BackgroundScheduler|SchedulerJob" --type=py -C 3
🏁 Script executed:
# Find where adaptive_weights is instantiated and used
rg -n "adaptive_weights|AdaptiveWeights" --type=py -C 2
🏁 Script executed:
# Check for any existing threading imports or locks in the codebase
rg -n "threading|Lock|RLock|Semaphore" --type=py
🏁 Script executed:
# Look at the main entry points and how the app is structured
fd -e "py" -E "__pycache__" | xargs grep -l "app\|main\|run\|serve" | head -20
🏁 Script executed:
# Check the scheduler job implementation
cat -n backend/scheduler/daily_refinement_job.py
🏁 Script executed:
# Check main entry point to understand app structure
cat -n backend/__main__.py
🏁 Script executed:
# Check if there's any APScheduler or similar background task integration
rg -n "APScheduler|schedule\|cron\|background" --type=py backend/
🏁 Script executed:
# Verify adaptive_weights is accessed from request handlers
rg -n "adaptive_weights\|priority_engine" backend/routers/ backend/main.py --type=py -B 2 -A 2
🏁 Script executed:
# Check if there are any existing lock patterns in adaptive_weights
cat -n backend/adaptive_weights.py | head -120
Add thread safety to AdaptiveWeights singleton for concurrent HTTP request handling.
FastAPI uses a thread pool to handle concurrent HTTP requests in the same process. Multiple request handlers call priority_engine → adaptive_weights concurrently, creating race conditions in _check_reload(), load_weights(), and save_weights(). Concurrent file I/O without locking can cause partial reads (TOCTOU races), lost writes, or inconsistent state. This pattern is already applied throughout the codebase (pothole_detection.py, garbage_detection.py, local_ml_service.py all use threading.Lock). Add a threading.Lock around the read/write paths in AdaptiveWeights.
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 137-137: Do not catch blind exception: Exception
(BLE001)
[warning] 138-138: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 153-153: Do not catch blind exception: Exception
(BLE001)
[warning] 154-154: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/adaptive_weights.py` around lines 124 - 157, AdaptiveWeights has race
conditions during concurrent HTTP handling; add a threading.Lock instance (e.g.,
self._lock) to the AdaptiveWeights singleton and use it to guard all file I/O
and state mutations: acquire the lock at the start of _check_reload,
load_weights, and save_weights (and any other methods that read/write
self._severity_keywords, self._urgency_patterns, self._categories,
self._duplicate_search_radius, or self.last_modified), perform file existence
checks, reads/writes and updates while holding the lock, and release the lock
afterwards to prevent TOCTOU and partial read/write races.
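A minimal sketch of the locking pattern being asked for, reduced to the reload/save paths; the class and file layout mirror the discussion above but this is not the project's actual implementation:

```python
import json
import os
import threading

class WeightsStoreSketch:
    def __init__(self, path="weights.json"):
        self._lock = threading.Lock()
        self._path = path
        self._weights = {}
        self._last_modified = 0.0

    def reload_if_changed(self):
        with self._lock:  # one thread at a time checks mtime and reloads
            if not os.path.exists(self._path):
                return
            mtime = os.path.getmtime(self._path)
            if mtime > self._last_modified:
                with open(self._path) as f:
                    self._weights = json.load(f)
                self._last_modified = mtime

    def save(self):
        with self._lock:  # prevents interleaved writes from concurrent requests
            with open(self._path, "w") as f:
                json.dump(self._weights, f, indent=2)
            self._last_modified = os.path.getmtime(self._path)
```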
def get_urgency_patterns_tuples(self) -> List[Tuple[str, int]]:
    """Returns urgency patterns as tuples (required by PriorityEngine)."""
    # Call property to trigger reload check
    patterns = self.urgency_patterns
    return [(p[0], p[1]) for p in patterns]
get_urgency_patterns_tuples will raise IndexError on malformed pattern entries.
If the JSON file contains a pattern with fewer than 2 elements (e.g., ["some_regex"]), p[1] throws an IndexError. Consider adding a guard or validating pattern structure on load.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/adaptive_weights.py` around lines 238 - 242,
get_urgency_patterns_tuples can raise IndexError when a pattern tuple has fewer
than 2 elements; update it (or the urgency_patterns loader) to validate
structure before accessing p[1]. Specifically, in get_urgency_patterns_tuples
check each pattern from self.urgency_patterns for at least two elements and
correct types (e.g., str and int), skip or coerce malformed entries (and
optionally log or collect warnings) so the returned list only contains safe
(pattern, weight) tuples; reference the get_urgency_patterns_tuples method and
the urgency_patterns property when implementing the guard/validation.
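A guarded version of the tuple conversion could skip malformed entries instead of raising; this is a sketch only, assuming patterns are loaded from JSON as [regex, weight] lists:

```python
import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)

def to_pattern_tuples(raw_patterns) -> List[Tuple[str, int]]:
    """Convert loosely structured [pattern, weight] entries into safe tuples."""
    tuples = []
    for entry in raw_patterns:
        if (
            isinstance(entry, (list, tuple))
            and len(entry) >= 2
            and isinstance(entry[0], str)
            and isinstance(entry[1], (int, float))
        ):
            tuples.append((entry[0], int(entry[1])))
        else:
            logger.warning("Skipping malformed urgency pattern: %r", entry)
    return tuples

# Example with malformed entries that get skipped rather than crashing
print(to_pattern_tuples([["urgent|asap", 3], ["immediately"], "oops"]))
```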
# Analyze text for common keywords in these critical issues
# We can reuse TrendAnalyzer logic but specific to this subset
top_keywords = trend_analyzer._extract_top_keywords(critical_issues, top_n=10)
🛠️ Refactor suggestion | 🟠 Major
Accessing private method trend_analyzer._extract_top_keywords() breaks encapsulation.
_extract_top_keywords is a private method on TrendAnalyzer. Promote it to a public method (drop the underscore) or expose a dedicated public API for keyword extraction, so this module isn't coupled to an implementation detail.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/civic_intelligence.py` at line 134, The code is calling the private
method trend_analyzer._extract_top_keywords, breaking encapsulation; update the
TrendAnalyzer class to expose a public API (e.g., add a method named
extract_top_keywords on TrendAnalyzer that wraps the existing
_extract_top_keywords logic or rename the method to extract_top_keywords) and
then replace the call in this module to use
trend_analyzer.extract_top_keywords(critical_issues, top_n=10) so the consumer
uses the public method instead of the private one.
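The promotion could be as small as a public method delegating to the existing helper; a sketch under the assumption that `_extract_top_keywords` keeps its current signature, with a simplified stand-in body:

```python
from collections import Counter
from typing import List

class TrendAnalyzer:  # simplified stand-in for the real class
    def extract_top_keywords(self, issues, top_n: int = 10) -> List[str]:
        """Public API; callers no longer reach into the private helper."""
        return self._extract_top_keywords(issues, top_n=top_n)

    def _extract_top_keywords(self, issues, top_n: int = 10) -> List[str]:
        # Simplified stand-in for the real keyword extraction logic
        words = Counter()
        for issue in issues:
            words.update(str(getattr(issue, "description", "")).lower().split())
        return [w for w, _ in words.most_common(top_n)]
```

The call site in civic_intelligence.py would then read `trend_analyzer.extract_top_keywords(critical_issues, top_n=10)`.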
- Added `scikit-learn` and `numpy` to `backend/requirements-render.txt`. - These dependencies are required by `backend/trend_analyzer.py` and `backend/spatial_utils.py` for clustering and trend detection functionality. - This resolves import errors during the Render build/runtime process.
🔍 Quality Reminder
🧹 Nitpick comments (1)
backend/requirements-render.txt (1)
18-19: Consider pinning scikit-learn and numpy versions for reproducibility.
While these are large dependencies that can introduce breaking changes, the entire requirements-render.txt file follows an unpinned approach. If your deployment process requires reproducible builds, consider pinning all dependencies. If you maintain a separate lock file (e.g., pip-compile or poetry.lock), this is already handled. For reference, latest stable versions are scikit-learn 1.8.0 and numpy 2.4.0 (both supporting Python 3.11–3.14).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/requirements-render.txt` around lines 18 - 19, The dependencies scikit-learn and numpy are unpinned which harms reproducibility; update the requirements to pin explicit versions for these packages (for example, pin scikit-learn to 1.8.0 and numpy to 2.4.0 or use conservative range constraints) or ensure a proper lockfile (pip-compile/poetry.lock) is produced and committed so builds remain deterministic; target the entries referencing scikit-learn and numpy in the requirements list and apply the chosen pinning strategy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/requirements-render.txt`:
- Around line 18-19: The dependencies scikit-learn and numpy are unpinned which
harms reproducibility; update the requirements to pin explicit versions for
these packages (for example, pin scikit-learn to 1.8.0 and numpy to 2.4.0 or use
conservative range constraints) or ensure a proper lockfile
(pip-compile/poetry.lock) is produced and committed so builds remain
deterministic; target the entries referencing scikit-learn and numpy in the
requirements list and apply the chosen pinning strategy.
Implemented a daily scheduled job that analyzes civic issues to refine the system's intelligence.
Key components:
- Adaptive weights are persisted in `backend/data/modelWeights.json`. It supports hot-reloading so changes by the scheduler are immediately picked up by the API.
- `PriorityEngine` now reads from `AdaptiveWeights` instead of hardcoded values.
- Scheduler script `backend/scheduler/daily_refinement_job.py` to run the daily process.

This system allows VishwaGuru to self-improve by learning from new data and adapting its prioritization and deduplication logic automatically.
PR created automatically by Jules for task 10394596499515001806 started by @RohanExploit
Summary by cubic
Added a daily civic intelligence engine that learns from the last 24h to update weights, tune duplicate detection, and snapshot a Civic Intelligence Index. Fixed Render data paths for persistence and added scikit‑learn/numpy to resolve clustering imports; PriorityEngine now reads hot‑reloaded weights in real time.
New Features
Refactors
Written for commit 4604a37. Summary will update on new commits.
Summary by CodeRabbit
New Features
Chores