darron/ff-workers

Mass Murder Canada - Cloudflare Workers

Cloudflare Workers/D1 implementation of the Mass Murder Canada site, migrated from the original Go/Echo app.

Original project: github.com/darron/ff

Features

  • Original public URL structure preserved.
  • Admin dashboard for records and linked news stories.
  • REST-style admin APIs for CRUD + AI queue operations.
  • AI summarization pipeline:
    • Per-story extraction and summary.
    • Record-level synthesis across linked sources.
    • Source typing (official, news, social, other) with social-only incidents treated as alleged.
    • Chunked queue processing for large records.
  • AI summaries are rendered as HTML from Markdown on record pages.
  • Sentry error monitoring (fetch + queue) via @sentry/cloudflare.

Setup

See docs/SETUP.md for full setup.

Quick start:

  1. npm install
  2. Configure admin auth (see docs/ADMIN_SETUP.md)
  3. npm run dev
  4. Deploy as needed:
    • Staging: npx wrangler deploy --env staging
    • Production: npx wrangler deploy --env production

Documentation

  • docs/SETUP.md (full setup)
  • docs/ADMIN_SETUP.md (admin auth)

Project Structure

ff-workers/
├── src/
│   ├── index.js                  # Worker entrypoint (routes + queue + Sentry wrapper)
│   ├── admin.js                  # Admin API handlers
│   ├── admin-ui.js               # Admin dashboard HTML/JS
│   ├── ai-summary.js             # Queue-driven AI summarization pipeline
│   ├── source-classification.js  # URL/source credibility typing
│   ├── db.js                     # Record/story queries
│   ├── auth.js                   # Admin authentication/session helpers
│   └── templates.js              # Public page templates + markdown renderer
├── scripts/
│   └── deploy-production-with-sentry.sh
├── migrations/
│   ├── 0001_initial.sql
│   ├── 0002_data.sql
│   ├── data/
│   └── prod-data/
├── wrangler.toml
├── package.json
├── migrate-data.cjs
├── import-prod-dump.cjs
└── database_dump.sql

Routes

Public:

  • /
  • /records/group/:group
  • /records/provinces/:province
  • /records/:id

Admin:

  • /admin
  • /admin/api/records/*
  • /admin/api/stories/*
  • /admin/api/sentry-test

Environments

Configured in wrangler.toml:

  • compatibility_flags = ["nodejs_compat"] (required for Sentry SDK)
  • Queue binding name in code: SUMMARY_QUEUE
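The relevant wrangler.toml pieces might look roughly like this (queue names, binding, and consumer settings are taken from this README; the actual file layout may differ):

```toml
compatibility_flags = ["nodejs_compat"]  # required for the Sentry SDK

# Staging queue wiring (sketch; production mirrors this with its own queue name)
[[env.staging.queues.producers]]
queue = "massmurdercanada-staging-summary"
binding = "SUMMARY_QUEUE"

[[env.staging.queues.consumers]]
queue = "massmurdercanada-staging-summary"
max_batch_size = 5
max_batch_timeout = 10
```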

Staging (--env staging):

  • Worker: massmurdercanada-staging
  • AI: enabled, manual on save (AI_SUMMARY_AUTO_ON_SAVE=false)
  • AI_SUMMARY_STORIES_PER_JOB=10
  • Queue: massmurdercanada-staging-summary
  • Queue consumer: max_batch_size=5, max_batch_timeout=10

Production (--env production):

  • Worker/routes: massmurdercanada on massmurdercanada.org/*
  • AI: enabled, auto on save (AI_SUMMARY_AUTO_ON_SAVE=true)
  • AI_SUMMARY_STORIES_PER_JOB=5
  • Queue: massmurdercanada-production-summary
  • Queue consumer: max_batch_size=1, max_batch_timeout=5
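The per-environment AI_SUMMARY_STORIES_PER_JOB values above control how a record's stories are split into queue jobs (the "chunked queue processing" noted in Features). A minimal sketch of that chunking, where the function name and message shape are assumptions rather than the actual code:

```javascript
// Split a record's story IDs into queue jobs of AI_SUMMARY_STORIES_PER_JOB
// stories each; the last chunk is flagged so the consumer knows to run the
// record-level synthesis after processing it.
function chunkStories(recordId, storyIds, storiesPerJob) {
  const size = Math.max(1, Number(storiesPerJob) || 5);
  const jobs = [];
  for (let i = 0; i < storyIds.length; i += size) {
    jobs.push({
      recordId,
      storyIds: storyIds.slice(i, i + size),
      isLastChunk: i + size >= storyIds.length,
    });
  }
  return jobs;
}
```

For example, a record with 12 linked stories and STORIES_PER_JOB=5 would produce three jobs of 5, 5, and 2 stories, with only the final job triggering synthesis.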

AI Summary Pipeline

Trigger paths:

  • Manual per-record: POST /admin/api/records/:id/summarize
  • Bulk backfill: POST /admin/api/records/summarize-all
  • Auto-on-save (when enabled): record/story create/update operations enqueue a job

Bulk backfill request options:

  • limit (1-100, default 25)
  • offset (default 0)
  • only_missing (default true)
  • include_fallback (default true)
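A sketch of how a handler could normalize that request body, using the defaults and bounds listed above (the helper name is hypothetical, not the actual implementation):

```javascript
// Apply the documented defaults and clamp `limit` to the 1-100 range.
function normalizeBackfillOptions(body = {}) {
  const clamp = (n, lo, hi) => Math.min(hi, Math.max(lo, n));
  return {
    limit: clamp(Number.isFinite(body.limit) ? body.limit : 25, 1, 100),
    offset: Number.isFinite(body.offset) && body.offset >= 0 ? body.offset : 0,
    only_missing: body.only_missing !== false,         // default true
    include_fallback: body.include_fallback !== false, // default true
  };
}
```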

Extraction flow per story:

  1. Reuse stored body_text when sufficient.
  2. Direct fetch + structured extraction (JSON-LD/article/main/meta).
  3. Optional summarize daemon fallback (AI_FETCH_SUMMARIZE_DAEMON_URL).
  4. Optional fallback readers: r.jina.ai, markdown.new.
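The steps above form an ordered fallback chain: each extractor is tried in turn until one yields enough text. A simplified sketch (the function name and length threshold are illustrative, not the actual code):

```javascript
// Try each async extractor in order; return the first result that is
// "sufficient" (here: at least minLength characters). Errors in one
// extractor fall through to the next.
async function extractStoryText(extractors, minLength = 200) {
  for (const extract of extractors) {
    try {
      const text = await extract();
      if (text && text.length >= minLength) return text;
    } catch {
      // fall through to the next extractor
    }
  }
  return null;
}
```

In this model the pipeline would be called with extractors in the documented order: stored body_text, direct fetch + structured extraction, the summarize daemon, then the reader fallbacks.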

Additional behavior:

  • RCMP URLs are normalized from rcmp-grc.gc.ca to rcmp.ca.
  • Unsafe URLs (non-public/localhost/private IP) are blocked.
  • Large records are processed in chunks; final synthesis runs on last chunk.
  • Source selection for synthesis favors official and news sources, de-emphasizing social links unless they are the only sources available.
  • Structured logs are emitted as ai_summary_queue_job.
  • Record metadata date is treated as year-only for synthesis context.
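The two URL behaviors above (RCMP normalization and unsafe-URL blocking) could be sketched as follows; the private-range check here is deliberately simplified and the real code may cover more cases:

```javascript
// Rewrite rcmp-grc.gc.ca hostnames to rcmp.ca (simplified sketch).
function normalizeSourceUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (url.hostname.endsWith("rcmp-grc.gc.ca")) {
    url.hostname = "rcmp.ca";
  }
  return url.toString();
}

// Block non-public targets: unparsable URLs, localhost, and common
// private IPv4 ranges (10/8, 192.168/16, 172.16/12).
function isUnsafeUrl(rawUrl) {
  let url;
  try { url = new URL(rawUrl); } catch { return true; }
  const host = url.hostname;
  return (
    host === "localhost" ||
    host === "127.0.0.1" ||
    /^10\./.test(host) ||
    /^192\.168\./.test(host) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(host)
  );
}
```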

Optional summarize daemon token secret:

  • npx wrangler secret put AI_FETCH_SUMMARIZE_DAEMON_TOKEN --env production
  • npx wrangler secret put AI_FETCH_SUMMARIZE_DAEMON_TOKEN --env staging

Queue Setup

Create queues once (use latest Wrangler):

npx wrangler@latest queues create massmurdercanada-staging-summary \
  --message-retention-period-secs 86400 \
  --delivery-delay-secs 0

npx wrangler@latest queues create massmurdercanada-production-summary \
  --message-retention-period-secs 86400 \
  --delivery-delay-secs 0

Then deploy the Worker for each environment.

Error Monitoring (Sentry)

Sentry is wired through @sentry/cloudflare and reads runtime config from env/secrets:

  • SENTRY_DSN (secret)
  • SENTRY_RELEASE (optional var)
  • SENTRY_ENVIRONMENT (optional var)
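One way to assemble Sentry init options from those env values, returning null when the DSN secret is absent so callers can skip Sentry cleanly (as the /admin/api/sentry-test endpoint does by answering 412). This helper is a sketch, not the project's actual code:

```javascript
// Build Sentry options from env/secrets; null means "Sentry disabled".
function sentryOptionsFromEnv(env) {
  if (!env.SENTRY_DSN) return null;
  return {
    dsn: env.SENTRY_DSN,
    release: env.SENTRY_RELEASE || undefined,
    environment: env.SENTRY_ENVIRONMENT || undefined,
  };
}
```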

Set DSN secret (production):

npx wrangler secret put SENTRY_DSN --env production

Admin Sentry test:

  • The dashboard button calls POST /admin/api/sentry-test.
  • If SENTRY_DSN is not set (e.g., on staging), the endpoint returns 412 rather than failing at deploy or run time.

Release + deploy workflow: see scripts/deploy-production-with-sentry.sh.

Notes

  • Dates are stored in mixed formats; the UI and AI synthesis treat the canonical record date as year-level context only.
  • AI backfill targets missing summaries by default and can include existing fallback summaries.
  • Story summaries and record synthesis are stored in D1 (news_stories.ai_summary, records.ai_summary).