cpdata · Oct 17, 2025
diff --git a/.python-version b/.python-version
@@ -1 +1 @@
-3.13
+3.12
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,19 @@
 # Changelog
 
+## [2025-10-16T20:39:06-04:00 (America/New_York)]
+### Added
+- Added live integration coverage for Memgraph, Neo4j, and Redis via `meshmind/tests/test_integration_live.py` and configured
+  pytest markers/default options in `pyproject.toml` so `pytest -m integration` exercises the docker-compose stack.
+- Introduced `scripts/generate_synthetic_dataset.py` to produce large JSONL/CSV corpora (defaults: 10k memories, 20k triplets,
+  384-dim embeddings) for benchmarking and load testing.
+
+### Changed
+- Regenerated `uv.lock`, pinned `.python-version` to 3.12, and updated installation guidance (`README.md`, `SETUP.md`,
+  `docs/development.md`, `docs/testing.md`, `docs/operations.md`, `ENVIRONMENT_NEEDS.md`, `NEEDED_FOR_TESTING.md`) to standardise on
+  `uv sync --all-extras` and document the new Pydantic 2.x policy.
+- Refreshed planning and status collateral (`PROJECT.md`, `PLAN.md`, `SOT.md`, `ROADMAP.md`, `PLANNING_THOUGHTS.md`,
+  `RECOMMENDATIONS.md`, `RESUME_NOTES.md`, `DUMMIES.md`, `TODO.md`, `ISSUES.md`) to reflect integration workflows, dataset
+  generation guidance, and completed approval tasks.
 ## [2025-10-16T16:35:00-04:00 (America/New_York)]
 ### Added
 - Added a `serve-grpc` CLI subcommand (`meshmind/cli/__main__.py`) that instantiates

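The `integration` marker introduced in this changelog entry assumes the docker-compose services are already up. A minimal sketch of how a test module might detect that, using only the standard library. `service_reachable` and `redis_target` are hypothetical helpers, not part of `meshmind`, and the fallback URL `redis://localhost:6379/0` is an assumption; only the `REDIS_URL` variable name comes from the docs touched by this diff.

```python
import os
import socket
from urllib.parse import urlparse


def service_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat the service as down.
        return False


def redis_target(default: str = "redis://localhost:6379/0") -> tuple[str, int]:
    """Extract (host, port) from REDIS_URL, falling back to an assumed default."""
    parsed = urlparse(os.environ.get("REDIS_URL", default))
    return parsed.hostname or "localhost", parsed.port or 6379
```

A test could combine `@pytest.mark.integration` with `@pytest.mark.skipif(not service_reachable(*redis_target()), reason="Redis not reachable")` so that a missing stack produces a skip rather than a connection error. Whether the real suite skips or fails in that situation is not stated in this diff.
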
diff --git a/DUMMIES.md b/DUMMIES.md
@@ -7,10 +7,10 @@ recommended next step now that full dependencies can be installed.
 | Component | Location | Purpose | Current Usage | Recommended Action |
 | --- | --- | --- | --- | --- |
 | gRPC stub implementation | `meshmind/api/grpc.py` (`GrpcServiceStub` + generated protobuf helpers) | Provides an in-process service implementation backed by the canonical proto schema so tests can exercise the service layer without spinning up gRPC infrastructure. | Unit tests and docs rely on the stub while production traffic should flow through the new asyncio helpers in `meshmind.api.grpc_server`. | Keep the stub for tests; package the new server behind a CLI entry point and retire any ad-hoc wrappers once infrastructure is provisioned. |
-| Fake graph/storage drivers | `meshmind/testing/fakes.py` (`FakeMemgraphDriver`, `FakeRedisBroker`, `FakeEmbeddingEncoder`) | Provide offline stand-ins for Memgraph, Redis, and embedding models. | Pytest fixtures and documentation rely on these for isolation. | Keep as long as offline tests are desired; supplement with integration suites that use real services. |
+| Fake graph/storage drivers | `meshmind/testing/fakes.py` (`FakeMemgraphDriver`, `FakeRedisBroker`, `FakeEmbeddingEncoder`) | Provide offline stand-ins for Memgraph, Redis, and embedding models. | Pytest fixtures and documentation rely on these for isolation. | Keep as long as offline tests are desired; live coverage now exists via `pytest -m integration`, so continue documenting when new features require real services. |
 | Fake LLM client | `meshmind/testing/fakes.py` (`FakeLLMClient`) | Records per-request overrides and emits deterministic responses so tests exercise reranking without installing the OpenAI SDK. | Service/interface tests (`meshmind/tests/test_service_interfaces.py`, `test_client.py`) and the CLI fixtures inject this stub when `openai` is unavailable. | Keep for unit tests; add integration tests with real providers once keys and network access are provisioned. |
 | Dummy encoder fixture | `meshmind/tests/conftest.py` (`dummy_encoder`) and dependent tests | Supplies a lightweight embedding encoder for search tests. | Used across retrieval and service tests to avoid network calls. | Keep for unit tests; add integration coverage with real encoders once APIs are configured. |
-| Fake mgclient module | `meshmind/tests/test_memgraph_driver.py` (monkeypatch of `mgclient`) | Simulates the Memgraph client so driver code runs without the native binary. | Enables driver unit tests without installing `mgclient`. | Replace with real `mgclient`-backed tests once package access is ensured; keep shim for fallback coverage. |
+| Fake mgclient module | `meshmind/tests/test_memgraph_driver.py` (monkeypatch of `mgclient`) | Simulates the Memgraph client so driver code runs without the native binary. | Enables driver unit tests without installing `mgclient`. | Integration tests now exercise real `mgclient` via docker-compose; retain the fake for fast unit tests but prefer the live suite for regressions. |
 
 ## Retired Items
 

diff --git a/ENVIRONMENT_NEEDS.md b/ENVIRONMENT_NEEDS.md
@@ -1,11 +1,12 @@
 # Tasks for Human Project Manager
 
 - Keep the Python package layer aligned with the project extras during base image
-  refreshes. The `run/install_setup.sh` and `run/maintenance_setup.sh` scripts now
-  install the full optional stack (neo4j driver, `pymgclient`, Redis, Celery extras,
-  FastAPI/Uvicorn, LLM tooling, and developer linters/testers). Ensure cached
-  environments either run the maintenance script or bake these dependencies into the
-  image so cold starts do not regress coverage.
+  refreshes. `uv.lock` now targets Python 3.11–3.12 with a default pin of 3.12, and
+  the `run/install_setup.sh` / `run/maintenance_setup.sh` scripts call
+  `uv sync --all-extras` to install the full stack (neo4j driver, `pymgclient`,
+  Redis, Celery extras, FastAPI/Uvicorn, LLM tooling, developer linters/testers).
+  Ensure cached environments either run the maintenance script or bake these
+  dependencies into the image so cold starts do not regress coverage.
 - Provide system-level build dependencies for the graph drivers (e.g., `build-essential`,
   `cmake`, `libssl-dev`, `libkrb5-dev`) so `pymgclient` (and its `mgclient` module) install cleanly.
 - Provision external services and credentials (compose files now exist under the project
@@ -19,15 +20,19 @@
     alternative base URLs/models required for OpenRouter, Azure, or Google-hosted
     endpoints so the new `llm_client` overrides can be exercised end-to-end.
   - Default maintenance retry configuration (`MAINTENANCE_MAX_ATTEMPTS`, `MAINTENANCE_BASE_DELAY_SECONDS`) tuned for the deployed graph backend; surface recommended values once integration tests run against live clusters. *(Future refinement request once infra is available.)*
-- Supply datasets/fixtures (future request) representing large knowledge graphs to
-  stress-test consolidation heuristics and pagination under load.
+- Supply datasets/fixtures representing large knowledge graphs to stress-test
+  consolidation heuristics and pagination under load. The new
+  `scripts/generate_synthetic_dataset.py` utility produces JSONL/CSV corpora
+  (defaults: 10k memories, 20k triplets, 384-dim embeddings) that can be copied to
+  shared storage for on-demand benchmarking.
 - Maintain outbound package download access to PyPI and vendor repositories; this
   session confirmed package installation works when the network is open, and future
   sessions need the same capability to refresh locks or install new optional
   integrations.
-- Enable Docker or container runtime access (future request) so the provided
-  `docker-compose.yml` files can run inside this environment; alternatively, provision
-  remote services accessible to CI.
+- Ensure Docker or container runtime access remains available so the root
+  `docker-compose.yml` (and targeted stacks under `meshmind/tests/docker/`) can run
+  from CI and developer machines. Integration tests now expect these services to be
+  reachable via `docker compose up -d` before executing `pytest -m integration`.
 - Document credential management procedures and rotation cadence so secrets stay current.
 - Keep gRPC tooling (`grpcio`, `grpcio-tools`, protobuf compiler) available in cached environments; the proto definitions now
   back the production stubs and runtime server (`meshmind.api.grpc_server`). `scripts/generate_protos.py` regenerates bindings

diff --git a/ISSUES.md b/ISSUES.md
@@ -14,9 +14,9 @@
 - [x] Create real docker-compose services for Memgraph and Redis or remove the placeholder file.
 - [x] Centralize LLM provider usage behind a configurable client wrapper to remove direct `openai` imports scattered through the codebase.
 - [x] Surface LLM override fields via REST/gRPC payloads and integration tests so service clients can select providers/models like the CLI.
-- [ ] Document Neo4j driver requirements and verify connectivity against a live cluster (CLI connectivity checks exist but still need validation against a real instance).
+- [x] Document Neo4j driver requirements and verify connectivity against a live cluster (integration suite now hits the docker-compose Neo4j service).
 - [ ] Exercise the new namespace/entity-label filtering against live Memgraph/Neo4j datasets to confirm Cypher predicates behave as expected.
-- [ ] Regenerate `uv.lock` to reflect the updated dependency set (`pymgclient`, `fastapi`, `uvicorn`, extras) so CI tooling stays in sync.
+- [x] Regenerate `uv.lock` to reflect the updated dependency set (`pymgclient`, `fastapi`, `uvicorn`, extras) so CI tooling stays in sync.
 ## Medium Priority
 - [x] Persist results from consolidation and compression tasks back to the database (currently in-memory only).
 - [x] Refine `Memory.importance` scoring to reflect actual ranking heuristics instead of a constant.

diff --git a/NEEDED_FOR_TESTING.md b/NEEDED_FOR_TESTING.md
@@ -8,8 +8,8 @@
 - Use a virtual environment (`uv`, `venv`, or `conda`) to isolate dependencies.
 
 ## Python Dependencies
-- Install the project editable (with extras) using `pip install -e .[dev,docs,testing]` or
-  `uv pip install --system -e .[dev,docs,testing]` from the repository root.
+- Install the project editable (with extras) using `uv sync --all-extras` (preferred; honours `uv.lock` and the repository's
+  `.python-version`) or `pip install -e .[dev,docs,testing]` if `uv` is unavailable.
 - Core functionality relies on the OpenAI SDK (or compatible fork), `pydantic`, and `pydantic-settings`; the project now
   requires Pydantic 2.x directly (the legacy shim has been removed).
 - Optional packages improve specific workflows (now bundled in the editable install extras so they install automatically when
@@ -39,8 +39,9 @@
 - **Redis** for Celery task queues, referenced through `REDIS_URL`.
 - **LLM provider access** for extraction, embeddings, and reranking (`LLM_API_KEY` or fallback `OPENAI_API_KEY`, plus optional
   `LLM_*_BASE_URL` overrides for alternative providers).
-- Recommended: Docker Compose (shipped in repo) to run Memgraph, Neo4j, and Redis together when developing locally. Additional
-  targeted stacks live under `meshmind/tests/docker/` for integration tests.
+- Recommended: Docker Compose (shipped in repo) to run Memgraph, Neo4j, and Redis together when developing locally. Start the
+  root stack with `docker compose up -d` before executing `pytest -m integration`; targeted stacks live under
+  `meshmind/tests/docker/` for focused scenarios.
 
 ## Environment Variables
 - `GRAPH_BACKEND` — `memory`, `sqlite`, `memgraph`, or `neo4j` (defaults to `memory`).
@@ -66,15 +67,12 @@
   and exercise it with `fastapi.testclient.TestClient` (requires the `httpx`
   package); pair it with the `GrpcServiceStub` for lightweight gRPC coverage when
   external services are unavailable.
-- Use `meshmind/testing` fakes (`FakeMemgraphDriver`, `FakeRedisBroker`, `FakeEmbeddingEncoder`, `FakeLLMClient`) in tests or demos to eliminate external infrastructure requirements.
+- Use `meshmind/testing` fakes (`FakeMemgraphDriver`, `FakeRedisBroker`, `FakeEmbeddingEncoder`, `FakeLLMClient`) in tests or demos to eliminate external infrastructure requirements. Integration suites marked with `@pytest.mark.integration` exercise live Memgraph/Neo4j/Redis instances and expect the docker stack to be running.
 - Invoke `meshmind admin predicates` and `meshmind admin maintenance --max-attempts <n> --base-delay <seconds> --run <task>` during local runs to inspect predicate registries, telemetry, and tune maintenance retries without external services.
-- Use the benchmarking utilities in `scripts/` (`evaluate_importance.py`, `consolidation_benchmark.py`, `benchmark_pagination.py`) to validate heuristics and driver performance offline before connecting to live infrastructure.
+- Use the benchmarking utilities in `scripts/` (`evaluate_importance.py`, `consolidation_benchmark.py`, `benchmark_pagination.py`) to validate heuristics and driver performance offline before connecting to live infrastructure. Generate large corpora with `scripts/generate_synthetic_dataset.py` when you need ≥10k memories for stress tests.
 - Seed demo data as needed using the `examples/extract_preprocess_store_example.py` script after configuring environment
   variables.
 - Create a `.env` file storing the environment variables above for consistent local configuration.
 
 ## Current Blockers in This Environment
-- Neo4j/Memgraph binaries and Docker are unavailable in this workspace, preventing local graph provisioning; use the in-memory or SQLite drivers instead.
-- Redis cannot be installed without container or host-level access; Celery tasks remain untestable locally until a remote
-  instance is provisioned (the fake broker satisfies unit tests but not end-to-end runs).
-- External network restrictions may limit installation of proprietary packages or access to OpenAI endpoints.
+- External network restrictions may limit installation of proprietary packages or access to OpenAI-compatible endpoints.
diff --git a/PLAN.md b/PLAN.md
@@ -18,8 +18,9 @@
    Graph-backed wrappers now rely on driver-side filtering, pagination, and aggregation before in-memory scoring. Next: push
    similarity computation into Memgraph/Neo4j so vector rankings can execute server-side without Python hydration.
 2. **Maintenance Tasks** – Tasks emit telemetry, persist consolidation/compression results, and now retry conflicting writes with
-   configurable exponential backoff (`MAINTENANCE_MAX_ATTEMPTS`, `MAINTENANCE_BASE_DELAY_SECONDS`). Synthetic benchmark scripts and
-   large-fixture tests validate behaviour on bigger workloads; next, replay production-like datasets to tune thresholds.
+   configurable exponential backoff (`MAINTENANCE_MAX_ATTEMPTS`, `MAINTENANCE_BASE_DELAY_SECONDS`). Synthetic benchmark scripts,
+   the new `scripts/generate_synthetic_dataset.py`, and integration tests against live Memgraph/Neo4j validate behaviour on larger
+   workloads; next, replay production-like datasets to tune thresholds.
 3. **Importance Scoring Improvements** – Heuristic scoring is live, records distribution metrics via telemetry, and ships with
    `scripts/evaluate_importance.py` for synthetic/offline evaluation. Next: incorporate real feedback loops or LLM-assisted
    ranking to tune weights over time.
@@ -30,9 +31,9 @@
 
 ## Phase 4 – Developer Experience & Tooling (In Progress)
 1. **Testing Overhaul** – Pytest suites rely on local fixtures and fake drivers with coverage for graph-backed retrieval, Neo4j
-   connectivity shims, CLI admin helpers, documentation guard, setup scripts, and the new benchmarking utilities. Continue adding
-   cross-backend integration coverage and track shim retirement progress in `DUMMIES.md` so integration suites can replace them
-   incrementally.
+   connectivity shims, CLI admin helpers, documentation guard, setup scripts, the new benchmarking utilities, and live
+   integration coverage (`pytest -m integration`) for Memgraph/Neo4j/Redis. Continue tracking shim retirement progress in
+   `DUMMIES.md` so integration suites can replace them incrementally.
 2. **Automation & CI** – Makefile provides lint/format/type/test/docs-guard targets and CI runs fmt-check, docs guard, and
    pytest. Protobuf drift now fails CI via `make protos-check`. Add caching and matrix builds when dependencies stabilize.
 3. **Environment Provisioning** – Docker Compose now provisions Memgraph, Neo4j,

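The Maintenance Tasks item above describes retrying conflicting writes with configurable exponential backoff. The following is a sketch of that pattern under stated assumptions, not the project's implementation: `WriteConflict`, `retry_with_backoff`, and `settings_from_env` are hypothetical names, the doubling schedule is one common choice, and the fallback values `3` and `0.5` are invented for illustration. Only the two environment variable names come from the plan.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WriteConflict(Exception):
    """Hypothetical stand-in for whatever error a conflicting write raises."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int,
    base_delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on WriteConflict with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except WriteConflict:
            if attempt == max_attempts:
                raise  # out of attempts: surface the conflict to the caller
            # Delays grow as base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")


def settings_from_env() -> tuple[int, float]:
    """Read the two tunables; the fallback values here are assumptions."""
    attempts = int(os.environ.get("MAINTENANCE_MAX_ATTEMPTS", "3"))
    delay = float(os.environ.get("MAINTENANCE_BASE_DELAY_SECONDS", "0.5"))
    return attempts, delay
```

Passing `sleep` as a parameter keeps the helper testable without real waiting: a test can hand in `delays.append` and inspect the schedule afterwards.
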
diff --git a/PLANNING_THOUGHTS.md b/PLANNING_THOUGHTS.md
@@ -11,9 +11,10 @@
 - **LLM Provider Strategy** – Track evaluation results for OpenAI-compatible providers (OpenRouter, Google) and capture failover requirements.
 - **Maintenance Scheduling** – Record chosen cadence, concurrency, and backoff defaults once consolidation heuristics are validated at scale.
 - **Schema Governance** – Capture conventions for namespaces, entity labels, and predicate registries so ingestion pipelines stay consistent.
+- **Pydantic Model Policy** – Follow the documented plan (target Pydantic 2.12+, refresh locks when 3.13 wheels land, record migration guidance) to avoid resurrecting compatibility shims.
 
 ## Upcoming Research
-- Benchmark consolidation heuristics on synthetic datasets representing customer scale and capture telemetry snapshots.
+- Benchmark consolidation heuristics on synthetic datasets representing customer scale and capture telemetry snapshots (seed data via `scripts/generate_synthetic_dataset.py`).
 - Compare graph query latency across in-memory, SQLite, Memgraph, and Neo4j drivers when using pagination and filtering.
 - Evaluate rerank quality across LLM providers using a labelled evaluation set to determine optimal default models.
 - Investigate options for secure secret storage (e.g., Vault, AWS Secrets Manager) to standardise API key management.
diff --git a/PROJECT.md b/PROJECT.md
@@ -46,7 +46,7 @@
 
 ## Partially Implemented or Fragile Areas
 - The LLM-backed embedding wrapper still assumes dictionary-style responses; adjust once SDK models are fully adopted.
-- Neo4j driver support is import-guarded; the new CLI connectivity check still needs validation against a live cluster.
+- Neo4j driver support remains import-guarded, but integration tests now validate CRUD/count operations against the docker-compose Neo4j stack. Continue monitoring driver updates for production deployments.
 - Maintenance tasks rely on in-process heuristics for consolidation summaries; conflict resolution now retries with configurable exponential backoff, but long-term storage thresholds still need validation against production datasets.
 - Importance scoring now records telemetry but still relies on heuristics; richer scoring logic or LLM-assisted ranking is pending.
 - SQLite driver currently stores JSON blobs; future work may normalize columns for structured querying.
@@ -76,7 +76,9 @@
 - Benchmark and evaluation utilities live in `scripts/` (`evaluate_importance.py`, `consolidation_benchmark.py`, `benchmark_pagination.py`) to validate heuristics and driver performance without external infrastructure.
 - Developer-facing documentation now lives in `docs/` alongside the canonical `README.md`; the docs guard (`make docs-guard`) enforces synchronized updates when modules change.
 - Docker Compose now provisions Memgraph, Neo4j, and Redis; integration-specific stacks (including the Celery worker) live under
-  `meshmind/tests/docker/`. See `ENVIRONMENT_NEEDS.md` and `SETUP.md` for enabling optional services locally.
+  `meshmind/tests/docker/`. `pytest -m integration` exercises live services once the stack is running. See `ENVIRONMENT_NEEDS.md`
+  and `SETUP.md` for enabling optional services locally.
+- `scripts/generate_synthetic_dataset.py` produces large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) to stress retrieval and consolidation flows prior to ingesting real datasets.
 
 ## Roadmap Highlights
 - Push graph-backed retrieval deeper into the drivers (vector similarity, structured filters) so the new server-side filtering/pagination evolves into full backend-native ranking.
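
Several files in this diff mention `scripts/generate_synthetic_dataset.py` and its defaults (10k memories, 20k triplets, 384-dim embeddings), but the script itself is not shown. A rough sketch of what such a generator could look like, assuming a JSONL file of memories and a CSV file of triplets. All field names, the predicate list, and the function names are hypothetical; only the default sizes come from the diff.

```python
import csv
import json
import random
from typing import IO, Iterable, Iterator


def generate_memories(count: int = 10_000, dim: int = 384, seed: int = 0) -> Iterator[dict]:
    """Yield synthetic memory records with embeddings drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same corpus
    for i in range(count):
        yield {
            "id": f"mem-{i}",  # hypothetical field names throughout
            "text": f"synthetic memory {i}",
            "embedding": [round(rng.uniform(-1.0, 1.0), 6) for _ in range(dim)],
        }


def generate_triplets(
    count: int = 20_000, memory_count: int = 10_000, seed: int = 0
) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) rows linking random memory ids."""
    rng = random.Random(seed)
    predicates = ("related_to", "mentions", "follows")  # made-up predicates
    for _ in range(count):
        subject = rng.randrange(memory_count)
        obj = rng.randrange(memory_count)
        yield f"mem-{subject}", rng.choice(predicates), f"mem-{obj}"


def write_jsonl(records: Iterable[dict], handle: IO[str]) -> int:
    """Write one JSON object per line; return the number of records written."""
    written = 0
    for record in records:
        handle.write(json.dumps(record) + "\n")
        written += 1
    return written


def write_csv(triplets: Iterable[tuple[str, str, str]], handle: IO[str]) -> int:
    """Write a header plus one row per triplet; return the number of data rows."""
    writer = csv.writer(handle)
    writer.writerow(["subject", "predicate", "object"])
    written = 0
    for row in triplets:
        writer.writerow(row)
        written += 1
    return written
```

Because both generators are lazy, the full default corpus can be streamed to disk without holding 10k embeddings in memory at once, for example `write_jsonl(generate_memories(), open("memories.jsonl", "w"))`.
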