Skip to content

Conversation

@martyngigg
Copy link
Contributor

@martyngigg martyngigg commented Dec 18, 2025

Summary

Refs #109

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added comprehensive deployment documentation covering prerequisites, networking setup, and service deployment procedures.
    • Introduced data architecture documentation describing layered data models and catalogue structure.
    • Expanded system architecture documentation outlining key services and their interactions.
  • Chores

    • Updated configuration tooling and development infrastructure.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 18, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request restructures documentation infrastructure, transitioning from Sphinx to a new structure with comprehensive deployment guides. It adds linting configuration, implements a Lakekeeper REST client script, creates architecture diagrams, introduces Ansible deployment documentation, and removes outdated documentation files. Architecture decision records are updated, and supporting infrastructure configurations are refined.

Changes

Cohort / File(s) Summary
Linting configuration
.markdownlint.json, .pre-commit-config.yaml
Adds Markdownlint configuration disabling MD013, configuring MD007 with indent of 3, and registers pre-commit hook for markdownlint-cli2 with docker support.
Sphinx documentation removal
docs-devel/mkdocs.yml, docs-devel/requirements.in, docs-devel/requirements.txt, docs-devel/src/conf.py, docs-devel/src/data-architecture/*, docs-devel/src/deployment/index.md, docs-devel/src/index.md, docs-devel/src/system-architecture/index.md, docs-devel/src/assets/images/lakehouse-poc.drawio
Removes entire Sphinx-based documentation structure, configuration, and related Python dependencies.
New developer documentation
docs-devel/index.md, docs-devel/data-architecture/index.md, docs-devel/data-architecture/catalogs.md, docs-devel/system-architecture/index.md, docs-devel/deployment/index.md, docs-devel/deployment/prerequisites.md
Introduces comprehensive documentation for Analytics Data Platform architecture (Medallion-style layering), catalogs, system components (Keycloak, Lakekeeper, Trino, Superset), and Ansible-based deployment procedures.
Architecture decision records
docs-devel/system-architecture/adrs/000[1-5]-*.md, docs-devel/system-architecture/adrs/0006-use-lakekeeper-as-iceberg-catalog.md
Removes Date metadata from existing ADRs; adds new ADR 0006 documenting Lakekeeper selection as Iceberg REST catalog.
Diagrams and assets
docs-devel/images/.gitignore, docs-devel/images/analytics-data-platform-arch.drawio
Adds Draw.io configuration ignore rule for temporary files and introduces new Analytics Data Platform architecture diagram.
Infrastructure deployment
infra/scripts/lakekeeper.py, infra/ansible-docker/group_vars/all/openstack.yml, infra/ansible-docker/playbooks/cloud/private_network_create.yml, infra/ansible-docker/readme.md
Implements LakekeeperRestV1 OAuth2 client class with bootstrap and validation methods; removes management VM FIP variable; adds global configuration loading to provisioning playbook; consolidates Ansible instructions to reference external deployment docs.
Documentation updates
AGENTS.md, elt-common/README.md, infra/local/README.md, readme.md
Refactors AGENTS.md with reference links and shortened content; updates test guidance references; standardises URL and credential formatting in local documentation; updates Components section link.

Poem

🐰 Hops through restructured docs with glee,
From Sphinx's past to clarity!
Lakekeeper's OAuth flows so bright,
Diagrams dance in Markdown light,
Ansible deploys with playbooks neat—
Platform foundations, now complete! 📋✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the primary focus of the changeset: substantial improvements to developer documentation across the docs-devel directory, including new architecture docs, deployment guides, and configuration files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (1)
AGENTS.md (1)

44-51: Improved PR guidelines with helpful reminder.

The reformatted guidelines and emphasis on checking the actual template file will help ensure contributors use the latest format.

Minor: Consider GitHub capitalisation

On line 44, .github refers to the directory name. However, if you're referring to the platform elsewhere, the official capitalisation is "GitHub" (capital H).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between a4f0e80 and e9573f8.

⛔ Files ignored due to path filters (5)
  • docs-devel/images/analytics-data-platform-arch.drawio.png is excluded by !**/*.png
  • docs-devel/images/analytics-data-platform-arch.png is excluded by !**/*.png
  • docs-devel/images/logo.svg is excluded by !**/*.svg
  • docs-devel/src/assets/images/lakehouse-data-architecture-layers.jpg is excluded by !**/*.jpg
  • docs-devel/src/assets/images/lakehouse-poc.png is excluded by !**/*.png
📒 Files selected for processing (34)
  • .markdownlint.json (1 hunks)
  • .pre-commit-config.yaml (1 hunks)
  • AGENTS.md (3 hunks)
  • docs-devel/data-architecture/catalogs.md (1 hunks)
  • docs-devel/data-architecture/index.md (1 hunks)
  • docs-devel/deployment/index.md (1 hunks)
  • docs-devel/deployment/prerequisites.md (1 hunks)
  • docs-devel/images/.gitignore (1 hunks)
  • docs-devel/images/analytics-data-platform-arch.drawio (1 hunks)
  • docs-devel/index.md (1 hunks)
  • docs-devel/mkdocs.yml (0 hunks)
  • docs-devel/requirements.in (0 hunks)
  • docs-devel/requirements.txt (0 hunks)
  • docs-devel/src/assets/images/lakehouse-poc.drawio (0 hunks)
  • docs-devel/src/conf.py (0 hunks)
  • docs-devel/src/data-architecture/index.md (0 hunks)
  • docs-devel/src/data-architecture/sources.md (0 hunks)
  • docs-devel/src/deployment/index.md (0 hunks)
  • docs-devel/src/index.md (0 hunks)
  • docs-devel/src/system-architecture/index.md (0 hunks)
  • docs-devel/system-architecture/adrs/0001-record-architecture-decisions.md (0 hunks)
  • docs-devel/system-architecture/adrs/0002-use-ansible-for-vm-provisioning-and-configuration.md (0 hunks)
  • docs-devel/system-architecture/adrs/0003-use-scd-cloud-vms.md (0 hunks)
  • docs-devel/system-architecture/adrs/0004-use-minio-as-object-store.md (0 hunks)
  • docs-devel/system-architecture/adrs/0005-use-apache-iceberg-as-lakehouse-table-format.md (0 hunks)
  • docs-devel/system-architecture/adrs/0006-use-lakekeeper-as-iceberg-catalog.md (1 hunks)
  • docs-devel/system-architecture/index.md (1 hunks)
  • elt-common/README.md (1 hunks)
  • infra/ansible-docker/group_vars/all/openstack.yml (0 hunks)
  • infra/ansible-docker/playbooks/cloud/private_network_create.yml (1 hunks)
  • infra/ansible-docker/readme.md (1 hunks)
  • infra/local/README.md (1 hunks)
  • infra/scripts/lakekeeper.py (1 hunks)
  • readme.md (1 hunks)
💤 Files with no reviewable changes (16)
  • docs-devel/src/system-architecture/index.md
  • docs-devel/mkdocs.yml
  • docs-devel/system-architecture/adrs/0003-use-scd-cloud-vms.md
  • infra/ansible-docker/group_vars/all/openstack.yml
  • docs-devel/system-architecture/adrs/0001-record-architecture-decisions.md
  • docs-devel/src/assets/images/lakehouse-poc.drawio
  • docs-devel/src/data-architecture/sources.md
  • docs-devel/src/deployment/index.md
  • docs-devel/src/index.md
  • docs-devel/src/data-architecture/index.md
  • docs-devel/system-architecture/adrs/0005-use-apache-iceberg-as-lakehouse-table-format.md
  • docs-devel/system-architecture/adrs/0002-use-ansible-for-vm-provisioning-and-configuration.md
  • docs-devel/src/conf.py
  • docs-devel/requirements.in
  • docs-devel/system-architecture/adrs/0004-use-minio-as-object-store.md
  • docs-devel/requirements.txt
🧰 Additional context used
🪛 LanguageTool
docs-devel/data-architecture/catalogs.md

[uncategorized] ~7-~7: Loose punctuation mark.
Context: ...IS Lakehouse. ## Index - accelerator: Provides data for analysis of accelerat...

(UNLIKELY_OPENING_PUNCTUATION)

docs-devel/deployment/index.md

[uncategorized] ~4-~4: Use a comma before ‘but’ if it connects two independent clauses (unless they are closely connected and short).
Context: ... be deployed on any cloud infrastructure but it will require_ _changes before this w...

(COMMA_COMPOUND_SENTENCE)


[uncategorized] ~49-~49: Possible missing comma found.
Context: ... playbooks/traefik/deploy.yml ``` Once deployed check the Traefik dashboard is availabl...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~55-~55: Possible missing comma found.
Context: ...ng available. Each service has a single VM with the exception of Superset that hos...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~55-~55: Consider using “except” or “except for”
Context: ...available. Each service has a single VM with the exception of Superset that hosts multiple instances ...

(WITH_THE_EXCEPTION_OF)


[uncategorized] ~56-~56: Possible missing preposition found.
Context: ... Superset that hosts multiple instances one once machine. First create the VMs: `...

(AI_HYDRA_LEO_MISSING_IN)

docs-devel/index.md

[typographical] ~6-~6: Consider adding a comma.
Context: ...5.sharepoint.com/sites/ISISProject-1432) there is appetite for trialling a system to s...

(IF_THERE_COMMA)


[style] ~18-~18: Consider removing “of” to be more concise
Context: ...his repository is a monorepo containing all of the code for the platform. It may be separa...

(ALL_OF_THE)

docs-devel/system-architecture/index.md

[uncategorized] ~15-~15: “they” seems less likely than “the”.
Context: ...ervices Here we provide an overview of they key services. For details on their depl...

(AI_HYDRA_LEO_CP_THEY_THE)


[grammar] ~63-~63: A determiner cannot be combined with a possessive pronoun. Did you mean simply “a” or “our”?
Context: ...mmon elt-common is a our own Python package that provides additi...

(A_MY)

AGENTS.md

[uncategorized] ~44-~44: The official name of this software platform is spelled with a capital “H”.
Context: ...the current PR template**: Always check .github/PULL_REQUEST_TEMPLATE.md for the lates...

(GITHUB)

docs-devel/data-architecture/index.md

[style] ~5-~5: Would you like to use the Oxford spelling “organize”? The spelling ‘organise’ is also correct.
Context: ...icks.com/aws/en/lakehouse/medallion) to organise data as it flows from source systems th...

(OXFORD_SPELLING_Z_NOT_S)


[style] ~14-~14: Would you like to use the Oxford spelling “normalized”? The spelling ‘normalised’ is also correct.
Context: ...its original state or minimally cleaned/normalised. The bronze layer is the single source ...

(OXFORD_SPELLING_Z_NOT_S)


[uncategorized] ~21-~21: Possible missing comma found.
Context: ...er layer typically applies little to no aggregation leaving detailed information for end us...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~33-~33: Possible missing comma found.
Context: ...scription of the implementation of this architecture see the [system architecture](../system...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~35-~35: Possible missing comma found.
Context: ...on of the structure of the catalogs for ISIS see catalogs.

(AI_HYDRA_LEO_MISSING_COMMA)

infra/local/README.md

[uncategorized] ~58-~58: Possible missing comma found.
Context: ...nt: https://localhost:58443 (note the https as this is required by Trino. You may n...

(AI_HYDRA_LEO_MISSING_COMMA)

docs-devel/system-architecture/adrs/0006-use-lakekeeper-as-iceberg-catalog.md

[uncategorized] ~11-~11: Possible missing comma found.
Context: ...mplementations exist within the Iceberg libraries including: - REST: a server-side catal...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~24-~24: Possible missing comma found.
Context: ...bility in how clients interact with the catalog and several implementations exist that ...

(AI_HYDRA_LEO_MISSING_COMMA)


[grammar] ~49-~49: This is normally spelled as one word.
Context: ...## Consequences Lakekeeper will be the back bone of the data lake and is a relatively ne...

(BACK_COMPOUNDS)


[uncategorized] ~49-~49: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ata lake and is a relatively new project so there is some risk that it does not sur...

(COMMA_COMPOUND_SENTENCE_2)


[uncategorized] ~51-~51: Possible missing comma found.
Context: ...rating to a new catalog if that becomes necessary as the catalog only stores metadata and...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 Ruff (0.14.8)
infra/scripts/lakekeeper.py

78-78: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: elt-common end-to-end tests
🔇 Additional comments (14)
.markdownlint.json (1)

3-5: Non-standard list indentation value.

The MD007 indent is set to 3, which is unconventional. Most markdown style guides recommend using 2 or 4 spaces for list indentation to maintain consistency with common formatting standards. Please verify this is intentional.

.pre-commit-config.yaml (1)

27-27: Excluding documentation from markdown linting seems counterintuitive.

The exclusion pattern skips the docs directory from markdown linting. Documentation files would typically benefit the most from consistent markdown formatting. Please verify this exclusion is intentional and document the rationale if there's a specific reason (e.g., auto-generated docs or conflicting formatting requirements).

elt-common/README.md (1)

39-39: Good improvement to link accessibility.

Changing the link text from "here" to a descriptive label improves documentation clarity and accessibility.

docs-devel/images/.gitignore (1)

1-2: LGTM!

The gitignore pattern correctly excludes Draw.io temporary save files.

readme.md (1)

10-11: Good documentation navigation improvement.

The updated link provides a more specific target with an explanatory note, improving developer experience.

infra/ansible-docker/playbooks/cloud/private_network_create.yml (1)

2-7: Verify the necessity of explicit variable loading.

Ansible typically loads variables from group_vars/ automatically based on the inventory structure. This explicit include_vars task may be redundant unless the playbook is executed in a non-standard manner or there's a specific ordering requirement.

Please confirm whether this explicit variable loading is required, or if Ansible's automatic group_vars loading would suffice.

infra/local/README.md (1)

50-58: LGTM!

The formatting improvements enhance readability and provide better markdown rendering across different parsers.

infra/ansible-docker/readme.md (1)

1-5: Excellent documentation consolidation.

Centralising deployment guidance in a single location reduces maintenance overhead and ensures consistency.

AGENTS.md (1)

20-20: Good consolidation of documentation references.

Linking to centralised documentation reduces duplication and ensures consistency across the repository.

Also applies to: 31-32

docs-devel/deployment/prerequisites.md (1)

78-85: LGTM!

The updated networking prerequisites are clear and well-structured. The simplified IP allocation instructions and DNS alignment requirement properly reflect the streamlined deployment workflow.

docs-devel/data-architecture/catalogs.md (1)

1-9: LGTM!

The catalogs documentation is clear and well-structured. The use of backticks for technical terms like accelerator, src_, and analytics_ is appropriate.

docs-devel/data-architecture/index.md (1)

1-35: LGTM!

The data architecture documentation provides a clear and comprehensive overview of the Medallion-style lakehouse implementation. The explanations of bronze, silver, and gold layers are well-structured and include appropriate references to external resources and internal documentation.

docs-devel/index.md (1)

1-39: LGTM!

The index page provides an excellent entry point to the developer documentation. The repository overview with the directory tree is particularly helpful for understanding the monorepo structure. The links to detailed architecture sections are well-organised.

docs-devel/images/analytics-data-platform-arch.drawio (1)

1-155: LGTM!

The architecture diagram provides a valuable visual representation of the Analytics Data Platform components and their interactions. This complements the written system architecture documentation effectively.

Use the same tool as VSCode for ease of configuring the same
rules for both the editor and CI tooling
Use correct extension in docs
@martyngigg martyngigg force-pushed the developer-documentation branch from 0874624 to 4970aec Compare December 18, 2025 16:24
@martyngigg martyngigg merged commit 8112d6e into main Dec 18, 2025
4 checks passed
@martyngigg martyngigg deleted the developer-documentation branch December 18, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants