Skip to content

Conversation

@martyngigg
Copy link
Contributor

@martyngigg martyngigg commented Dec 18, 2025

Summary

Refs #135

Summary by CodeRabbit

Release Notes

  • New Features

    • Added fine-grained access control infrastructure to the platform, except Trino. We will explore OPA access control in the future.
    • Enabled OAuth-based single sign-on authentication for Superset.
    • Automated local certificate generation for development environments.
  • Documentation

    • Updated local development setup instructions with required /etc/hosts configuration.
    • Expanded service endpoints and access documentation.
    • Clarified HTTPS requirement for Trino connections.
    • Reorganised testing guidance to emphasise unit tests.
  • Infrastructure

    • Configured file-based access control for Trino.
    • Enhanced Traefik routing with TLS support.
    • Streamlined authentication realm configuration.

✏️ Tip: You can customize this high-level summary in your review settings.

Required for Trino with authorization.
Disables certificate validation
On macOS using curl any *.local hostname has a few seecond delay
in resolving that doesn't happen with any other suffix.
It's more robust and removes issues with quoting strings.
Difference is now everything is on port 50080 and only
Trino on 50443.
OIDC IDs change each time we wipe out Keycloak
The lakekeeper UI refuses to authenticate with Keycloak over
http so needs to use https. We keep http non-browser based
requirements.
Just needs trino & opa service accounts in keycloak.
@coderabbitai
Copy link

coderabbitai bot commented Dec 18, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request significantly refactors local infrastructure setup by introducing TLS certificate management via mkcert, integrating OpenFGA for access control, restructuring bootstrap flows to use a Click-based CLI with identity provider abstractions, and reorganising environment configuration. Multiple services now employ OAuth authentication via Keycloak, whilst test configurations are updated to reflect these architectural changes.

Changes

Cohort / File(s) Summary
Documentation & Setup
AGENTS.md, elt-common/README.md, readme.md, infra/local/README.md
Updated guides for local infrastructure setup, including /etc/hosts configuration for adp-router, restructured testing instructions, and expanded service access documentation.
Infrastructure & TLS
infra/local/docker-compose.yml, infra/local/keycloak/Dockerfile-kcadm.dockerfile, infra/local/traefik/traefik.yml, infra/local/traefik/traefik-dynamic.yml
Added mkcert-based certificate generation, OpenFGA services and migration, keycloak-bootstrap and lakekeeper-bootstrap services, updated Traefik routing to support HTTPS with TLS entry points and port changes.
Keycloak Authentication
infra/ansible-docker/group_vars/all/keycloak.yml, infra/local/keycloak/bootstrap.sh, infra/local/keycloak/iceberg-realm.json
Introduced keycloak_url and keycloak_realm variables for templated realm URL construction, added bootstrap script for automated realm and client setup, removed hardcoded iceberg-realm.json configuration file.
Bootstrap & Identity Provider
infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py, infra/ansible-docker/roles/lakekeeper/tasks/main.yml
Refactored bootstrap from environment-driven to Click CLI with Keycloak and LakekeeperRestV1 abstractions, added credential handling and warehouse provisioning logic, replaced environment variables with CLI arguments.
E2E Testing & Configuration
elt-common/tests/e2e_tests/conftest.py, elt-common/tests/e2e_tests/elt_common/dlt_destinations/pyiceberg/utils.py, elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py
Updated warehouse fixture signatures, added project_id handling to credential setup, adjusted Trino and Keycloak settings for HTTPS and analytics-data-platform realm, added retry policy and teardown enhancements.
Core Configuration Updates
elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py, elt-common/src/elt_common/iceberg/trino.py
Added project_id field to PyIceberg credentials, updated field alias mappings to lowercase, disabled SSL certificate verification in Trino connection.
Environment Configuration
infra/local/env-for-testing, infra/local/env-local
Removed legacy environment file; introduced new env-local with comprehensive service configuration sections for Postgres, Keycloak, MinIO, OpenFGA, Iceberg, Trino, and Superset.
Trino Setup
infra/local/trino/etc/config.properties, infra/local/trino/etc/access-control.properties, infra/local/trino/etc/password-authenticator.properties, infra/local/trino/etc/catalog/playground.properties, infra/local/trino/etc/jvm.config, infra/local/trino/trino-create-catalog.sql
Added password authentication configuration, access control rules, playground iceberg REST catalog configuration, trust store JVM option, and removed dynamic SQL catalog creation.
Superset Integration
infra/local/superset/Dockerfile, infra/local/superset/docker/docker-bootstrap-cert.sh, infra/local/superset/docker/docker-init.sh, infra/local/superset/docker/pythonpath/superset_config.py, infra/local/superset/docker/requirements-local.txt
Removed legacy Dockerfile; introduced certificate bootstrap wrapper, updated init script with parameterised Trino connection, added OAuth/Keycloak security manager and role mapping, introduced Redis caching and Authlib dependency.
Warehouse Bootstrap Data
infra/local/lakekeeper/bootstrap-playground.json
Updated S3 credentials and endpoint configuration for local setup, removed project-id from top-level configuration.

Poem

🐰 A Rabbit's Ode to Infrastructure

Through mkcert's magic, TLS blooms so bright,
Bootstrap flows now Click their way to light,
OAuth dances with Keycloak's gentle hand,
OpenFGA guards this lakehouse land,
From env-for-testing's dust, new config stands tall—
A modern stack awaits us all! 🌟

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the primary change: enabling authorization backend (AUTHZ_BACKEND) in local infrastructure, which aligns with the extensive configuration changes across multiple infrastructure files including Trino access control, rules, OpenFGA service additions, and related bootstrap scripts.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
elt-common/src/elt_common/iceberg/trino.py (1)

106-114: Critical: SSL certificate verification disabled.

Setting verify: False completely disables SSL certificate verification, which defeats the purpose of using HTTPS and exposes the connection to man-in-the-middle attacks. This is a critical security vulnerability, even in local development environments.

Given that this PR introduces proper TLS infrastructure with mkcert and trust stores (as evidenced by the JVM trustStore configuration and certificate handling elsewhere), you should configure proper certificate verification instead.

🔎 Suggested approaches:

Option 1: Use the CA certificate bundle

     def _create_engine(self, credentials: TrinoCredentials) -> Engine:
         return create_engine(
             self.url,
             connect_args={
                 "auth": BasicAuthentication(credentials.user, credentials.password),
                 "http_scheme": credentials.http_scheme,
-                "verify": False,
+                "verify": "/path/to/rootCA.pem",  # Point to the mkcert CA certificate
             },
         )

Option 2: Load from environment variable

     def _create_engine(self, credentials: TrinoCredentials) -> Engine:
+        verify = os.getenv("TRINO_SSL_VERIFY", "True")
+        # Convert string to boolean or path
+        if verify.lower() in ("false", "0"):
+            verify = False
+        elif verify.lower() not in ("true", "1"):
+            # Treat as path to CA bundle
+            pass
+        else:
+            verify = True
+            
         return create_engine(
             self.url,
             connect_args={
                 "auth": BasicAuthentication(credentials.user, credentials.password),
                 "http_scheme": credentials.http_scheme,
-                "verify": False,
+                "verify": verify,
             },
         )
infra/local/superset/docker/pythonpath/superset_config.py (1)

152-155: String interpolation bug: environment variable not interpolated.

Line 154 mixes an f-string prefix with a literal {os.environ...} inside a regular string on line 155. The os.environ.get(...) call is not being evaluated.

🔎 Apply this fix:
 # The base URL for the email report hyperlinks.
 WEBDRIVER_BASEURL_USER_FRIENDLY = (
-    "https://{{ top_level_domain }}{os.environ.get('SUPERSET_APP_ROOT', '')}/"
+    f"https://{{{{ top_level_domain }}}}{os.environ.get('SUPERSET_APP_ROOT', '')}/"
 )

Note: If {{ top_level_domain }} is meant to be a Jinja2 template placeholder, you'll need to escape the braces for f-string interpolation while keeping the env var evaluation. Alternatively, use .format() or string concatenation.

🧹 Nitpick comments (13)
elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py (1)

45-47: Redundant reassignment of creds.catalog.

Line 46 assigns creds.catalog = warehouse.name, but this is already set to warehouse.name on line 16 during TrinoCredentials construction. The comment suggests connecting to "the new catalog", but the value is unchanged.

If this is intentional for clarity, consider adding a comment explaining why. Otherwise, this line can be removed.

elt-common/tests/e2e_tests/conftest.py (2)

213-224: Clarify the distinction between Warehouse.project_id and server.settings.project_id.

There are two project_id attributes in play:

  • Warehouse.project_id (line 217): a uuid.UUID representing the warehouse's unique identifier
  • server.settings.project_id (line 224): a str used for credentials

Line 224 correctly uses server.settings.project_id for the credentials, but the naming similarity with the Warehouse.project_id field could cause confusion. Consider renaming Warehouse.project_id to warehouse_id to better reflect its purpose, or add a clarifying comment.


349-350: Consider replacing the sleep with a more deterministic synchronisation mechanism.

Using time.sleep(1) for synchronisation is fragile and may cause flaky tests under load or in slower environments. If the underlying issue is related to async operations completing, consider polling for a completion condition or increasing retry tolerance in the cleanup methods instead.

That said, for test teardown code with retry logic already in place, this is acceptable as a pragmatic workaround.

infra/local/superset/docker/docker-bootstrap-cert.sh (1)

9-13: Consider a more robust certificate detection approach.

Using the 5th line of the certificate bundle as a fingerprint is fragile—if the bundle format changes or has fewer lines, this could break. Also, there's a useless use of cat with grep.

🔎 Suggested improvement:
-  # Check if 5th line is in the file already
-  if ! cat "$certifi_cacert_file" | grep -q -- "$(sed -n '5p' "$SUPERSET_CA_BUNDLE")"; then
+  # Check if the bundle content is already present
+  if ! grep -qF "$(head -1 "$SUPERSET_CA_BUNDLE")" "$certifi_cacert_file"; then
     echo "Appending $SUPERSET_CA_BUNDLE to $certifi_cacert_file"
     cat "$SUPERSET_CA_BUNDLE" >> "$certifi_cacert_file"
   fi

Alternatively, use a unique marker comment in the bundle or check for the certificate's subject line.

infra/local/README.md (1)

51-58: Wrap URLs in angle brackets or Markdown links for consistency.

Static analysis flagged bare URLs. Consider wrapping them for better Markdown compatibility.

🔎 Example fix:
-- Keycloak identity provider: http://localhost:50080/auth
+- Keycloak identity provider: <http://localhost:50080/auth>
   - `master` realm credentials: _admin/admin_
   - `analytics-data-platform` realm credentials: _adpsuperuser/adppassword_
-- Lakekeeper iceberg catalog UI: http://localhost:50080/iceberg/ui.
+- Lakekeeper iceberg catalog UI: <http://localhost:50080/iceberg/ui>
   - Use the same credentials as the `analytics-data-platform` realm
-- Superset BI tool: http://localhost:50080/workspace/playground.
+- Superset BI tool: <http://localhost:50080/workspace/playground>
   - Use the same credentials as the `analytics-data-platform` realm
-- Trino endpoint: https://localhost:58443 (note the https as this is required by Trino. You may need to use an "--insecure" flag)
+- Trino endpoint: <https://localhost:58443> (note the HTTPS, as this is required by Trino. You may need to use an `--insecure` flag)
infra/local/superset/docker/pythonpath/superset_config.py (1)

78-78: Consider using os.getenv() with a default or explicit error.

Using os.environ["KEYCLOAK_REALM_INTERNAL"] will raise a KeyError if the environment variable is missing, unlike the os.getenv() pattern used elsewhere in this file. This may be intentional (fail-fast), but worth noting for consistency.

infra/local/lakekeeper/bootstrap-playground.json (1)

5-13: Credentials and endpoint updates align with the new infrastructure topology.

The changes correctly route through adp-router and use the centralised adpsuperuser credentials defined in env-local. However, as noted in env-local (lines 7-10), these values are duplicated and must be manually kept in sync.

Consider templating this file or generating it at bootstrap time to avoid drift.

infra/local/keycloak/bootstrap.sh (1)

1-3: Consider the security implications of set -x with credentials.

set -x will echo all commands to stderr, including those containing $admin_pass and other secrets. While acceptable for local debugging, this could inadvertently expose credentials in logs. Consider using set -u only, or selectively disabling tracing around sensitive commands.

🔎 Example approach to protect sensitive commands:
-set -ux
+set -u
+
+# Enable tracing selectively, disable around sensitive commands
+# set -x  # Uncomment for debugging, but be aware credentials will be logged

Alternatively, wrap sensitive commands:

set +x  # Temporarily disable tracing
$KC_ADM config credentials --server "$kc_server" ...
set -x  # Re-enable if needed
infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py (5)

83-102: Redundant raise_for_status() call.

Line 92 calls response.raise_for_status() but _request_with_auth (line 86) already raises on non-success responses. This is harmless but redundant.

🔎 Apply this diff to remove redundant call:
             response = _request_with_auth(
                 requests.get,
                 self.realm_admin_url + "/users",
                 access_token,
                 params={"username": user, "exact": True},
             )
-            response.raise_for_status()
             users_json = response.json()

47-52: Consider using bare raise for re-raising exceptions.

When re-raising the same exception after logging, a bare raise preserves the original traceback more cleanly.

🔎 Apply this diff:
     try:
         response.raise_for_status()
     except requests.exceptions.HTTPError as exc:
         LOGGER.debug(f"request failed: {response.content.decode('utf-8')}")
-        raise exc
+        raise
     return response

207-216: Redundant raise_for_status() after _request_with_auth.

Similar to the oidc_ids method, line 213 is redundant as _request_with_auth already handles status checking.

🔎 Apply this diff:
         response = _request_with_auth(
             requests.post,
             url=self.management_url + "/project",
             access_token=self.access_token,
             json=payload,
         )
-        response.raise_for_status()
         project_id = response.json()["project-id"]

388-397: Duplicate permission assignment for server admins.

assign_permissions is called twice for server_admin users. The second call (lines 394-397) includes all grants from the first call plus additional project admin grants. The first call (lines 390-393) is therefore redundant.

🔎 Apply this diff to remove redundant call:
     if server_admin is not None and identity_provider is not None:
         # Server admins are human users and will be provisioned the first time they log in
-        server.assign_permissions(
-            identity_provider.oidc_ids(server_admin, server.access_token),
-            entities={"server": ["admin"]},
-        )
         server.assign_permissions(
             identity_provider.oidc_ids(server_admin, server.access_token),
             entities={"server": ["admin"], f"project/{project_id}": ["project_admin"]},
         )

237-242: Silent exception swallowing warrants a debug log.

The provision_user method silently catches HTTPError. Whilst the comment explains the rationale, adding a debug log would aid troubleshooting.

🔎 Apply this diff:
     def provision_user(self, access_token: str):
         """Provision a user through the /catalog/v1/config endpoint

         User details are retrieved from the access token
         """
         try:
             _request_with_auth(requests.get, self.catalog_url + "/config", access_token)
         except requests.exceptions.HTTPError:
             # Do not check the response status. If the user does not exist then a 400 is returned
             # but the user is provisioned internally.
-            pass
+            LOGGER.debug("User provisioned (expected 400 response for new users).")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 54ddf76 and f7dc672.

⛔ Files ignored due to path filters (1)
  • infra/local/trino/etc/password.db is excluded by !**/*.db
📒 Files selected for processing (34)
  • AGENTS.md (1 hunks)
  • elt-common/README.md (1 hunks)
  • elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py (2 hunks)
  • elt-common/src/elt_common/iceberg/trino.py (1 hunks)
  • elt-common/tests/e2e_tests/conftest.py (9 hunks)
  • elt-common/tests/e2e_tests/elt_common/dlt_destinations/pyiceberg/utils.py (1 hunks)
  • elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py (1 hunks)
  • infra/ansible-docker/group_vars/all/keycloak.yml (1 hunks)
  • infra/ansible-docker/roles/lakekeeper/files/.gitignore (1 hunks)
  • infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py (3 hunks)
  • infra/ansible-docker/roles/lakekeeper/tasks/main.yml (1 hunks)
  • infra/local/README.md (1 hunks)
  • infra/local/docker-compose.yml (14 hunks)
  • infra/local/env-for-testing (0 hunks)
  • infra/local/env-local (1 hunks)
  • infra/local/keycloak/Dockerfile-kcadm.dockerfile (1 hunks)
  • infra/local/keycloak/bootstrap.sh (1 hunks)
  • infra/local/keycloak/iceberg-realm.json (0 hunks)
  • infra/local/lakekeeper/bootstrap-playground.json (1 hunks)
  • infra/local/superset/Dockerfile (0 hunks)
  • infra/local/superset/docker/docker-bootstrap-cert.sh (1 hunks)
  • infra/local/superset/docker/docker-init.sh (2 hunks)
  • infra/local/superset/docker/pythonpath/superset_config.py (4 hunks)
  • infra/local/superset/docker/requirements-local.txt (1 hunks)
  • infra/local/traefik/traefik-dynamic.yml (2 hunks)
  • infra/local/traefik/traefik.yml (2 hunks)
  • infra/local/trino/etc/access-control.properties (1 hunks)
  • infra/local/trino/etc/catalog/playground.properties (1 hunks)
  • infra/local/trino/etc/config.properties (1 hunks)
  • infra/local/trino/etc/jvm.config (1 hunks)
  • infra/local/trino/etc/password-authenticator.properties (1 hunks)
  • infra/local/trino/etc/rules.json (1 hunks)
  • infra/local/trino/trino-create-catalog.sql (0 hunks)
  • readme.md (1 hunks)
💤 Files with no reviewable changes (4)
  • infra/local/trino/trino-create-catalog.sql
  • infra/local/keycloak/iceberg-realm.json
  • infra/local/superset/Dockerfile
  • infra/local/env-for-testing
🧰 Additional context used
🧬 Code graph analysis (4)
elt-common/tests/e2e_tests/elt_common/dlt_destinations/pyiceberg/utils.py (1)
elt-common/tests/e2e_tests/conftest.py (2)
  • warehouse (318-359)
  • server (313-314)
elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py (1)
elt-common/src/elt_common/iceberg/trino.py (2)
  • TrinoQueryEngine (45-114)
  • execute (59-81)
infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py (1)
elt-common/tests/e2e_tests/conftest.py (5)
  • _request_with_auth (205-210)
  • access_token (298-309)
  • token_endpoint (143-144)
  • token_endpoint (291-294)
  • server (313-314)
elt-common/tests/e2e_tests/conftest.py (1)
infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py (1)
  • create_warehouse (257-274)
🪛 LanguageTool
readme.md

[misspelling] ~3-~3: Use “a” instead of ‘an’ if the following word doesn’t start with a vowel sound, e.g. ‘a sentence’, ‘a university’.
Context: ...t A monorepo containing code to deploy an Data Lakehouse based on [Apache Iceberg...

(EN_A_VS_AN)

infra/local/README.md

[style] ~4-~4: Consider using “except” or “except for”
Context: ...do not use https for local development, with the exception of Trino that requires it, due to complica...

(WITH_THE_EXCEPTION_OF)


[typographical] ~20-~20: It appears that a comma is missing.
Context: ...By defining the above /etc/hosts entry a request will hit the router regardless ...

(BY_FOR_IN_COMMA)


[style] ~20-~20: This phrase is redundant. Consider using “outside”.
Context: ...r regardless of whether it is inside or outside of the container network. To test ELT scr...

(OUTSIDE_OF)


[uncategorized] ~58-~58: Possible missing comma found.
Context: ...oint: https://localhost:58443 (note the https as this is required by Trino. You may n...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
infra/local/README.md

51-51: Bare URL used

(MD034, no-bare-urls)


52-52: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


52-52: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


53-53: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


53-53: Emphasis style
Expected: asterisk; Actual: underscore

(MD049, emphasis-style)


54-54: Bare URL used

(MD034, no-bare-urls)


56-56: Bare URL used

(MD034, no-bare-urls)


58-58: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.8)
infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py

51-51: Use raise without specifying exception name

Remove exception name

(TRY201)


75-75: Probable use of requests call without timeout

(S113)


97-97: Avoid specifying long messages outside the exception class

(TRY003)


99-101: Avoid specifying long messages outside the exception class

(TRY003)


118-118: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: elt-common end-to-end tests
🔇 Additional comments (38)
elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py (2)

18-18: LGTM!

The new project_id field is correctly defined as optional with a None default, consistent with the existing field patterns in this class.


32-36: LGTM!

The field alias mappings are correctly updated:

  • Header keys use lowercase convention (x-iceberg-access-delegation, x-project-id) which aligns with the CATALOG_HEADER_PREFIX usage.
  • The project_id field is properly mapped to be included in the output dictionary when set.
elt-common/tests/e2e_tests/elt_common/dlt_destinations/pyiceberg/utils.py (1)

49-51: LGTM!

The PROJECT_ID environment variable is correctly sourced from warehouse.server.settings.project_id, aligning with the new project_id field added to PyIcebergRestCatalogCredentials. The direct assignment pattern (vs setdefault) is consistent with the URI setting above.

elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py (2)

16-17: LGTM!

The credentials are correctly configured with the warehouse name as the catalog and HTTPS scheme, aligning with the infrastructure changes for TLS support.


25-25: LGTM!

The warehouse identifier correctly uses the {project_id}/{warehouse.name} format, which is consistent with how the project and warehouse relationship is structured in the updated configuration.

elt-common/tests/e2e_tests/conftest.py (2)

80-99: LGTM!

The updated default settings correctly reflect the new local infrastructure configuration with HTTPS for Trino, the analytics-data-platform realm, and updated credentials.


162-166: LGTM!

The create_warehouse method signature correctly accepts project_id as a string, matching the API expectation and the bootstrap-warehouse.py implementation in infra/ansible-docker/roles/lakekeeper/files/.

infra/ansible-docker/roles/lakekeeper/files/.gitignore (1)

1-1: LGTM!

The addition of *.log to the .gitignore is appropriate for excluding log files generated during bootstrap and runtime processes.

AGENTS.md (1)

60-60: LGTM!

The documentation correctly instructs developers to add the adp-router hostname mapping for local development and e2e testing, which aligns with the TLS and routing changes introduced in this PR.

infra/local/trino/etc/config.properties (3)

2-2: LGTM!

The change from memory to file-based catalog storage provides persistence, which aligns with the REST-based Iceberg catalog configuration changes in this PR.


8-8: LGTM!

The password authentication configuration correctly integrates with the file-based password authenticator defined in password-authenticator.properties.


14-14: LGTM!

Enabling web UI preview features is appropriate for local development and testing.

infra/local/traefik/traefik.yml (3)

1-1: LGTM!

The YAML schema reference improves IDE support and validation.


12-12: Log level reduced to ERROR.

The change from DEBUG to ERROR reduces log verbosity. This is appropriate for a cleaner log output in local development, but be aware that it may make troubleshooting routing issues more difficult if problems arise.


17-21: LGTM!

The entry point changes appropriately support the new TLS-enabled routing architecture:

  • The web entry point moved to a non-standard port (50080) to avoid conflicts
  • New minioapi and trinosecure entry points support TLS-enabled services

These changes align with the broader TLS infrastructure introduced in this PR.

infra/local/trino/etc/password-authenticator.properties (1)

1-2: LGTM!

The file-based password authenticator configuration is appropriate and correctly aligns with the http-server.authentication.type=password setting in config.properties.

infra/local/trino/etc/jvm.config (1)

18-18: Inconsistency with AI summary.

The AI-generated summary indicates that -XX:+EnableDynamicAgentLoading was removed, but the line is still present in the provided code at line 18. Please verify whether this option should be kept or removed.

infra/local/superset/docker/docker-bootstrap-cert.sh (1)

19-19: LGTM!

The delegation to the original bootstrap script with argument passthrough is correctly implemented.

readme.md (1)

8-10: LGTM!

The new Components section with reference to AGENTS.md provides helpful navigation for repository orientation.

infra/local/trino/etc/catalog/playground.properties (2)

7-9: Hardcoded OAuth2 credentials acceptable for local development.

The trino:s3cr3t credential is hardcoded here. This is acceptable for local development infrastructure, but ensure this pattern isn't replicated in production configurations.


11-18: LGTM!

S3 configuration correctly uses environment variables for credentials, avoiding hardcoding of secrets. The comment explaining why credentials are needed (no STS) is helpful.

infra/local/README.md (1)

23-39: LGTM!

The dlt secrets configuration example is helpful for developers setting up ELT testing. The cross-reference to env-local for value definitions is good practice.

elt-common/README.md (1)

24-45: LGTM!

The reorganised testing documentation clearly separates unit and e2e tests with appropriate setup instructions. The cross-reference to infra/local documentation for the /etc/hosts requirement is helpful.

infra/local/traefik/traefik-dynamic.yml (2)

1-15: LGTM!

TLS configuration with certificate paths and the Trino HTTPS router are correctly configured.


56-61: No syntax adjustment needed for HeaderRegexp in Traefik v3.

The HeaderRegexp matcher syntax is correct as written. HeaderRegexp matchers take two parameters in v3, and the rule HeaderRegexp(\Referer`, `/internal/traefik`)` follows the proper format.

infra/local/superset/docker/pythonpath/superset_config.py (1)

45-57: LGTM!

The oauth_user_info method correctly extracts user information from Keycloak's userinfo endpoint with appropriate error handling via raise_for_status().

infra/local/trino/etc/access-control.properties (1)

1-2: LGTM!

The file-based access control configuration is correctly set up, referencing the rules.json file that is also introduced in this PR.

infra/ansible-docker/group_vars/all/keycloak.yml (1)

5-7: Good refactoring to centralise Keycloak URL and realm configuration.

The templated approach improves maintainability. The _isis suffix in keycloak_realm_url_isis appears to be a legacy naming convention that could be simplified in a future refactor.

infra/local/env-local (1)

1-3: Good practice: Clear warning about non-production usage.

The prominent warning appropriately signals that this configuration file contains insecure defaults and should not be used in production.

infra/local/superset/docker/docker-init.sh (1)

29-38: Good improvement: Conditional admin user creation.

Making admin user creation conditional on SUPERSET_ADMIN_USER being set improves flexibility and allows for different deployment scenarios.

infra/local/keycloak/bootstrap.sh (2)

60-64: Good idempotency pattern.

The early exit when the realm already exists prevents duplicate resource creation and makes the script safe to run multiple times.


95-111: Permissive redirect URIs and service account configuration.

The redirectUris=["*"] is very permissive but acceptable for local development. The --uusername and --cclientid double-letter flags are correct Keycloak CLI syntax for targeting service accounts.

infra/local/docker-compose.yml (4)

48-67: LGTM - Certificate generation with idempotent check.

The mkcert entrypoint correctly checks for existing certificates before regenerating, ensuring idempotent behaviour. The certificate files and CA root are properly copied to the shared volume.


307-328: Hardcoded credentials acceptable for local development only.

The bootstrap credentials (machine-infra:s3cr3t, trino:s3cr3t) are hardcoded here. This is acceptable for local development infrastructure but ensure these values are never used in staging or production environments. The README or documentation should clarify this constraint.


234-260: LGTM - OpenFGA services properly configured.

The OpenFGA migration and runtime services are correctly wired with appropriate dependency ordering. The healthcheck using grpc_health_probe on port 8081 is the recommended approach for gRPC-based services.


119-137: No action required. The environment variables are correctly configured and intentionally use different naming conventions for different purposes. SUPERSET_DB_NAME is a legitimate separate variable, not an error or inconsistency.

Likely an incorrect or invalid review comment.

infra/ansible-docker/roles/lakekeeper/files/bootstrap-warehouse.py (2)

300-344: LGTM - Well-structured Click CLI with clear options.

The CLI design is clean with sensible defaults, proper type handling via CredentialsParamType, and support for multiple users via multiple=True. The separation of concerns between identity provider setup and Lakekeeper operations is well done.


1-7: Dependency versions exist and are compatible with Python 3.13.

Click 8.3.0 requires Python >=3.10, which includes Python 3.13, and requests officially supports Python 3.9+. Both versions are available and suitable for use in this script.

@martyngigg martyngigg merged commit a4f0e80 into main Dec 18, 2025
5 checks passed
@martyngigg martyngigg deleted the catalog-access-control branch December 18, 2025 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants