diff --git a/docs/commands/build.md b/docs/commands/build.md
index a85ae2b..a20c2e7 100644
--- a/docs/commands/build.md
+++ b/docs/commands/build.md
@@ -104,7 +104,7 @@ The HTML visualization provides an interactive graph that can be viewed in a web
 
 ### `release`
 
-Build a release RO-Crate in a directory, scanning for and linking existing sub-RO-Crates. This creates a parent RO-Crate that references and contextualizes the sub-crates.
+Build a release RO-Crate in a directory, scanning for and linking existing sub-RO-Crates. This creates a parent RO-Crate that references and contextualizes the sub-crates. For more details, see the [release workflow documentation](release_creation.md).
 
 ```bash
 fairscape-cli build release [OPTIONS] RELEASE_DIRECTORY
diff --git a/docs/commands/release_creation.md b/docs/commands/release_creation.md
new file mode 100644
index 0000000..f7bc45a
--- /dev/null
+++ b/docs/commands/release_creation.md
@@ -0,0 +1,44 @@
+# How to build a FAIRSCAPE Release
+
+The process is a little complicated. I hope to explain its current state here and come up with a better future solution that doesn't just add more commands.
+
+## Overview
+
+The `build release` command can operate in two modes:
+
+1. **Full pre-processing** (default): Processes all subcrates, then creates the release crate.
+2. **Skip pre-processing** (`--skip-subcrate-processing`): Creates the release crate without processing subcrates. The sub-crates must then be linked to the release crate and processed later.
+
+## What does processing sub-crates mean?
+
+The processing function, `process_all_subcrates()`, is located in `src/fairscape_cli/utils/build_utils.py`. It performs the following steps on each sub-crate found in the release directory:
+
+| Step | Function | Description |
+| ---- | -------------------------- | ----------------------------------------------------------------------- |
+| 1 | `process_link_inverses()` | Adds OWL inverse properties using the EVI ontology |
+| 2 | `process_add_io()` | Calculates and adds `EVI:inputs` and `EVI:outputs` to the root RO-Crate |
+| 3 | `process_evidence_graph()` | Generates provenance graph JSON and HTML |
+| 4 | `process_croissant()` | Converts RO-Crate metadata to Croissant |
+| 5\* | `process_preview()` | Builds the HTML preview for the RO-Crate |
+
+## Mapping to CLI Commands
+
+Each processing step can also be run with an equivalent CLI command:
+
+| Processing Step | Equivalent CLI Command |
+| -------------------------- | -------------------------------------------------------- |
+| `process_link_inverses()` | `fairscape augment link-inverses ` |
+| `process_add_io()` | `fairscape augment add-io ` |
+| `process_evidence_graph()` | `fairscape build evidence-graph ` |
+| `process_croissant()` | `fairscape build croissant ` |
+| `process_preview()` | `fairscape build preview ` |
+
+## Why do sub-crates need to be processed?
+
+While entities are being created and added, the CLI cannot tell when a sub-crate is complete. So once the user has finished a sub-crate, some post-processing makes it "release ready". This post-processing adds missing terms, i.e. it fills in `generated` or `generatedBy` so that every relationship points in both directions. It fills in I/O information, which is important for crates that state they were generated by other crates. It also creates useful supporting documents and formats: the HTML preview, Croissant, and evidence graphs. A sub-crate is valid without all of this, but these outputs are an important part of our release processing.
+
+## How can the "release first, sub-crates later" workflow be improved?
+
+- Add a build preview command (needed regardless).
+- Add an augment sub-crate command that runs all four steps, so you don't need to run them individually.
+- Somehow link sub-crates and rebuild the release? This would produce aggregated metrics and a rebuilt datasheet with pointers to each sub-crate.
diff --git a/docs/subcrate-processing-workflow.md b/docs/subcrate-processing-workflow.md
new file mode 100644
index 0000000..b618455
--- /dev/null
+++ b/docs/subcrate-processing-workflow.md
@@ -0,0 +1,220 @@
+# Subcrate Processing Workflow
+
+This document describes how `process_all_subcrates` works in the `build release` command and how its steps map to individual CLI commands for flexible workflow support.
+
+## Overview
+
+The `build release` command can operate in two modes:
+1. **Full processing** (default): Processes all subcrates, then creates the release crate
+2. **Skip processing** (`--skip-subcrate-processing`): Creates the release crate without processing subcrates. The subcrates must be linked later, and the top-level RO-Crate will lack aggregated metrics.
+
+## What `process_all_subcrates` Does
+
+Located in `src/fairscape_cli/utils/build_utils.py`, this function performs five steps on each subcrate found in the release directory:
+
+| Step | Function | Description |
+|------|----------|-------------|
+| 1 | `process_link_inverses()` | Adds OWL inverse properties using the EVI ontology |
+| 2 | `process_add_io()` | Calculates and adds `EVI:inputs` and `EVI:outputs` to the root dataset |
+| 3 | `process_evidence_graph()` | Generates provenance graph JSON and HTML visualization |
+| 4 | `process_croissant()` | Converts RO-Crate metadata to Croissant JSON-LD format |
+| 5 | `process_preview()` | Generates `ro-crate-preview.html` for browser viewing |
+
+## Mapping to CLI Commands
+
+Each processing step can be executed individually using existing CLI commands:
+
+| Processing Step | Equivalent CLI Command |
+|-----------------|------------------------|
+| `process_link_inverses()` | `fairscape augment link-inverses ` |
+| `process_add_io()` | `fairscape augment add-io ` |
+| `process_evidence_graph()` | `fairscape build evidence-graph ` |
+| `process_croissant()` | `fairscape build croissant ` |
+| `process_preview()` | `fairscape build preview ` |
+
+**All-in-one command:** Use `fairscape build subcrate ` to run all five steps on a single subcrate.
+
+## Supported Workflows
+
+### Workflow 1: Subcrates First, Then Release (Default)
+
+This is the current default behavior. Subcrates are processed automatically before the release crate is created.
+
+```bash
+# Single command handles everything
+fairscape build release ./my-release \
+  --name "My Release" \
+  --organization-name "My Org" \
+  --project-name "My Project" \
+  --description "Release description" \
+  --keywords "keyword1" --keywords "keyword2"
+```
+
+**What happens internally:**
+1. `process_all_subcrates()` finds and processes all subcrates in `./my-release`
+2. Subcrate metadata is collected (authors, keywords)
+3. Release RO-Crate is created with aggregated metadata
+4. Subcrates are linked to the release via `LinkSubcrates()`
+5. Release-level Croissant and datasheet are generated
+
+### Workflow 2: Release First, Then Subcrates Later
+
+Use this when you need to create the release crate structure first and add/process subcrates afterward.
+
+```bash
+# Step 1: Create release crate without processing subcrates
+fairscape build release ./my-release \
+  --name "My Release" \
+  --organization-name "My Org" \
+  --project-name "My Project" \
+  --description "Release description" \
+  --keywords "keyword1" \
+  --skip-subcrate-processing
+
+# Step 2: Add subcrates to the release directory
+# (manually copy or create subcrate directories)
+
+# Step 3: Process each subcrate (all-in-one command)
+fairscape build subcrate ./my-release/subcrate1 --release-directory ./my-release
+fairscape build subcrate ./my-release/subcrate2 --release-directory ./my-release
+
+# Or process each step individually if needed:
+# fairscape augment link-inverses ./my-release/subcrate1
+# fairscape augment add-io ./my-release/subcrate1
+# fairscape build evidence-graph ./my-release/subcrate1
+# fairscape build croissant ./my-release/subcrate1
+# fairscape build preview ./my-release/subcrate1
+```
+
+## Potential Enhancements
+
+### 1. Batch Subcrate Processing Command
+
+A new command to process all subcrates in an existing release:
+
+```bash
+fairscape augment subcrates 
+```
+
+This would call `process_all_subcrates()` on an existing release, enabling:
+1. Build release first with `--skip-subcrate-processing`
+2. Add subcrates to the release directory
+3. Run batch processing on all subcrates
+
+### 2. Re-link Subcrates Command
+
+A command to update the release's `hasPart` references after adding new subcrates:
+
+```bash
+fairscape augment link-subcrates 
+```
+
+This would call `LinkSubcrates()` to update the release metadata with references to any newly added subcrates.
+
+### 3. Combined Post-Processing Command
+
+A single command to both process subcrates and re-link them:
+
+```bash
+fairscape augment finalize-release 
+```
+
+This would:
+1. Run `process_all_subcrates()` to process all subcrates
+2. Run `LinkSubcrates()` to update release references
+3. Regenerate release-level Croissant and datasheet
+
+## Command Reference
+
+### `augment link-inverses`
+
+Adds OWL inverse properties to an RO-Crate based on the EVI ontology.
+
+```bash
+fairscape augment link-inverses [--ontology-path PATH] [--namespace URI]
+```
+
+**Options:**
+- `--ontology-path`: Custom OWL ontology file (defaults to bundled `evi.xml`)
+- `--namespace`: Primary namespace URI for property keys (defaults to EVI namespace)
+
+### `augment add-io`
+
+Calculates and adds `EVI:inputs` and `EVI:outputs` to the root dataset.
+
+```bash
+fairscape augment add-io [--verbose]
+```
+
+**Inputs are:**
+- All `EVI:Sample` entities
+- Datasets referenced in `usedDataset` that were not generated by a computation
+- Datasets referenced in `usedDataset` but not defined in the `@graph`
+
+**Outputs are:**
+- All datasets that were not used by any computation
+
+### `build evidence-graph`
+
+Generates a provenance graph for a specific ARK identifier.
+
+```bash
+fairscape build evidence-graph [--output-file PATH]
+```
+
+**Outputs:**
+- `provenance-graph.json`: JSON representation of the evidence graph
+- `provenance-graph.html`: Interactive HTML visualization
+
+### `build croissant`
+
+Converts an RO-Crate to Croissant JSON-LD format.
+
+```bash
+fairscape build croissant [--output PATH]
+```
+
+**Output:**
+- `croissant.json` (or custom path): Croissant-formatted metadata
+
+### `build preview`
+
+Generates a lightweight HTML preview for an RO-Crate.
+
+```bash
+fairscape build preview [--published]
+```
+
+**Options:**
+- `--published`: Indicate if the crate is published (affects link rendering)
+
+**Output:**
+- `ro-crate-preview.html`: Browser-viewable summary of the crate
+
+### `build subcrate`
+
+Processes a single subcrate with all augmentation and build steps. This is the recommended command for processing individual subcrates.
+
+```bash
+fairscape build subcrate [--release-directory PATH] [--published]
+```
+
+**Options:**
+- `--release-directory`: Parent release directory (used for relative paths in evidence graphs)
+- `--published`: Indicate if the crate is published
+
+**Steps performed:**
+1. Link inverse properties (OWL ontology entailments)
+2. Add `EVI:inputs` and `EVI:outputs` to the root dataset
+3. Generate evidence graph (JSON + HTML visualization)
+4. Generate Croissant export (JSON-LD)
+5. Generate preview HTML
+
+**Example:**
+```bash
+# Process a subcrate within a release
+fairscape build subcrate ./my-release/experiment-1 --release-directory ./my-release
+
+# Process a standalone subcrate
+fairscape build subcrate ./my-subcrate
+```
diff --git a/src/fairscape_cli/commands/build_commands.py b/src/fairscape_cli/commands/build_commands.py
index dd3eee2..5e10ae4 100644
--- a/src/fairscape_cli/commands/build_commands.py
+++ b/src/fairscape_cli/commands/build_commands.py
@@ -13,7 +13,9 @@
 from fairscape_cli.utils.build_utils import (
     process_all_subcrates,
     process_croissant,
-    process_datasheet
+    process_datasheet,
+    process_preview,
+    process_subcrate
 )
 
 from fairscape_cli.models import (
@@ -534,4 +536,94 @@ def build_croissant(ctx, rocrate_path, output):
     except Exception as e:
         click.echo(f"ERROR: Failed to convert RO-Crate to Croissant: {e}", err=True)
         traceback.print_exc()
-        ctx.exit(1)
\ No newline at end of file
+        ctx.exit(1)
+
+
+@build_group.command('preview')
+@click.argument('rocrate-path', type=click.Path(exists=True, path_type=pathlib.Path))
+@click.option('--published', is_flag=True, default=False, help="Indicate if the crate is considered published (affects link rendering).")
+@click.pass_context
+def build_preview_command(ctx, rocrate_path: pathlib.Path, published: bool):
+    """
+    Generate a preview HTML file (ro-crate-preview.html) for an RO-Crate.
+
+    This creates a lightweight HTML summary of the RO-Crate that can be
+    viewed in a browser. Useful for quickly inspecting crate contents.
+    """
+    if rocrate_path.is_dir():
+        crate_dir = rocrate_path
+    elif rocrate_path.name == "ro-crate-metadata.json":
+        crate_dir = rocrate_path.parent
+    else:
+        click.echo("ERROR: Input path must be an RO-Crate directory or a ro-crate-metadata.json file.", err=True)
+        ctx.exit(1)
+
+    metadata_file = crate_dir / "ro-crate-metadata.json"
+    if not metadata_file.exists():
+        click.echo(f"ERROR: Metadata file not found: {metadata_file}", err=True)
+        ctx.exit(1)
+
+    click.echo(f"Generating preview for: {crate_dir}")
+
+    if process_preview(crate_dir, published=published):
+        click.echo(f"Preview generated: {crate_dir / 'ro-crate-preview.html'}")
+    else:
+        click.echo("ERROR: Failed to generate preview", err=True)
+        ctx.exit(1)
+
+
+@build_group.command('subcrate')
+@click.argument('subcrate-path', type=click.Path(exists=True, path_type=pathlib.Path))
+@click.option('--release-directory', type=click.Path(exists=True, path_type=pathlib.Path), default=None,
+              help="Parent release directory (used for relative paths in evidence graphs).")
+@click.option('--published', is_flag=True, default=False, help="Indicate if the crate is considered published.")
+@click.pass_context
+def build_subcrate_command(ctx, subcrate_path: pathlib.Path, release_directory: Optional[pathlib.Path], published: bool):
+    """
+    Process a subcrate with all augmentation and build steps.
+
+    This command performs the following steps on a single subcrate:
+
+    \b
+    1. Link inverse properties (OWL ontology entailments)
+    2. Add EVI:inputs and EVI:outputs to the root dataset
+    3. Generate evidence graph (JSON + HTML visualization)
+    4. Generate Croissant export (JSON-LD)
+    5. Generate preview HTML
+
+    Use this command to fully process a subcrate before or after adding it
+    to a release. This is the individual-crate equivalent of the subcrate
+    processing that happens during 'build release'.
+    """
+    if subcrate_path.is_dir():
+        crate_dir = subcrate_path
+    elif subcrate_path.name == "ro-crate-metadata.json":
+        crate_dir = subcrate_path.parent
+    else:
+        click.echo("ERROR: Input path must be an RO-Crate directory or a ro-crate-metadata.json file.", err=True)
+        ctx.exit(1)
+
+    metadata_file = crate_dir / "ro-crate-metadata.json"
+    if not metadata_file.exists():
+        click.echo(f"ERROR: Metadata file not found: {metadata_file}", err=True)
+        ctx.exit(1)
+
+    click.echo(f"\n=== Processing subcrate: {crate_dir.name} ===")
+
+    results = process_subcrate(crate_dir, release_directory=release_directory, published=published)
+
+    # Summary
+    click.echo("\n=== Summary ===")
+    click.echo(f"  Link inverses: {'OK' if results['link_inverses'] else 'FAILED'}")
+    click.echo(f"  Add I/O: {'OK' if results['add_io'] else 'FAILED'}")
+    click.echo(f"  Evidence graph: {'OK' if results['evidence_graph'] else 'SKIPPED/FAILED'}")
+    click.echo(f"  Croissant: {'OK' if results['croissant'] else 'FAILED'}")
+    click.echo(f"  Preview: {'OK' if results['preview'] else 'FAILED'}")
+
+    if results['errors']:
+        click.echo("\nErrors encountered:")
+        for error in results['errors']:
+            click.echo(f"  - {error}")
+        ctx.exit(1)
+    else:
+        click.echo("\nSubcrate processing completed successfully.")
\ No newline at end of file
diff --git a/src/fairscape_cli/entailments/find_outputs.py b/src/fairscape_cli/entailments/find_outputs.py
index 35721f6..903c679 100644
--- a/src/fairscape_cli/entailments/find_outputs.py
+++ b/src/fairscape_cli/entailments/find_outputs.py
@@ -11,7 +11,10 @@ def extract_datasets_from_graph(graph: List[Dict]) -> List[Tuple[str, bool]]:
     """
     datasets = []
     for entity in graph:
-        if entity.get("@type") == "https://w3id.org/EVI#Dataset":
+        entity_type = entity.get("@type") or ""
+        if isinstance(entity_type, list):
+            entity_type = entity_type[-1]
+        if 'Dataset' in entity_type:
             dataset_id = entity.get("@id")
             has_generated_by = bool(entity.get("generatedBy"))
             datasets.append((dataset_id, has_generated_by))
@@ -27,6 +30,8 @@ def extract_samples_from_graph(graph: List[Dict]) -> List[str]:
     samples = []
     for entity in graph:
         entity_type = entity.get("@type")
+        if isinstance(entity_type, list):
+            entity_type = entity_type[-1]
         if entity_type == "https://w3id.org/EVI#Sample" or entity_type == "EVI:Sample":
             sample_id = entity.get("@id")
             if sample_id:
@@ -43,6 +48,8 @@ def extract_used_datasets_from_computations(graph: List[Dict]) -> Set[str]:
     used_datasets = set()
     for entity in graph:
         entity_type = entity.get("@type")
+        if isinstance(entity_type, list):
+            entity_type = entity_type[-1]
         if entity_type == "https://w3id.org/EVI#Computation" or entity_type == "EVI:Computation":
             used_dataset_list = entity.get("usedDataset", [])
             for dataset_ref in used_dataset_list:
@@ -140,11 +147,12 @@ def add_inputs_outputs_to_rocrate(rocrate_path: pathlib.Path) -> Tuple[bool, str
         for i, entity in enumerate(graph):
             entity_type = entity.get("@type")
             if isinstance(entity_type, list):
-                if "Dataset" in entity_type or "https://w3id.org/EVI#ROCrate" in entity_type:
-                    if entity.get("@id") != "ro-crate-metadata.json":
-                        root_dataset = entity
-                        root_index = i
-                        break
+                entity_type = entity_type[-1]
+            if isinstance(entity_type, str) and "https://w3id.org/EVI#ROCrate" in entity_type:
+                if entity.get("@id") != "ro-crate-metadata.json":
+                    root_dataset = entity
+                    root_index = i
+                    break
             elif entity_type == "Dataset":
                 if entity.get("@id") != "ro-crate-metadata.json":
                     root_dataset = entity
diff --git a/src/fairscape_cli/utils/build_utils.py b/src/fairscape_cli/utils/build_utils.py
index daad755..5a8f537 100644
--- a/src/fairscape_cli/utils/build_utils.py
+++ b/src/fairscape_cli/utils/build_utils.py
@@ -198,6 +198,119 @@ def process_datasheet(crate_path: Path, published: bool = False) -> bool:
         click.echo(f"  ERROR generating datasheet for {crate_path.name}: {e}")
         return False
 
+def process_preview(crate_path: Path, published: bool = False) -> bool:
+    """Generate ro-crate-preview.html for a single RO-Crate."""
+    from fairscape_models.rocrate import ROCrateV1_2
+    from fairscape_models.conversion.converter import ROCToTargetConverter
+    from fairscape_models.conversion.mapping.FairscapeDatasheet import PREVIEW_MAPPING_CONFIGURATION
+    from fairscape_cli.datasheet_builder.rocrate.section_generators import PreviewGenerator
+    from jinja2 import Environment, FileSystemLoader
+
+    metadata_file = crate_path / "ro-crate-metadata.json"
+    output_path = crate_path / "ro-crate-preview.html"
+
+    try:
+        import fairscape_cli
+        package_dir = Path(fairscape_cli.__file__).parent
+        template_dir = package_dir / 'datasheet_builder' / 'templates'
+
+        env = Environment(
+            loader=FileSystemLoader(str(template_dir)),
+            trim_blocks=True,
+            lstrip_blocks=True
+        )
+        preview_generator = PreviewGenerator(env)
+
+        with open(metadata_file, 'r') as f:
+            crate_dict = json.load(f)
+
+        crate = ROCrateV1_2.model_validate(crate_dict)
+
+        converter = ROCToTargetConverter(
+            source_crate=crate,
+            mapping_configuration=PREVIEW_MAPPING_CONFIGURATION
+        )
+
+        preview = converter.convert()
+        preview_html = preview_generator.generate(preview, published)
+
+        with open(output_path, 'w', encoding='utf-8') as f:
+            f.write(preview_html)
+
+        return True
+    except Exception as e:
+        click.echo(f"  ERROR generating preview for {crate_path.name}: {e}")
+        return False
+
+
+def process_subcrate(subcrate_path: Path, release_directory: Optional[Path] = None, published: bool = False) -> Dict[str, Any]:
+    """
+    Process a single subcrate with all augmentation and build steps.
+
+    Steps:
+    1. Link inverse properties (OWL ontology)
+    2. Add EVI:inputs and EVI:outputs
+    3. Generate evidence graph
+    4. Generate Croissant export
+    5. Generate preview HTML
+
+    Returns a dict with results for each step.
+    """
+    results = {
+        'subcrate': subcrate_path.name,
+        'link_inverses': False,
+        'add_io': False,
+        'evidence_graph': False,
+        'croissant': False,
+        'preview': False,
+        'errors': []
+    }
+
+    click.echo(f"Processing subcrate: {subcrate_path.name}")
+
+    # Step 1: Link inverses
+    click.echo("  - Linking inverses...")
+    if process_link_inverses(subcrate_path):
+        results['link_inverses'] = True
+        click.echo("    ✓ Inverses linked")
+    else:
+        results['errors'].append("Failed to link inverses")
+
+    # Step 2: Add inputs/outputs
+    click.echo("  - Adding inputs/outputs...")
+    if process_add_io(subcrate_path):
+        results['add_io'] = True
+        click.echo("    ✓ Inputs/outputs added")
+    else:
+        results['errors'].append("Failed to add I/O")
+
+    # Step 3: Evidence graph
+    click.echo("  - Generating evidence graph...")
+    if process_evidence_graph(subcrate_path, release_directory):
+        results['evidence_graph'] = True
+        click.echo("    ✓ Evidence graph generated")
+    else:
+        click.echo("    - No EVI:outputs found or graph generation skipped")
+
+    # Step 4: Croissant
+    click.echo("  - Generating Croissant...")
+    if process_croissant(subcrate_path):
+        results['croissant'] = True
+        click.echo("    ✓ Croissant generated")
+    else:
+        results['errors'].append("Failed to generate Croissant")
+
+    # Step 5: Preview
+    click.echo("  - Generating preview...")
+    if process_preview(subcrate_path, published):
+        results['preview'] = True
+        click.echo("    ✓ Preview generated")
+    else:
+        results['errors'].append("Failed to generate preview")
+
+    return results
+
+
 
 def process_all_subcrates(release_directory: Path) -> Dict[str, Any]:
     subcrates = find_subcrates(release_directory)