Improve plugin documentation (second batch) #987
Silk — Improve plugin documentation (second batch)
https://jira.eccenca.com/browse/CMEM-7013
This PR adds documentation for the following Silk dataset plugins:
RdfFileDataset.md
RDF file reads RDF data from a local file (or ZIP archive) into the project as an in-memory dataset and, for supported formats, can also write RDF back to a file.
The doc starts with the intended usage window (small/medium files, snapshots for exploration/mapping/linking, simple export) and immediately flags the hard constraint: everything is loaded into memory, so very large files belong in an external store. It then walks through the data shape and I/O story: single file vs. ZIP input (plus the regex gate for which ZIP entries are considered), dataset output as queryable graph(s), and the graph-selection rule (named graph only where the chosen format supports it; otherwise default graph, with the graph parameter ignored for graph-less formats). Configuration notes focus on how to think, not just what to fill in: file/ZIP behavior, format auto-detection (and the “can’t detect → error” path), the write restriction (only N-Triples as output), advanced narrowing via an entity list, and ZIP entry filtering via regex. Behavior is described as a sequence you can predict: size check → parse into an in-memory dataset (default + possibly named graphs) → select graph → serve repeated reads from memory until the underlying file timestamp changes → reload on next access → write path serializes as N-Triples only. It ends with limitations plus “when to use” guidance and concrete examples (simple Turtle, N-Quads with an explicit graph, ZIP with multiple RDF files).
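To make the read path concrete, here is a minimal Scala sketch of the cache-until-the-timestamp-changes sequence the doc describes. This is not Silk's actual code; the `CachedRdfFile` class is hypothetical and it uses Apache Jena for parsing:

```scala
import java.nio.file.{Files, Paths}
import org.apache.jena.query.Dataset
import org.apache.jena.rdf.model.Model
import org.apache.jena.riot.RDFDataMgr

// Hypothetical wrapper (not Silk's real class) around the doc's read sequence:
// parse into an in-memory dataset, select a graph, serve reads from memory
// until the file's timestamp changes, then reload on the next access.
class CachedRdfFile(path: String, graphUri: Option[String] = None) {
  private var cached: Option[(Long, Dataset)] = None

  private def lastModified: Long =
    Files.getLastModifiedTime(Paths.get(path)).toMillis

  def model: Model = synchronized {
    val stamp = lastModified
    val dataset = cached match {
      case Some((cachedStamp, ds)) if cachedStamp == stamp => ds // cache hit
      case _ =>
        val ds = RDFDataMgr.loadDataset(path) // parse: default + named graphs
        cached = Some((stamp, ds))
        ds
    }
    // Graph selection: a named graph only where the format supports one;
    // otherwise fall back to the default graph.
    graphUri.map(g => dataset.getNamedModel(g)).getOrElse(dataset.getDefaultModel)
  }
}
```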
InMemoryDataset.md
In-memory dataset is a small embedded RDF store that keeps all data in memory and exposes it via SPARQL as a temporary working graph inside workflows.
The doc frames it as a deliberately non-persistent scratch graph: one in-memory RDF model, all reads and writes mediated through a SPARQL endpoint, and an empty state after application restart. Within workflows it’s explicitly bidirectional—usable as both source and sink—so upstream components can write entities/links/triples into it and downstream components query it like a normal SPARQL dataset (entity retrieval, path/type discovery, sampling, etc.), with no file backing at all. Writing is explained by sink type but unified in effect: entity sink converts entities to triples, link sink writes link triples, triple sink adds triples directly; all converge into the same single in-memory graph. The one configuration knob (“Clear graph before workflow execution”, default true) is treated as the semantic switch: either a fresh empty graph per run, or a longer-lived in-memory graph across runs within the same process. Limitations are stated as operational consequences (memory-bound, no persistence, best for small/medium intermediates and prototyping) and the examples reinforce the intended patterns: temporary integration graph, scratch experimentation area, small lookup store.
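The write/read convergence is easy to picture as code. A rough Scala sketch follows (Jena-based; `ScratchGraph` and its methods are made-up names, not the plugin's API): one in-memory model that every sink type feeds, queried through SPARQL by readers, with the clear-before-run switch deciding the graph's lifetime.

```scala
import org.apache.jena.query.QueryExecutionFactory
import org.apache.jena.rdf.model.{Model, ModelFactory}
import scala.collection.mutable.ListBuffer

// Hypothetical sketch: one in-memory model that every sink type ultimately
// writes into, and that downstream readers query via SPARQL.
class ScratchGraph(clearBeforeRun: Boolean = true) {
  private val model: Model = ModelFactory.createDefaultModel()

  /** Workflow start: either a fresh empty graph per run or a longer-lived one. */
  def beginRun(): Unit = if (clearBeforeRun) model.removeAll()

  /** Entity, link and triple sinks all converge here: add one triple. */
  def addTriple(s: String, p: String, o: String): Unit =
    model.add(model.createResource(s), model.createProperty(p), model.createResource(o))

  /** Downstream reads behave like any SPARQL dataset. */
  def select(sparql: String): List[String] = {
    val exec = QueryExecutionFactory.create(sparql, model)
    try {
      val results = exec.execSelect()
      val varName = results.getResultVars.get(0)
      val out = ListBuffer.empty[String]
      while (results.hasNext) out += results.next().get(varName).toString
      out.toList
    } finally exec.close()
  }
}
```

Setting `clearBeforeRun = false` corresponds to the longer-lived variant the doc describes; either way nothing survives a process restart, since there is no file backing.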
AlignmentDataset.md
Alignment is a write-only dataset that exports link results as Alignment files following the AlignAPI format specification (and the SWJ60 description).
The doc keeps scope tight from the start: it exists to serialize links between entities in a standardized alignment format, not to read entities, run transformations, or do extra processing. It motivates the shape via separation of concerns and interoperability: a focused exporter that produces files consumable by alignment-aware tooling and usable in subsequent workflows. The core mechanics are explained at the link-record level: each link becomes one <Cell> with an explicit source URI, a target URI, an optional relation (e.g., =), and an optional confidence measure (0.0–1.0), and the plugin is responsible for emitting a well-formed file (structure, header/footer, UTF-8). A minimal example anchors how multiple links map to multiple <Cell> entries, and the references section points to the AlignAPI format spec and the SWJ60 paper for full semantics and edge details.
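For illustration, a hedged Scala sketch of that link-record mapping (the `Link`, `toCell`, and `writeAlignment` names are hypothetical; the exact header/footer envelope is defined by the AlignAPI spec and is passed in as placeholders here):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical sketch of the mapping the doc describes: one link -> one <Cell>,
// wrapped by a header/footer and written as UTF-8. Element names follow the
// AlignAPI format; consult the spec for the exact envelope.
case class Link(source: String, target: String,
                relation: Option[String] = None, // e.g. "="
                measure: Option[Double] = None)  // confidence in 0.0-1.0

def toCell(link: Link): String = {
  val sb = new StringBuilder
  sb ++= "<Cell>\n"
  sb ++= s"""  <entity1 rdf:resource="${link.source}"/>\n"""
  sb ++= s"""  <entity2 rdf:resource="${link.target}"/>\n"""
  link.relation.foreach(r => sb ++= s"  <relation>$r</relation>\n")
  link.measure.foreach(m => sb ++= s"  <measure>$m</measure>\n")
  sb ++= "</Cell>"
  sb.toString
}

def writeAlignment(path: String, links: Seq[Link],
                   header: String, footer: String): Unit = {
  // Multiple links map to multiple <Cell> entries between header and footer.
  val doc = (header +: links.map(toCell) :+ footer).mkString("\n")
  Files.write(Paths.get(path), doc.getBytes(StandardCharsets.UTF_8))
}
```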