New storage layout extension NNNN-uri-direct-storage-layout #63
Should there be a note about the possibility of many-to-one mappings created by the idea of regex->string replacement? A warning or advice about what to do in the event of collision? Is the replacement strictly regex->string or is regex->regex-replacement intended? I ask because later there is a comment about static string replacement, which sort of implies there might be something dynamic too. How should multiple replacements be handled? Are the regexes applied in turn? I.e. does |
|
@alvinsw : Is this extension still of interest? There are open questions above. |
|
Yes, I think the replacement is a weak point here. After discussion with @ptsefton we'll take out all the parameters, including the one for string replacement, so the spec should be simpler and more straightforward. |
|
We still need the layout option where we can directly map a URI to directory paths without any hashing. |
|
Thanks for replying @alvinsw. If you are still interested in moving this forward, could you push the update to reduce parameters that you proposed in your Aug 9, 2023 #63 (comment)? (We aren't sure whether #57 is going to move forward, but in any case the comments from @neilsjefferies do not apply to your use case.) |
|
@zimeon I have updated the document and removed all parameters. Can this be approved now? |
|
Sorry, we need to have the suffix parameter back. I have updated the PR with a new commit. |
suffix issues
- Line 14 text assumes that the suffix is `/__object__` when it should be written in terms of the `suffix` parameter
- Why does the `suffix` default to `/__object__`? -- this seems an oddly Pythonic choice. Would it be better if the default value were nothing as that seems more generally useful?
- What are the limitations on the `suffix` parameter? Is nothing OK (I hope so)? Does it need to start with a slash if not nothing? Is it OK to have multiple path segments (`/one/two`)? Should dangerous things not be allowed (`/../../../../`)? Must it not end with a slash?
- There should be some warning to note that using the suffix as a way to tell whether one has reached an object in the layout is unsafe (e.g. id `https://example.org/a/__object__/b/` would give root path `https_example.org/a/__object__/b/__object`)
- The warning about not creating nested objects should be in the limitations section (not just example 2). This applies even with a suffix (see bullet above)

Procedure
- The reference for determining "is a URI" should be given. https://datatracker.ietf.org/doc/html/rfc3986#section-3 perhaps?
- What about the optional `userinfo` and `:port` elements of the `authority` component of the URI that contains the `hostname`?
- The scheme in a `file://` URI is `file` (without the `://` or just `:` if a relative file URI)
- Removing the `file` scheme means relative file paths and non-URI strings overlap in the output object root path (e.g. `file:hello` and `hello` both map to `hello`). Does that deserve mention?
- Syntax - bullet sequence is 1, 3, 4
- Point 4 -- one is appending to the final "OCFL object root path" not pathname
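
To make the Procedure points above concrete, here is a minimal sketch of what a standard URI parser reports for the cases mentioned. It uses Python's `urllib.parse.urlsplit` purely as an illustration; the helper name `describe` and the sample identifiers are assumptions and are not taken from the extension text.

```python
from urllib.parse import urlsplit


def describe(identifier: str) -> dict:
    """Return the URI components that urlsplit finds in an identifier.

    Hypothetical helper for discussion only; the extension does not define it.
    """
    parts = urlsplit(identifier)
    return {
        "scheme": parts.scheme,      # e.g. "https" or "file", without ":" or "://"
        "username": parts.username,  # user part of the optional userinfo, or None
        "hostname": parts.hostname,  # host only, without userinfo or port
        "port": parts.port,          # optional port as an int, or None
        "path": parts.path,
    }


with_authority = describe("https://user@example.org:8443/a/b")
relative_file = describe("file:hello")
plain_string = describe("hello")

# The authority carries more than the hostname.
print(with_authority)

# Once the "file" scheme is dropped, these two identifiers are indistinguishable.
print(relative_file["path"] == plain_string["path"])
```

If the procedure keeps only scheme, hostname, and path, then identifiers that differ only in userinfo or port would presumably map to the same object root path, which is the same kind of overlap as the `file:hello` and `hello` case.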
|
|
||
| This storage root extension describes a transparent path-based OCFL storage layout. URI and path based identifiers are mapped directly to multi-level directory path that are direct children of the OCFL storage root directory. | ||
|
|
||
| This extension assumes that the OCFL object identifier is a URI or a path name which is used directly to create nested paths under the OCFL storage root. An extra directory called `__object__` is added to the path to safely ensure that OCFL object is not nested under anoother OCFL object. |
typo: 'anoother'
|
|
||
| #### Mappings | ||
|
|
||
| NOTE: The [The Archive and Package (arcp) URI scheme](https://www.research.manchester.ac.uk/portal/files/76956641/arcp.html) (ARCP) is used in these examples. It allows URI IDs to be minted locally by an archive which can be used in Linked Data systems as URIs. |
minor suggestion: drop the duplicate 'The'
|
|
||
| | Object ID | Object Root Path | |
At line 19 above, you mention:
> Object IDs cannot include characters that are illegal in directory names (for example, slash or backslash)

However, in your examples, the Object IDs contain `/`.
|
|
||
| ### Example 2 | ||
|
|
||
| This example demonstrates the effect of using a custom `suffix` to change the default `/__object__` name convention as the leaf directory that contains an OCFL Object. If set to an empty string, the user must ensure that all the supplied URIs have a structure that does not allow nested objects. |
> If set to an empty string, the user must ensure that all the supplied URIs have a structure that does not allow nested objects.

This would benefit from further specification, and would likely be better suited to the 'Parameters' section.
What is the suggested behavior if a collision or nesting occurs? If it is not handled, the result would be an invalid OCFL Storage Root.
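
As one possible way to frame the question above, an implementation could check a batch of computed object root paths for collisions and nesting before writing anything. The sketch below is only an illustration: the function name `find_conflicts` and the sample paths are assumptions, and the extension text does not prescribe this behavior.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_conflicts(root_paths):
    """Report duplicate root paths and roots nested inside other roots.

    Hypothetical check; root_paths are storage-root-relative paths.
    """
    counts = Counter(PurePosixPath(p) for p in root_paths)
    # A collision: two identifiers produced the same object root path.
    collisions = sorted(str(p) for p, n in counts.items() if n > 1)
    # Nesting: one object root is an ancestor directory of another.
    nested = sorted(
        (str(parent), str(path))
        for path in counts
        for parent in path.parents
        if parent in counts
    )
    return collisions, nested


# Sample paths as they might look with an empty suffix (illustrative only).
collisions, nested = find_conflicts([
    "https_example.org/a",
    "https_example.org/a/b",
    "https_example.org/c",
    "https_example.org/c",
])
print(collisions)
print(nested)
```

Whether such a conflict should raise an error, skip the object, or be left entirely to the user is the open point that the 'Parameters' section would need to settle.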
| ## Procedure | ||
|
|
||
| The following is an outline of the steps to map an OCFL object identifier to an OCFL object root path: | ||
| 1. If the identifier is a URI, parse the URI and identify the scheme, hostname, and path, and ignore the rest. |
Percent-encoding (e.g., %20) in URIs may need special consideration due to filesystem compatibility problems.
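
A short sketch of why this matters, using Python's `urllib.parse` as an illustration. The helper name `decoded_path` and the sample identifiers are assumptions; whether the extension should decode percent-encoded octets at all is exactly the open question.

```python
from urllib.parse import unquote, urlsplit


def decoded_path(identifier: str) -> str:
    """Return the URI path with percent-encoded octets decoded (hypothetical helper)."""
    return unquote(urlsplit(identifier).path)


raw = urlsplit("https://example.org/my%20file").path
spaced = decoded_path("https://example.org/my%20file")

# An encoded slash becomes an extra directory level once decoded, so two
# distinct identifiers can end up with the same path.
encoded_slash = decoded_path("https://example.org/a%2Fb")
plain_slash = decoded_path("https://example.org/a/b")

print(raw, "|", spaced)
print(encoded_slash == plain_slash)
```

Keeping the path undecoded avoids that collision but leaves `%` sequences in directory names, while decoding can introduce spaces or additional path separators, so either choice would need to be stated in the procedure.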
| * **Name:** suffix | ||
| * **Description:** The suffix to be appended to the end of the path | ||
| * **Type:** string | ||
| * **Default:** "/__object__" |
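Suggestion to escape the `__`, because Markdown otherwise renders the text between the double underscores in bold.
e.g., escaped it displays as __object__, unescaped it displays as a bold "object".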
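
The questions raised in review about what a `suffix` value may contain could be pinned down with a small validation step. The sketch below encodes one possible set of answers: empty is allowed, a non-empty value starts with a slash, does not end with one, and has no empty, `.`, or `..` segments. These rules and the name `check_suffix` are assumptions for discussion, not rules stated in the extension.

```python
def check_suffix(suffix: str) -> list[str]:
    """Return a list of problems found in a suffix value (empty list if none).

    Hypothetical rules; the extension does not currently define them.
    """
    problems = []
    if suffix == "":
        # Assumption: an empty suffix is allowed and means "append nothing".
        return problems
    if not suffix.startswith("/"):
        problems.append("does not start with a slash")
    if suffix.endswith("/"):
        problems.append("ends with a slash")
    segments = suffix.strip("/").split("/")
    if any(seg in ("", ".", "..") for seg in segments):
        problems.append("contains an empty, '.' or '..' segment")
    return problems


for candidate in ["/__object__", "", "/one/two", "/../../../../"]:
    print(repr(candidate), check_suffix(candidate))
```

Under these assumed rules the default `/__object__` and a multi-segment value such as `/one/two` pass, while `/../../../../` is rejected both for its trailing slash and for its `..` segments.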
We want to be able to directly map a URI into the storage layout directory.
It may be similar to #57, but it is simpler and just uses the path of any valid URI directly.