
@alvinsw commented Dec 7, 2022

We want to be able to map a URI directly into the storage layout directory.
It may be similar to #57 but simpler: it just uses the path of any valid URI directly.

@zimeon (Contributor) commented Dec 18, 2022

Should there be a note about the possibility of many-to-one mappings created by the idea of regex->string replacement? A warning, or advice about what to do in the event of a collision?

Is the replacement strictly regex->string, or is regex->regex-replacement intended? I ask because later there is a comment about static string replacement, which sort of implies there might be something dynamic too.

How should multiple replacements be handled? Are the regexes applied in turn? I.e., does [['a','b'],['b','c']] applied to a produce c?
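To make the two readings of the last question concrete, here is a minimal Python sketch, assuming each rule is a (pattern, replacement) pair whose replacement is a literal string. `apply_in_turn` and `apply_once` are hypothetical helper names, not part of the draft extension.

```python
import re

# Hypothetical rule format: (regex pattern, literal replacement string).
Rule = tuple[str, str]

def apply_in_turn(value: str, rules: list[Rule]) -> str:
    """Each rule runs on the output of the previous rule."""
    for pattern, replacement in rules:
        # A function replacement keeps the replacement literal (no backrefs).
        value = re.sub(pattern, lambda _m, r=replacement: r, value)
    return value

def apply_once(value: str, rules: list[Rule]) -> str:
    """All rules are matched against the original string in a single pass."""
    # One alternation with a named group per rule; at a given position the
    # earliest rule that matches wins.
    combined = "|".join(f"(?P<r{i}>{p})" for i, (p, _) in enumerate(rules))

    def pick(match: re.Match) -> str:
        for i, (_, replacement) in enumerate(rules):
            if match.group(f"r{i}") is not None:
                return replacement
        return match.group(0)

    return re.sub(combined, pick, value)

rules = [("a", "b"), ("b", "c")]
print(apply_in_turn("a", rules))  # c
print(apply_once("a", rules))     # b
```

Under the in-turn reading the rules compose, so their order matters; under the single-pass reading each part of the input is rewritten at most once.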

@awoods (Member) commented Jun 1, 2023

@alvinsw : Is this extension still of interest? There are open questions above.

@alvinsw (Author) commented Aug 9, 2023

Yes, I think the replacement is a weak point here. After discussion with @ptsefton, we'll remove all the parameters, including the one for string replacement, so the spec should be simpler and more straightforward.
I'll push an update soon.

@zimeon (Contributor) commented Jul 18, 2024

@alvinsw & @ptsefton - Are you still interested in moving forward with this extension, or should we close out this PR?

@alvinsw (Author) commented Aug 29, 2024

We still need a layout option that directly maps a URI to directory paths without any hashing.
There is some overlap with the other proposed extension, NNNN-direct-clean-path-layout. However, I couldn't understand the last comment from @neilsjefferies saying that it should be amended to be an object extension. How would that work? If PR 57 can go through as a layout extension in some form, then we can merge this one into it.

@zimeon (Contributor) commented Aug 29, 2024

Thanks for replying, @alvinsw. If you are still interested in moving this forward, could you push the update that removes the parameters, as you proposed in your Aug 9, 2023 #63 (comment)?

(We aren't sure whether #57 is going to move forward, but in any case the comments from @neilsjefferies do not apply to your use case.)

@alvinsw (Author) commented Nov 26, 2024

@zimeon I have updated the document and removed all parameters. Can this be approved now?

@alvinsw (Author) commented Dec 12, 2024

Sorry, we need to have the suffix parameter back. I have updated the PR with a new commit.

@alvinsw (Author) commented May 19, 2025

@zimeon @awoods Can we move forward with this extension please?

@zimeon (Contributor) left a comment

suffix issues

  • Line 14 text assumes that the suffix is /__object__ when it should be written in terms of the suffix parameter
  • Why does the suffix default to /__object__? -- this seems an oddly Pythonic choice. Would it be better if the default value were nothing as that seems more generally useful?
  • What are the limitations on the suffix parameter? Is nothing OK (I hope so)? Does it need to start with a slash if not nothing? Is it OK to have multiple path segments (/one/two)? Should dangerous things not be allowed (/../../../../)? Must it not end with a slash?
  • There should be some warning noting that using the suffix as a way to tell whether one has reached an object in the layout is unsafe (e.g. the ID https://example.org/a/__object__/b/ would give the root path https_example.org/a/__object__/b/__object__)
  • The warning about not creating nested objects should be in the limitations section (not just in Example 2). This applies even with a suffix (see bullet above)
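One possible answer to the questions about the `suffix` parameter, sketched in Python. The rules enforced here (an empty suffix is allowed; otherwise it must start with `/`, must not end with `/`, and must not contain empty, `.` or `..` segments) are assumptions made for illustration, not rules stated in the draft, and `append_suffix` is a hypothetical name.

```python
DEFAULT_SUFFIX = "/__object__"  # default value given in the draft's Parameters section

def append_suffix(root_path: str, suffix: str = DEFAULT_SUFFIX) -> str:
    """Append `suffix` to an object root path under hypothetical validation rules."""
    if suffix:  # an empty suffix is assumed to mean "append nothing"
        if not suffix.startswith("/") or suffix.endswith("/"):
            raise ValueError("suffix must start with '/' and must not end with '/'")
        # Reject segments that could escape or collapse the path (e.g. '/../../').
        if any(seg in ("", ".", "..") for seg in suffix[1:].split("/")):
            raise ValueError("suffix must not contain empty, '.' or '..' segments")
    # Drop a trailing slash on the root path so the join never yields '//'.
    return root_path.rstrip("/") + suffix
```

Even a validated suffix does not prevent nesting on its own, because an identifier can itself contain the suffix text, as the example in the bullet about unsafe suffix checks shows.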

Procedure

  • The reference for determining "is a URI" should be given. https://datatracker.ietf.org/doc/html/rfc3986#section-3 perhaps?
  • What about the optional userinfo and :port elements of the authority component of the URI that contains the hostname?
  • The scheme in a file:// URI is file, without the :// (or without the : for a relative file URI)
  • Removing the file scheme means relative file paths and non-URI strings overlap in the output object root path (e.g. file:hello and hello both map to hello). Does that deserve mention?
  • Syntax: the numbered steps run 1, 3, 4
  • Point 4 -- one is appending to the final "OCFL object root path", not the pathname


This storage root extension describes a transparent, path-based OCFL storage layout. URI- and path-based identifiers are mapped directly to multi-level directory paths beneath the OCFL storage root directory.

This extension assumes that the OCFL object identifier is a URI or a path name, which is used directly to create nested paths under the OCFL storage root. An extra directory called `__object__` is added to the path to ensure that an OCFL object is not nested under another OCFL object.

Review comment (Member):

typo: 'anoother'


#### Mappings

NOTE: [The Archive and Package (arcp) URI scheme](https://www.research.manchester.ac.uk/portal/files/76956641/arcp.html) (ARCP) is used in these examples. It allows an archive to mint URI identifiers locally that can be used as URIs in Linked Data systems.

Review comment (Member):

minor suggestion: drop the duplicate 'The'



| Object ID | Object Root Path |
Review comment (Member):

At line 19 above, you mention:

Object IDs cannot include characters that are illegal in directory names (for example, slash or backslash)

However, in your examples, the Object IDs contain /.


### Example 2

This example demonstrates the effect of using a custom `suffix` to change the default `/__object__` naming convention for the leaf directory that contains an OCFL object. If set to an empty string, the user must ensure that all the supplied URIs have a structure that does not allow nested objects.

Review comment (Member):

If set to an empty string, the user must ensure that all the supplied URIs have a structure that does not allow nested objects.

This would benefit from further specification, and would likely be better placed in the 'Parameters' section.

What is the suggested behavior if a collision or nesting occurs? An invalid OCFL Storage Root would be the result if not handled.
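Whatever behaviour the extension settles on, an implementation first has to detect the problem. A minimal Python sketch, assuming object root paths have already been computed as relative, `/`-separated strings; `find_conflicts` is a hypothetical helper. The example paths come from the review comments above: the root path for `https://example.org/a` is inferred by analogy with the `__object__` example, and `file:hello`/`hello` are shown without a suffix, as in that comment.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def find_conflicts(mapping: dict[str, str]):
    """Return (collisions, nestings) for a map of object ID -> object root path.

    collisions: root path -> IDs that map to it (many-to-one mappings)
    nestings:   (outer, inner) pairs where one object root lies inside another
    """
    by_path: dict[PurePosixPath, list[str]] = defaultdict(list)
    for object_id, root in mapping.items():
        by_path[PurePosixPath(root)].append(object_id)

    collisions = {str(p): ids for p, ids in by_path.items() if len(ids) > 1}
    roots = sorted(by_path)
    # Quadratic check, fine for a sketch: inner is nested if outer is an ancestor.
    nestings = [(str(outer), str(inner))
                for outer in roots for inner in roots
                if outer in inner.parents]
    return collisions, nestings

mapping = {
    "https://example.org/a": "https_example.org/a/__object__",
    "https://example.org/a/__object__/b/": "https_example.org/a/__object__/b/__object__",
    "file:hello": "hello",
    "hello": "hello",
}
collisions, nestings = find_conflicts(mapping)
print(collisions)
print(nestings)
```

The sketch only reports conflicts; whether to reject the later ID or fail the whole operation is the policy question raised in this comment.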

## Procedure

The following is an outline of the steps to map an OCFL object identifier to an OCFL object root path:
1. If the identifier is a URI, parse the URI and identify the scheme, hostname, and path, and ignore the rest.
Review comment (Member):

Percent-encoding (e.g., %20) in URIs may need special consideration due to filesystem compatibility problems.
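Step 1 of the procedure can be sketched with Python's standard `urllib.parse.urlsplit`. This is a minimal sketch under stated assumptions: any string with a scheme is treated as a URI (a loose check, not full RFC 3986 validation), and `uri_components` is a hypothetical helper. It also shows where the review points about userinfo, `:port`, and percent-encoding come into play.

```python
from urllib.parse import urlsplit

def uri_components(identifier: str):
    """Return (scheme, hostname, path) for a URI identifier, or None otherwise.

    Hypothetical helper: a string with no scheme is treated as a plain path.
    """
    parts = urlsplit(identifier)
    if not parts.scheme:
        return None
    # .hostname drops any userinfo and :port and lowercases the host;
    # query and fragment are ignored, as step 1 says.
    # The path is returned still percent-encoded (e.g. '%20' stays as-is).
    return parts.scheme, parts.hostname or "", parts.path

print(uri_components("https://user:pw@Example.org:8443/a%20b/c?q=1#frag"))
print(uri_components("file:hello"))
print(uri_components("hello"))
```

If an implementation decodes the path instead, note that `urllib.parse.unquote` turns `%2F` into `/`, so decoding can change the number of directory levels; this is one concrete form of the filesystem concern raised above.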

* **Name:** suffix
* **Description:** The suffix to be appended to the end of the path
* **Type:** string
* **Default:** "/__object__"
Review comment (Member):

Suggestion: escape the `__`, because Markdown turns it into bold.

e.g., `__object__` vs. **object**

See: https://github.com/OCFL/extensions/blob/ca1394cd9434d28e1ece0656058c9d9690e16192/docs/NNNN-uri-direct-storage-layout.md#parameters
