-
Notifications
You must be signed in to change notification settings - Fork 89
[SVCS-552] Remove unneeded API calls in folder_file_ops #287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-552] Remove unneeded API calls in folder_file_ops #287
Conversation
8e9db9d to
918743f
Compare
Added construct_path and construct_empty_path to base- provider for individual providers to make paths for folder-file-ops without needing to make API calls
f1e59f8 to
1188a93
Compare
| parent_path: BitbucketPath, | ||
| metadata) -> BitbucketPath: | ||
| return parent_path.child(metadata.name, folder=metadata.is_folder) | ||
| meta_data) -> BitbucketPath: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata is one word
| """ | ||
| raise NotImplementedError | ||
|
|
||
| async def construct_path(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally you should be able to use construct_path and revalidate_path interchangeably, so it seems more logical to just modify revalidate_path so it doesn't ever make unnecessary metadata calls, even on file moves, what prevents that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly specific case where we know we have metadata available before calling revalidate_path. Often times we will not have this information available. revalidate_path is also used extensively throughout providers, so to me it does not make sense to have to overhaul revalidate_path in core and in nearly every provider to make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely convinced by this answer, for example why add this new method to Bitbucket, when Bitbucket doesn't make any unnecessary metadata calls in the first place? Providers that use the BaseProvider revalidate method will be fine because the BaseProvider doesn't make a request. Could be beneficial to just pass it as a 'no_new_call` kwarg or something.
It's a lot of work either way, but fixing the revalidate in the provider has the added benefit of ensuring unnecessary calls arn't being made in other operators where revalidate_path is being used. Also it's far fewer new tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, and if Bitbucket ever becomes more than a RO provider, its already implemented. Also it is possible if you move a folder from Bitbucket, it will at least interact with one of the new methods (if you can, im not sure if RO providers support that, just an example).
The point of this ticket was to optimize _folder_file_op. While optimizing revalidate_path does have some added benefits, it is a much much much larger task.
It feels worse to me to muck up revalidate_path than to just make a new function that can then decide what to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but Bitbucket was just an example, the same is true of S3, Dropbox, about half of our providers.
What about a middle-ground where instead of construct_path we extend path_from_metadata to all providers and use that instead? That way we don't have to write a single-purpose method and it's already taken care of in terms of most unit testing. Construct_path is often just a wrapper for path_from_metadata anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes it can be valuable to call revalidate_path over path_from_metadata. If a provider calling revalidate_path is fine in this context, then I want that provider to still be able to call revalidate_path in _folder_file_ops. Why change the call to something else when the old functionality was just fine, or could introduce unintended behavior?
I did think of just taking over path_from_metadata, however it is already used in some places, and I don't want to just redirect it to revalidate_path when it shouldn't. Or take it over and then have to evaluate what to do with cases where it is already called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at BaseProvider.path_from_metadata seems to indicate you're only going to need to write new behavior where the path is Id based and it's already been written for googledrive, so that doesn't leave many providers left to change. You shouldn't have to change existing behavior for path_from_metadata.
| """ | ||
| raise NotImplementedError | ||
|
|
||
| async def construct_path(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely convinced by this answer, for example why add this new method to Bitbucket, when Bitbucket doesn't make any unnecessary metadata calls in the first place? Providers that use the BaseProvider revalidate method will be fine because the BaseProvider doesn't make a request. Could be beneficial to just pass it as a 'no_new_call` kwarg or something.
It's a lot of work either way, but fixing the revalidate in the provider has the added benefit of ensuring unnecessary calls arn't being made in other operators where revalidate_path is being used. Also it's far fewer new tests.
| """ | ||
| raise NotImplementedError | ||
|
|
||
| async def construct_path(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at BaseProvider.path_from_metadata seems to indicate you're only going to need to write new behavior where the path is Id based and it's already been written for googledrive, so that doesn't leave many providers left to change. You shouldn't have to change existing behavior for path_from_metadata.
|
As mentioned, this PR is replaced by #311. |
Added construct_path and construct_empty_path to base-
provider for individual providers to make paths for folder-file-ops
without needing to make API calls
Ticket
https://openscience.atlassian.net/browse/SVCS-552
Purpose
the Baseprovider class
folder_file_opfunction was making calls to get metadata, and then making the same calls again when calling revalidate path. It is possible to create paths from this metadata and forgo making more API calls.Changes
Added 2 new functions to the Baseprovider.
construct_pathandconstruct_empty_path.Construct_path takes the parent path and metadata. Each provider then has its own implementation on how to get a path from this. Some will simply use it as a proxy to
revalidate_pathsince theirrevalidate_pathdoes not make API calls. If an individual providersrevalidate_pathdoes make api calls, it will instead callpath_from_metadataor something similar.construct_empty_pathreturns a math from metadata where the identifier is none( ie it doesnt actually exit at that location).Added tests to test these new functions.
construct_empty_pathis tested in core/test_provider since almost all providers use the base definition.construct_pathis tested individually in providers and is ALWAYS tested against the output ofrevalidate_pathbecause that is what it is replacing.Dataverse is the only provider that could still makes API calls here, however I am fairly certain that dataverse cannot perform
folder_file_opsso it should never call those functions.Side effects
There could be some timing issues, however this seems unlikely
QA Notes
To test, move or copy a folder containing at least one file across 2 providers.
ie move folder1 from OSFstorage to GoogleDrive. It should work and make less API calls
Figshare may not work quite correctly. It is very hard to test with the many bugs figshare has.
Deployment Notes