-
Notifications
You must be signed in to change notification settings - Fork 0
Introduce xarray-prism #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…reating issue for unknown data
antarcticrainforest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good.
A couple for general comments.
- There are some formats that aren't covered yet. It's some weird json reference files but I'd have to investigate myself.
- I think the name
freva-xarrayor enginefrevais to loaded. How about changing the name of this.
how about xarray-prism / prism, or xarray-switchboard, switchboard. Or xarray-gateway/gateway? Just some thoughts
README.md
Outdated
| > If you deal with a data that `freva` engine is not able to open that, please | ||
| > report the data [here](https://github.com/freva-org/freva-xarray/issues/new) | ||
| > to let us improve this engine to be able to be versitile and work with all | ||
| > sort of climate data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| > If you deal with a data that `freva` engine is not able to open that, please | |
| > report the data [here](https://github.com/freva-org/freva-xarray/issues/new) | |
| > to let us improve this engine to be able to be versitile and work with all | |
| > sort of climate data. | |
| > If you encounter with a data formats that `freva` engine is not able to open, please | |
| > files an issue report [here](https://github.com/freva-org/freva-xarray/issues/new). | |
| > This helps us to improve the engine enabling users work with different kinds of climate data. |
src/freva_xarray/backends/cloud.py
Outdated
| extra_lines = 0 | ||
| if show_progress: | ||
| fmt = "GRIB" if engine == "cfgrib" else "NetCDF3" | ||
| print(f"[warning] Remote {fmt} requires full file download") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning.warn?
src/freva_xarray/entrypoint.py
Outdated
| ) -> xr.Dataset: | ||
| """Xarray Generic function: Open dataset with | ||
| automatic format detection.""" | ||
| if not isinstance(filename_or_obj, (str, Path)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.PathLike? This covers also fsspec stuff.
src/freva_xarray/entrypoint.py
Outdated
| lines_printed = 0 | ||
|
|
||
| if is_remote: | ||
| sys.stdout.write("[info] Detecting format...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging?
src/freva_xarray/utils.py
Outdated
| self._render() | ||
|
|
||
| def _render(self) -> None: | ||
| mb = self._current / 1024 / 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| mb = self._current / 1024 / 1024 | |
| mb = self._current / 1024**2 |
|
@mannreis could you take a look at this and advertise if needed. This seems to be really nice!!! |
mannreis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good work and Mo! I left my 2cents. My only concern is regarding caching the remote data which may not be, ever, an issue.
src/freva_xarray/utils.py
Outdated
| pct = min(self._current / self._total, 1.0) | ||
| filled = int(self.width * pct) | ||
| bar = "█" * filled + "░" * (self.width - filled) | ||
| total_mb = self._total / 1024 / 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on Martin's suggestion:
| total_mb = self._total / 1024 / 1024 | |
| total_mb = self._total / 1024**2 |
src/freva_xarray/_detection.py
Outdated
| return None | ||
|
|
||
|
|
||
| def _detect_from_magic_bytes(header: bytes, lower_path: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since typing is around is something like this an improvement? Or returning "unknown" defeats the effort?
| def _detect_from_magic_bytes(header: bytes, lower_path: str) -> str: | |
| Engine = Literal["cfgrib", "scipy", "h5netcdf", "rasterio", "unknown"] | |
| def _detect_from_magic_bytes(header: bytes, lower_path: str) -> Engine: |
I guess it would also have to be propagated up the call stack and for that reason inconvenient
|
|
||
| # GRIB: cache locally | ||
| if engine == "cfgrib": | ||
| local_path = _cache_remote_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caching is good but I'm not sure whether is better to enable it explicitly or allow to disable.
Regardless I would not do it on /tmp for these applications which I believe is the default in many systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. Regarding caching, I already though quite a bit about this, and for this purpose, we designed the FREVA_XARRAY_CACHE environment variable in place to user be able to change the cache location. Or user can also adjust it on storage_options argument of open_dataset function of xarray. Also when the user opening the remote files, progress logging appears to tell the user, it's storing data on /tmp and user sees the progress. It means users can cancel the procedure when it's downloading the data by watching the logs.
Also main purpose of designing the adjustable FREVA_XARRAY_CACHE was for our own data-loader. We wanted to adjust the cache on deployment to /scratch to don't run into storage limit.
Apart from those, we really don't have many other options for opening the grib and netcdf3 data. Because of the structure of those formats, we need to download the full data and then open it. So in short, /tmp was the safest way we could come up with. If there is any more secure suggestion, please let us know.
I first didn't mind the name but then after seeing the structure of xarray backends, I see Martin's point. Unless I misunderstood something this could even be a contribution upstream to xarray, thus freva is not involved. |
|
@antarcticrainforest thanks for the reviw. regarding this point:
could you please give a link or example of this data to dig more for making it compatible with this? |
In this PR we are going to introduce the xarray-prism as
prismengine for opening all available climate data including remote and posix data.How to install:
examples for remote data: