Conversation

@mo-jeff

@mo-jeff mo-jeff commented Nov 4, 2025

Description

Adding content for 8 collections. Things to note:

  • Description.md needs adding as a separate file and linking from template.json
  • No dataset.yml files included
  • item_assets is a static list based on the collection themes

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

n/a

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review
  • Changelog has been updated
  • Documentation has been updated
  • Unit tests pass locally (./scripts/test)
  • Code is linted and styled (./scripts/format)

@ghidalgo3
Collaborator

ghidalgo3 commented Nov 4, 2025

Thanks @mo-jeff! Some thoughts:

  • It is not strictly necessary to add a long description.md, but remember that the contents of description.md are rendered to HTML in the Planetary Computer explorer. Here you have a chance to fully describe your dataset to your users in rich detail, including links (see the red box):
[Screenshot: rendered description in the Planetary Computer explorer, marked with a red box]
  • Which values? I'll eventually end up documenting them, just curious...
  • A collection's temporal extent can be null on one side, which indicates an open interval. You can reference other continuously updated collections like MODIS, Sentinel2, HLS2, or GOES for an example.
  • item_assets at the collection level tells users of this collection which asset keys to expect on the STAC items in this collection. It should be a relatively small set of asset keys.
  • You might want to consider adding the datacube STAC extension too, and documenting the dimensions of all the assets in this collection.
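
As a rough illustration of the open-interval and item_assets points above, here is a minimal Python sketch. The start date, the asset key, and the `is_open_ended` helper are hypothetical placeholders for illustration, not values or code from this PR.

```python
import json

# Hypothetical collection fragment; the start date and asset key are
# placeholders for illustration, not values taken from this PR.
collection = {
    "extent": {
        "temporal": {
            # A null (None in Python) end marks an open interval, i.e. a
            # collection that is still being updated.
            "interval": [["2023-01-01T00:00:00Z", None]]
        }
    },
    "item_assets": {
        # One entry per asset key that items are expected to carry.
        "example_asset_key": {
            "type": "application/netcdf",
            "title": "Example asset",
        }
    },
}


def is_open_ended(coll: dict) -> bool:
    """Return True if the overall temporal interval has a null end."""
    _start, end = coll["extent"]["temporal"]["interval"][0]
    return end is None


print(is_open_ended(collection))
# None is serialized as JSON null, which is what template.json would hold.
print(json.dumps(collection["extent"]["temporal"]))
```
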

Let us handle the dataset.yml file, that's a workflow specification that you don't have to write.

@gadomski gadomski mentioned this pull request Nov 10, 2025
9 tasks
@mo-jeff
Author

mo-jeff commented Nov 18, 2025

@ghidalgo3 - updated this PR. To your comments:

  • I need to add richer descriptions once we've agreed them internally.
  • Asset keys have been updated, with detail similar to ECMWF datasets. More data documentation will be linked to in the description to ensure consistency across platforms.
  • Open to using the datacube extension, focused on main content of the collection at the moment.

@gadomski - most of the technical detail is here now I think. Descriptions to follow soon.

@mo-jeff
Author

mo-jeff commented Nov 18, 2025

@mo-jeff please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Met Office"

Contributor

@gadomski gadomski left a comment

Gave it a quick once-over, no major issues. We'll want to populate the item_assets with more information once that's prepared ... I'm working on that now.

"title": "Creative Commons Attribution-ShareAlike 4.0"
},
{
"rel": "cite-as",
Contributor

We'll need an href for this link.

mo-jeff and others added 2 commits November 19, 2025 15:10
…-height-level/template.json

Co-authored-by: Pete Gadomski <pete.gadomski@gmail.com>
…ght-level/template.json

Co-authored-by: Pete Gadomski <pete.gadomski@gmail.com>
"type": "application/netcdf",
"title": "Fog fraction at screen level",
"description": "Here fog means a visibility of 1000 m or lower. The reduction in visibility is caused by water droplets or minute ice crystals forming close to the surface. This quantity represents the fraction of horizontal grid square occupied by fog. An alternative interpretation is that this represents the fractional probability of fog being present at any location in the grid square.",
"unit": "1",
Contributor

Suggested change
- "unit": "1",
+ "unit": "%",

Author

All the info for these params came from the science team/content team, so I've copied and pasted it in here. This one is a fraction, so I don't think the values are percentages. The aim is to match other documentation across all our public platforms, so I'll have to check this one.

Contributor

@gadomski gadomski Dec 4, 2025

Roger. The STAC spec strongly recommends using either UCUM or UDUNITS-2 for the unit field. If we do need to preserve the original value, I'd suggest putting the 1 into a custom met_office_deterministic:unit field, and using the more commonly used value % for the bare unit field.

Sorry, how is the fraction represented? As a float?
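
A minimal sketch of that suggestion, in Python for illustration only: the source unit moves to a prefixed custom field and the bare unit field gets "%". The `with_percent_unit` helper and the `met_office_deterministic` prefix argument are assumptions for this example, and whether "%" is actually correct depends on how the values are stored, which is still an open question in this thread.

```python
def with_percent_unit(asset: dict, prefix: str = "met_office_deterministic") -> dict:
    """Return a copy of an asset with its source unit kept under a prefixed field.

    Hypothetical helper: it only rewrites metadata and does not rescale any
    data values, so it is appropriate only if "%" matches the stored values.
    """
    updated = dict(asset)  # shallow copy so the input is left untouched
    original = updated.get("unit")
    if original is not None:
        # Preserve the value supplied by the data provider.
        updated[f"{prefix}:unit"] = original
    updated["unit"] = "%"
    return updated


asset = {"title": "Fog fraction at screen level", "unit": "1"}
print(with_percent_unit(asset))
```
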

"title": "Fog fraction at screen level",
"description": "Here fog means a visibility of 1000 m or lower. The reduction in visibility is caused by water droplets or minute ice crystals forming close to the surface. This quantity represents the fraction of horizontal grid square occupied by fog. An alternative interpretation is that this represents the fractional probability of fog being present at any location in the grid square.",
"unit": "1",
"height": "1.5m",
Contributor

Not quite sure about this one... if it's a dataset-specific value, we may want to use a custom "extension" attribute, e.g. met_office_deterministic:height. There are also other values that reference "at screen level" but don't have a height, so maybe this should just be excluded?

Author

I've queried this too. It may be better in the description.md markdown as a catch-all statement, "Screen level at 1.5m height" (or something). It also duplicates what we've got in the description key here, so that may be simplified as well.

"type": "application/netcdf",
"title": "Land binary mask",
"description": "Binary indicator of whether a point is considered land (value = 1) or sea (value = 0). This is useful to identify the coastline and characterise points for correct interpretation of forecast data by users.",
"unit": "1",
Contributor

I think this could just be omitted, as it's unitless.

Suggested change
"unit": "1",

mo-jeff and others added 4 commits December 4, 2025 09:50
…r-surface-level/template.json

Co-authored-by: Pete Gadomski <pete.gadomski@gmail.com>
…r-surface-level/template.json

Co-authored-by: Pete Gadomski <pete.gadomski@gmail.com>

## Providers
[Met Office](https://www.metoffice.gov.uk/)
See all datasets managed by [Met Office.](https://registry.opendata.aws/?search=managedBy:met%20office)

I'm assuming the ASDI link is a placeholder. Is there an equivalent to this for MPC?

If not, do we link to our website or something?

Contributor

My guess would be a link to the "group", which is a Planetary Computer specific concept of a "group of collections". E.g. https://planetarycomputer.microsoft.com/dataset/group/landsat.

@gadomski gadomski mentioned this pull request Dec 4, 2025
9 tasks
Contributor

@gadomski gadomski left a comment

Even with the datacube extension, we'll still want item assets. Those are used both on the database side (hydration) and are a useful indicator of what assets might be available for each collection.

EDIT: I'm also not sure about the cube extension living at the item level. A more common pattern would be to put the cube extension on each item_asset.
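
To show what "cube extension on each item_asset" might look like, here is a hedged Python sketch. The field names `cube:dimensions` and `cube:variables` come from the datacube extension; the asset keys, variable name, extents, and the `missing_cube_fields` helper are placeholders invented for this example, not content from this PR.

```python
# Hypothetical item_assets mapping; keys, extents, and the variable name
# are placeholders, not values from this PR.
item_assets = {
    "example_asset_key": {
        "type": "application/netcdf",
        "cube:dimensions": {
            "x": {"type": "spatial", "axis": "x", "extent": [-180.0, 180.0]},
            "y": {"type": "spatial", "axis": "y", "extent": [-90.0, 90.0]},
        },
        "cube:variables": {
            "example_variable": {"type": "data", "dimensions": ["y", "x"]},
        },
    },
    # An entry that has not been given datacube metadata yet.
    "asset_without_cube": {"type": "application/netcdf"},
}


def missing_cube_fields(assets: dict) -> list:
    """Return the sorted asset keys that lack a cube:dimensions entry."""
    return sorted(
        key for key, asset in assets.items() if "cube:dimensions" not in asset
    )


print(missing_cube_fields(item_assets))
```

A check like this could be run over a template.json to spot item_assets entries that still need dimension metadata.
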

Contributor

@gadomski gadomski left a comment

I'm still not convinced on the value or correctness of putting 1H_timesteps in cube:dimensions. Any reason not to just keep it in the text descriptions?

@mo-jeff
Author

mo-jeff commented Dec 12, 2025

> I'm still not convinced on the value or correctness of putting 1H_timesteps in cube:dimensions. Any reason not to just keep it in the text descriptions?

I thought Temporal Dimensions might help us with this but maybe not.

Happy to keep the Timestep header and content in the collection description, and call out in the description any parameters that don't follow that pattern? For example, there is a precip accumulation that is hourly from T+0 - T+51 (rather than T+54)

@gadomski
Contributor

gadomski commented Dec 12, 2025

> Happy to keep the Timestep header and content in the collection description and call out in the description for any parameters that don't follow that pattern? For example, there is a precip accumulation that is hourly from T+0 - T+51 (rather than T+54)

Yeah, the datacube extension is intended to describe a single file. Since your forecast horizons are split into their own standalone files, it's not a good fit for the forecast horizons IMO. Still a good fit for the other dimensions.

I think a good text description will be super-helpful for users, so 👍🏼

@gadomski
Contributor

@mo-jeff just a heads up that I'm going to be adding a few commits on top of your branch to add ingestion code, etc. I'll try my best to not stomp on any of your changes, but letting you know so you aren't surprised if new commits appear and you need to sync your branch before committing.

@gadomski
Contributor

@mo-jeff I've renamed each folder to match the collection id (removed the -level) so take care with any new changes so you don't end up in merge hell.

@gadomski gadomski self-requested a review December 15, 2025 21:11
@gadomski
Contributor

@mo-jeff updated the template.json w/ collection-level datacube extension and a few other tweaks. scripts/cleanup.py is the automated process, so if we need to do it again or change anything, should be straightforward.

4 participants