feat: add utility function (and/or data) for URL datasets if necessary #20

@tianjianjiang

A Light Discussion about Dataset Choices for URL (at least)

Besides a small subset of (m)C4, I would prefer datasets at the intersection of the metadata (URL at least), promptsource, and evaluation WGs.

  • TyDi QA (primary task) is probably the only dataset common to all three

For datasets used by only one of the other two WGs (that is, excluding us, the metadata WG):

  • From evaluation
    • GEM, specifically
      • MLSum
      • WikiLingua
  • From promptsource
    • app_reviews: not really a URL/URI, but basically a namespace and a date
    • CC-News: virtually a subset of C4
    • Probably some more
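
As a rough illustration of the "intersection" idea above, here is a minimal sketch that treats each WG's candidates as a set. The groupings only mirror the lists in this issue; the metadata entry in particular is an assumption based on the (m)C4 and TyDi QA mentions. `CANDIDATES`, `common_datasets`, and `exclusive_to` are hypothetical names, not existing utilities in the repo.

```python
# Candidate datasets per working group, copied from the lists in this issue.
# Assumption: the metadata WG's set is inferred from the (m)C4 and TyDi QA mentions.
CANDIDATES = {
    "metadata": {"mC4 (subset)", "TyDi QA"},
    "evaluation": {"TyDi QA", "MLSum", "WikiLingua"},
    "promptsource": {"TyDi QA", "app_reviews", "CC-News"},
}


def common_datasets(candidates):
    """Return the datasets that every working group lists."""
    return set.intersection(*candidates.values())


def exclusive_to(candidates, group):
    """Return the datasets that only `group` lists."""
    others = set().union(
        *(names for wg, names in candidates.items() if wg != group)
    )
    return candidates[group] - others


if __name__ == "__main__":
    print(sorted(common_datasets(CANDIDATES)))
    print(sorted(exclusive_to(CANDIDATES, "evaluation")))
    print(sorted(exclusive_to(CANDIDATES, "promptsource")))
```

Keeping the lists as plain sets makes it cheap to re-run the comparison whenever a WG adds or drops a dataset.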

Labels

documentation, enhancement, wontfix