15 changes: 15 additions & 0 deletions src/main/thrift/parquet.thrift
@@ -958,6 +958,21 @@ union ColumnCryptoMetaData {
struct ColumnChunk {
/** File where column data is stored. If not set, assumed to be same file as
* metadata. This path is relative to the current file.
*
* As of December 2025, the only known use-case for this field is writing summary
* parquet files (i.e. "_metadata" files). These files consolidate footers from
Contributor

Do we have a link that describes what a summary file is and which implementations support it?

This is what came back from a quick Google search: https://stackoverflow.com/questions/53150801/what-is-the-parquet-summary-file

But I didn't see any mention of this in the format repository: https://github.com/search?q=repo%3Aapache%2Fparquet-format%20summary&type=code

Contributor Author

As far as I can tell, this was never officially part of the Parquet specification.

Contributor Author

I reworded this section.

* multiple parquet files so that readers can read footers efficiently, avoid file
* listing costs, and use statistics to prune files that do not need to be read.
*
* These files do not appear to have ever been formally specified
* and are potentially problematic from a correctness perspective [1].
*
* [1] https://lists.apache.org/thread/ootf2kmyg3p01b1bvplpvp4ftd1bt72d
*
* There is no other known usage of this field. Specifically, there are no known
* reference implementations that will read externally stored column data if this field is populated
* within a standard parquet file. Making use of the field for this purpose is
* not considered part of the Parquet specification.
**/
1: optional string file_path

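The doc comment says that an unset file_path means the column data lives in the same file as the metadata, and that a set value is relative to the current file. A minimal sketch of that lookup follows, assuming "relative to the current file" means relative to the directory that contains the metadata file. The `ColumnChunk` dataclass and `resolve_data_file` helper are hypothetical names used only for illustration; they are not part of any Parquet implementation.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnChunk:
    """Hypothetical stand-in for the Thrift ColumnChunk struct.

    Only the optional file_path field is modeled here.
    """
    file_path: Optional[str] = None


def resolve_data_file(metadata_path: str, chunk: ColumnChunk) -> str:
    """Return the path of the file assumed to hold the chunk's column data."""
    if chunk.file_path is None:
        # Field not set: data is assumed to be in the same file as the metadata.
        return metadata_path
    # Assumption: the path is resolved against the directory of the
    # metadata file (for example a "_metadata" summary file).
    base_dir = posixpath.dirname(metadata_path)
    return posixpath.normpath(posixpath.join(base_dir, chunk.file_path))
```

For a summary file at `dataset/_metadata` whose chunk has `file_path="part-0.parquet"`, this sketch resolves to `dataset/part-0.parquet`. For a regular parquet file with the field unset, it returns the file's own path.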