Apache Arrow / ARROW-10435

[C++][Dataset][Python] Improve ParquetFileFragment serialization

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: C++

    Description

      After ARROW-10131, ParquetFileFragment wraps a FileMetaData, from which all of its properties are queried. FileMetaData is eminently serializable, so when pickling a fragment whose metadata is already loaded, serializing that metadata along with the fragment would avoid redundant I/O. (The unpickled fragment would then also have pre-loaded metadata.)

      https://github.com/apache/arrow/pull/8507#discussion_r512698380
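
      The idea can be illustrated with a toy model in pure Python. This is only a sketch under stated assumptions: ToyFragment, its io_reads counter, the placeholder footer contents, and the example path are all hypothetical and are not part of the Arrow or pyarrow API. The sketch shows a fragment that carries already-loaded metadata through pickling, so the unpickled copy does not need to read the file footer again.

```python
import pickle


class ToyFragment:
    """Hypothetical stand-in for ParquetFileFragment (not the pyarrow class)."""

    def __init__(self, path, metadata=None):
        self.path = path
        self._metadata = metadata  # None until loaded or supplied
        self.io_reads = 0          # counts simulated footer reads

    def _read_footer(self):
        # Placeholder for the real I/O that would parse the Parquet footer.
        self.io_reads += 1
        return {"num_rows": 100, "num_row_groups": 2}

    @property
    def metadata(self):
        # Load lazily, and only if metadata was not already provided.
        if self._metadata is None:
            self._metadata = self._read_footer()
        return self._metadata

    def __reduce__(self):
        # Serialize the path together with any metadata already loaded,
        # so the unpickled fragment starts with pre-loaded metadata.
        return (ToyFragment, (self.path, self._metadata))


fragment = ToyFragment("data/part-0.parquet")  # hypothetical path
fragment.metadata                              # one simulated footer read

restored = pickle.loads(pickle.dumps(fragment))
restored.metadata                              # served from pickled state
print(fragment.io_reads, restored.io_reads)
```

      If the fragment is pickled before its metadata is loaded, the sketch falls back to the lazy path: the second constructor argument is None and the restored fragment reads the footer on first access.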

          People

            Assignee: Unassigned
            Reporter: Ben Kietzman (bkietz)
            Votes: 0
            Watchers: 3
