Apache Arrow / ARROW-14176

[Python] Filename-based partitioning scheme

Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      This originates from [this SO question|https://stackoverflow.com/questions/69379083/read-a-partitioned-parquet-dataset-from-multiple-files-with-pyarrow-and-add-a-pa].

      The idea is to have a partitioning scheme that would allow constructing a primary key from the filename.

      Let's say that one is trying to read files named `/data-N.parquet`, where `N` is an integer. The value of `N` should go into a primary key column for later reference.

      This is quite similar to having the files laid out like this: `/N/data.parquet`, so I imagine this is technically feasible.
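
To illustrate why the two layouts carry the same information, here is a minimal standard-library sketch. It is not PyArrow API: the helper names `key_from_filename` and `key_from_directory`, and the regular expression for the `data-N.parquet` naming, are assumptions made only for this example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern for the `data-N.parquet` naming used in the description.
FILENAME_PATTERN = re.compile(r"^data-(\d+)\.parquet$")


def key_from_filename(path: str) -> int:
    """Return the integer N encoded in a path ending in `data-N.parquet`."""
    name = PurePosixPath(path).name
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{path!r} does not match data-N.parquet")
    return int(match.group(1))


def key_from_directory(path: str) -> int:
    """Return the integer N encoded in a path ending in `N/data.parquet`."""
    # The key is the name of the directory that contains the file.
    return int(PurePosixPath(path).parent.name)


# Both layouts yield the same key for the same N.
print(key_from_filename("/data-7.parquet"))   # 7
print(key_from_directory("/7/data.parquet"))  # 7
```

A filename-based scheme would presumably perform the first kind of extraction for each file and attach the result as a column, just as a directory-based scheme does with the second.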

            People

              Assignee: Unassigned
              Reporter: Cédric Hernalsteens (chernals)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: