Apache Arrow / ARROW-14596

[Python] parquet.read_table nested fields in columns does not work for use_legacy_dataset=False

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 11.0.0
    • Component/s: Python

    Description

      Selecting a nested field by its dotted path in columns (for example 'properties.country') does not work with use_legacy_dataset=False.

      This works:

      import pyarrow.parquet as pq

      # filename: path to the Parquet file (placeholder)
      t = pq.read_table(
          source=filename,
          columns=['store_key', 'properties.country'],
          use_legacy_dataset=True,
      ).to_pandas()
      

      This does not work (for the same Parquet file):

      import pyarrow.parquet as pq

      # filename: path to the same Parquet file (placeholder)
      t = pq.read_table(
          source=filename,
          columns=['store_key', 'properties.country'],
          use_legacy_dataset=False,
      ).to_pandas()

       

            People

              Assignee: Miles Granger (milesgranger)
              Reporter: Tom Scheffers (TomScheffers)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h 10m