Apache Arrow / ARROW-1599

[C++][Parquet] Unable to read Parquet files with list inside struct

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: 1.0.0
    • Component/s: C++, Python
    • Labels:
    • Environment:
      Ubuntu

      Description

      Is PyArrow currently unable to read Parquet files in which a column is a struct containing list (vector) fields? For example, the schema of such a file is below:

      <pyarrow._parquet.ParquetSchema object at 0x7f2d42493c88>
      mbc: FLOAT
      deltae: FLOAT
      labels: FLOAT
      features.type: INT32 INT_8
      features.size: INT32
      features.indices.list.element: INT32
      features.values.list.element: DOUBLE

      Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() yields the following error: ArrowNotImplementedError: Currently only nesting with Lists is supported.

      From the error, I assume this may be implemented in a future release?
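To make the failing pattern concrete, here is a minimal sketch that picks out which columns in the schema above place a list below another group (here, the `features` struct). The helper name `lists_inside_structs` and the constant `ISSUE_COLUMN_PATHS` are hypothetical and not part of PyArrow. The check is a heuristic: it assumes the `<field>.list.element` naming seen in this schema and only inspects the first list in each path.

```python
def lists_inside_structs(column_paths):
    """Return dotted column paths whose first list field sits below another group.

    Hypothetical helper for illustration; it relies on the
    '<field>.list.element' naming convention seen in this issue's schema.
    """
    flagged = []
    for path in column_paths:
        parts = path.split(".")
        for i in range(1, len(parts) - 1):
            if parts[i] == "list" and parts[i + 1] == "element":
                # parts[i - 1] is the list field itself; any component
                # before it is an enclosing group such as a struct.
                if i - 1 >= 1:
                    flagged.append(path)
                break  # only the first list in the path is inspected
    return flagged


# Column paths copied from the schema printed in this issue.
ISSUE_COLUMN_PATHS = [
    "mbc",
    "deltae",
    "labels",
    "features.type",
    "features.size",
    "features.indices.list.element",
    "features.values.list.element",
]

if __name__ == "__main__":
    for column in lists_inside_structs(ISSUE_COLUMN_PATHS):
        print(column)
```

For this schema the sketch flags the two `features.*.list.element` columns, which matches the "list inside struct" shape named in the issue title. A top-level list such as `tags.list.element` would not be flagged.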

              People

              • Assignee: Micah Kornfield (emkornfield@gmail.com)
              • Reporter: Jovann Kung (JKung)
              • Votes: 3
              • Watchers: 7
