Apache Arrow / ARROW-1599

[C++][Parquet] Unable to read Parquet files with list inside struct

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.14.0
    • Component/s: C++, Python
    • Labels:
    • Environment: Ubuntu

      Description

      Is PyArrow currently unable to read Parquet files in which a column holds a vector, i.e. lists nested inside a struct? For example, the schema of one such file is below:

      <pyarrow._parquet.ParquetSchema object at 0x7f2d42493c88>
      mbc: FLOAT
      deltae: FLOAT
      labels: FLOAT
      features.type: INT32 INT_8
      features.size: INT32
      features.indices.list.element: INT32
      features.values.list.element: DOUBLE

      Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() yields the following error: ArrowNotImplementedError: Currently only nesting with Lists is supported.

      From the error, I assume this may be implemented in a future release?
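
A minimal reproduction sketch may help when checking whether a given PyArrow build is affected. It is not taken from the report: the table contents, the field subset, and the helper name `try_roundtrip` are hypothetical, and it assumes a PyArrow version recent enough to provide `pa.table` and to write struct columns. It builds a small `features` struct holding list-typed `indices` and `values` fields, writes it to a temporary Parquet file, and reports whether reading it back raises `ArrowNotImplementedError`.

```python
import os
import tempfile

try:
    import pyarrow as pa
    import pyarrow.parquet as pq
except ImportError:  # pyarrow is not installed; the check is skipped
    pa = pq = None


def try_roundtrip(path):
    """Write a table with lists inside a struct, then read it back.

    Returns "ok" if the read succeeds, "not-implemented" if pyarrow raises
    ArrowNotImplementedError, or "skipped" if pyarrow is unavailable.
    """
    if pa is None:
        return "skipped"

    # Hypothetical miniature of the reported schema: a "features" struct
    # with a size plus list-typed "indices" and "values" fields.
    features_type = pa.struct([
        ("size", pa.int32()),
        ("indices", pa.list_(pa.int32())),
        ("values", pa.list_(pa.float64())),
    ])
    features = pa.array(
        [{"size": 3, "indices": [0, 2], "values": [1.0, 0.5]}],
        type=features_type,
    )
    table = pa.table({
        "mbc": pa.array([5.28], type=pa.float32()),
        "features": features,
    })

    try:
        pq.write_table(table, path)
        pq.read_table(path)
    except pa.ArrowNotImplementedError:
        # Either the write or the read path lacks struct/list support.
        return "not-implemented"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    result = try_roundtrip(os.path.join(tmp, "nested.parquet"))
print(result)
```

A result of "not-implemented" would suggest the installed build still lacks support for this nesting. The file in the report was produced by another writer, so a successful round trip here does not guarantee that file will load.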

            People

            • Assignee: Joshua Storck (joshuastorck)
            • Reporter: Jovann Kung (JKung)