Apache Arrow / ARROW-1599

[C++][Parquet] Unable to read Parquet files with list inside struct

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: C++, Python
    • Environment: Ubuntu

    Description

      Is PyArrow currently unable to read in Parquet files with a vector as a column? For example, the schema of such a file is below:

      <pyarrow._parquet.ParquetSchema object at 0x7f2d42493c88>
      mbc: FLOAT
      deltae: FLOAT
      labels: FLOAT
      features.type: INT32 INT_8
      features.size: INT32
      features.indices.list.element: INT32
      features.values.list.element: DOUBLE

      Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() yields the following error: ArrowNotImplementedError: Currently only nesting with Lists is supported.
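One way to read the schema above is that `features` is a struct (group) whose `indices` and `values` fields are themselves lists, which matches the issue title. The following is a minimal sketch of that reading in plain Python. It only inspects the dotted leaf paths copied from the report and does not call any PyArrow API. The helper name `lists_inside_structs` is hypothetical, and the sketch assumes the usual Parquet list layout, in which a list column's leaf path ends in `<name>.list.element`.

```python
# Leaf column paths copied from the schema printed in the report.
LEAF_PATHS = [
    "mbc",
    "deltae",
    "labels",
    "features.type",
    "features.size",
    "features.indices.list.element",
    "features.values.list.element",
]


def lists_inside_structs(paths):
    """Map each top-level group to the list fields nested inside it.

    Hypothetical helper for illustration only. Assumes a list column's
    leaf path ends in "<name>.list.element".
    """
    nested = {}
    for path in paths:
        parts = path.split(".")
        # A top-level list is "<name>.list.element" (3 parts). A longer path
        # with the same suffix is a list sitting inside an enclosing group.
        if len(parts) >= 4 and parts[-2:] == ["list", "element"]:
            nested.setdefault(parts[0], []).append(parts[-3])
    return nested


print(lists_inside_structs(LEAF_PATHS))
# → {'features': ['indices', 'values']}
```

Under this reading, the three FLOAT columns and the two scalar fields of `features` are not the problem; the columns that plausibly trigger the "only nesting with Lists is supported" error are the two lists under the `features` group.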

      From the error, I assume that this may be implemented in a future release?

            People

              Assignee: Micah Kornfield (emkornfield@gmail.com)
              Reporter: Jovann Kung (JKung)
              Votes: 3
              Watchers: 7
