  Apache Arrow / ARROW-5030

[Python] read_row_group fails with Nested data conversions not implemented for chunked array outputs


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: C++, Python

    Description

      Hey, I'm trying to concatenate two Parquet files. To avoid reading everything into memory at once, I wanted to use `read_row_group`, but it fails.


      I think it's due to fields like these:

      pyarrow.Field<to: list<item: string>>


      But I'm not sure. Is this a duplicate? The issue referenced in the code at this line is already resolved: https://github.com/apache/arrow/blob/fd0b90a7f7e65fde32af04c4746004a1240914cf/cpp/src/parquet/arrow/reader.cc#L915


      The stack trace is:


        File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
          table = pf.read_row_group(ix, columns=self._columns)
        File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
          use_threads=use_threads)
        File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
        File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
      pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs

            People

              Assignee: Unassigned
              Reporter: Jakub Okoński (farnoy)
              Votes: 1
              Watchers: 5