Parquet / PARQUET-1269

[C++] Scanning fails with list columns

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component: parquet-cpp

    Description

      >>> import io
      >>> import pyarrow as pa
      >>> import pyarrow.parquet as pq
      >>> list_arr = pa.array([[1, 2], [3, 4, 5]])
      >>> int_arr = pa.array([10, 11])
      >>> table = pa.Table.from_arrays([int_arr, list_arr], ['ints', 'lists'])
      >>> bio = io.BytesIO()
      >>> pq.write_table(table, bio)
      >>> bio.seek(0)
      0
      >>> reader = pq.ParquetReader()
      >>> reader.open(bio)
      >>> reader.scan_contents()
      Traceback (most recent call last):
        File "<ipython-input-23-58e977f6d60b>", line 1, in <module>
          reader.scan_contents()
        File "_parquet.pyx", line 753, in pyarrow._parquet.ParquetReader.scan_contents
        File "error.pxi", line 79, in pyarrow.lib.check_status
      ArrowIOError: Parquet error: Total rows among columns do not match
      

      ScanFileContents() claims it returns the "number of semantic rows" but apparently it actually counts the number of physical elements?
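
      To illustrate the distinction being asked about: in the reproduction above, the `ints` column holds 2 values while the `lists` column holds 5 leaf values across 2 rows, so counting physical elements gives 2 versus 5. The sketch below is plain Python, not parquet-cpp code and not the fix that was applied. It assumes a single-level list column with no nulls and no empty lists, and the helper names are hypothetical. It relies on the Parquet convention that a repetition level of 0 marks the start of a new row.

```python
def repetition_levels(rows):
    """Repetition levels for a single-level list column.

    Assumes every row is a non-empty list with no nulls (illustration only).
    The first element of each row gets level 0, later elements get level 1.
    """
    levels = []
    for row in rows:
        for index, _ in enumerate(row):
            levels.append(0 if index == 0 else 1)
    return levels


def count_physical_values(levels):
    """Number of leaf values stored in the column."""
    return len(levels)


def count_semantic_rows(levels):
    """Number of rows: each repetition level of 0 starts a new row."""
    return sum(1 for level in levels if level == 0)


# Same data as the reproduction in the description.
ints_column = [10, 11]
lists_column = [[1, 2], [3, 4, 5]]

levels = repetition_levels(lists_column)
physical = count_physical_values(levels)
semantic = count_semantic_rows(levels)

print("physical values in 'lists':", physical)
print("semantic rows in 'lists':", semantic)
print("rows in 'ints':", len(ints_column))
```

      Under these assumptions, comparing the physical count of `lists` against the row count of `ints` produces a mismatch, while comparing semantic row counts does not.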

          People

            Assignee: Unassigned
            Reporter: Antoine Pitrou (apitrou)