Apache Arrow / ARROW-8860

[C++] IPC/Feather decompression broken for nested arrays

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: C++

    Description

      A table with a struct-typed column is read back with garbage values when it is written to Feather with compression enabled (which is the default):

      >>> import pyarrow as pa
      >>> from pyarrow import feather
      
      >>> table = pa.table({'col': pa.StructArray.from_arrays([[0, 1, 2], [1, 2, 3]], names=["f1", "f2"])})
      
      # roundtrip through feather
      >>> feather.write_feather(table, "test_struct.feather")
      >>> table2 = feather.read_table("test_struct.feather")
      
      >>> table2.column("col")
      <pyarrow.lib.ChunkedArray object at 0x7f0b0c4d7728>
      [
        -- is_valid: all not null
        -- child 0 type: int64
          [
            24,
            1261641627085906436,
            1369095386551025664
          ]
        -- child 1 type: int64
          [
            24,
            1405756815161762308,
            281479842103296
          ]
      ]
      

      When not using compression, it is read back correctly:

      >>> feather.write_feather(table, "test_struct.feather", compression="uncompressed")
      >>> table2 = feather.read_table("test_struct.feather")
      
      >>> table2.column("col")
      <pyarrow.lib.ChunkedArray object at 0x7f0b0e466778>
      [
        -- is_valid: all not null
        -- child 0 type: int64
          [
            0,
            1,
            2
          ]
        -- child 1 type: int64
          [
            1,
            2,
            3
          ]
      ]
      

            People

              jorisvandenbossche Joris Van den Bossche
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h 20m