Apache Arrow / ARROW-9336

[Ruby] Creating RecordBatch with structs missing keys results in a malformed table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 1.0.0
    • Component/s: Ruby

    Description

      Using ::Arrow::RecordBatch.new(schema, data) (which uses the RecordBatchBuilder) appears to handle a record that is missing an entry for a top-level column, but it does not handle a record that is missing an entry within a struct column. For example, I'd expect the following code to print true for every puts, but two of them print false:

      require 'parquet'
      require 'arrow'
      
      schema = [
        {name: "a", type: "string"},
        {name: "b", type: "struct", fields: [
          {name: "c", type: "string"},
          {name: "d", type: "string"},
        ]},
      ]
      
      arrow_schema = ::Arrow::Schema.new(schema)
      
      record_batch = ::Arrow::RecordBatch.new(
        arrow_schema,
        [
          {"a" => "a", "b" => {"c" => "c",           }},
          {            "b" => {"c" => "c",           }},
          {            "b" => {            "d" => "d"}},
        ]
      )
      table = record_batch.to_table
      
      puts(table['a'][0] == 'a')
      puts(table['a'][1].nil?)
      puts(table['a'][2].nil?)
      
      puts(table['b'][0].key?('c'))
      puts(table['b'][0]['c'] == 'c')
      puts(table['b'][0].key?('d'))
      puts(table['b'][0]['d'].nil?) # False ?
      puts(!table['b'][0].key?('e'))
      
      puts(table['b'][1].key?('c'))
      puts(table['b'][1]['c'] == 'c')
      puts(table['b'][1].key?('d'))
      puts(table['b'][1]['d'].nil?)
      puts(!table['b'][1].key?('e'))
      
      puts(table['b'][2].key?('c'))
      puts(table['b'][2]['c'].nil?)
      puts(table['b'][2].key?('d'))
      puts(table['b'][2]['d'] == 'd') # False ?
      puts(!table['b'][2].key?('e'))
      

      I'd expect puts(table) to print this representation:

      	a	b
      0	a	{"c"=>"c", "d"=>nil}
      1	 	{"c"=>"c", "d"=>nil}
      2	 	{"c"=>nil, "d"=>"d"}
      

      But it prints this instead:

      	a	b
      0	a	{"c"=>"c", "d"=>"d"}
      1	 	{"c"=>"c", "d"=>nil}
      2	 	{"c"=>nil, "d"=>nil}
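
One plausible reading of the output above is that each struct child only receives a value when the record actually contains that key, so later values slide up into earlier rows. The following pure-Ruby sketch models that behaviour. It is an assumption about the symptom, not red-arrow's actual builder code, and the helper names build_children_skipping_missing and read_rows are hypothetical.

```ruby
# Hypothetical model of the symptom, NOT red-arrow's actual builder code.
# Each struct child gets its own value list; a child is only appended to
# when the record's struct actually contains that key.
def build_children_skipping_missing(field_names, structs)
  children = field_names.to_h { |name| [name, []] }
  structs.each do |struct|
    struct.each { |key, value| children[key] << value }
  end
  children
end

# Read row i back by indexing each child list independently.
# Indexing past the end of a list yields nil.
def read_rows(children, n_rows)
  (0...n_rows).map do |i|
    children.transform_values { |values| values[i] }
  end
end

# The "b" column values from the reproduction script.
structs = [
  {"c" => "c"},
  {"c" => "c"},
  {"d" => "d"},
]

children = build_children_skipping_missing(["c", "d"], structs)
rows = read_rows(children, structs.size)

rows.each_with_index { |row, i| puts "#{i}\t#{row}" }
puts "child c length: #{children["c"].size}, struct length: #{structs.size}"
```

Under this model, the single "d" value lands in row 0 and row 2 ends up with nothing in either child, which matches the printed table. Child "c" holds 2 values for 3 rows.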
      

      Furthermore, trying to write that table out to a Parquet file results in the following error:

      Traceback (most recent call last):
      	7: from arrow_parquet2.rb:53:in `<main>'
      	6: from /usr/local/lib/ruby/gems/2.6.0/gems/red-arrow-0.17.1/lib/arrow/block-closable.rb:25:in `open'
      	5: from arrow_parquet2.rb:54:in `block in <main>'
      	4: from /usr/local/lib/ruby/gems/2.6.0/gems/red-arrow-0.17.1/lib/arrow/block-closable.rb:25:in `open'
      	3: from arrow_parquet2.rb:56:in `block (2 levels) in <main>'
      	2: from /usr/local/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.4.3/lib/gobject-introspection/loader.rb:514:in `block in define_method'
      	1: from /usr/local/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.4.3/lib/gobject-introspection/loader.rb:600:in `invoke'
      /usr/local/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.4.3/lib/gobject-introspection/loader.rb:600:in `invoke': [parquet][arrow][file-writer][write-table]: Invalid: Column 1: In chunk 0: Invalid: Struct child array #0 has length different from struct array (2 != 3) (Arrow::Error::Invalid)
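
A possible workaround on an affected version is to pad every record so that each struct carries an explicit nil for every field before the records reach the builder. The sketch below is plain Ruby and does not call into Arrow. The helper fill_missing_keys is hypothetical, and whether padded input avoids the error on 0.17.1 is an assumption that has not been verified here.

```ruby
# Hypothetical helper (not part of red-arrow): returns a copy of `record`
# in which every field named in `fields` is present, using nil for gaps.
# `fields` uses the same array-of-hashes layout as the schema in the report.
def fill_missing_keys(fields, record)
  fields.to_h do |field|
    name = field[:name]
    value = record[name]
    # Recurse into struct values; a struct that is absent altogether stays nil.
    if field[:type] == "struct" && value.is_a?(Hash)
      value = fill_missing_keys(field[:fields], value)
    end
    [name, value]
  end
end

schema = [
  {name: "a", type: "string"},
  {name: "b", type: "struct", fields: [
    {name: "c", type: "string"},
    {name: "d", type: "string"},
  ]},
]

records = [
  {"a" => "a", "b" => {"c" => "c"}},
  {"b" => {"c" => "c"}},
  {"b" => {"d" => "d"}},
]

padded = records.map { |record| fill_missing_keys(schema, record) }
padded.each { |record| p record }
```

The padded array could then be passed as the data argument in place of the original records, for example ::Arrow::RecordBatch.new(arrow_schema, padded).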
       

            People

              Assignee: Kouhei Sutou (kou)
              Reporter: Steven Willis (stevenwillis)