  1. Apache Arrow
  2. ARROW-6308

[Java] Support write interleaved dictionaries and batches in IPC stream


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Java
    • Labels: None

    Description

      Per the discussions in the following threads, and as described in the spec (http://arrow.apache.org/docs/format/IPC.html#streaming-format), dictionary batches and record batches can be interleaved, as long as a record batch does not reference a dictionary that has not been written yet.

      https://github.com/apache/arrow/pull/4960

      https://github.com/apache/arrow/pull/5146

      Currently, interleaved dictionaries and batches can be parsed (via ARROW-6040), but it is not possible to write data in this format.

      The cases below should be supported:

      i. A record batch with one dictionary-encoded column S

      1. Schema
      2. RecordBatch: S=[null, null, null, null]
      3. DictionaryBatch: ['abc', 'efg']
      4. RecordBatch: S=[0, 1, 0, 1]
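
To make the index notation in case i concrete, here is a minimal plain-Java sketch of how dictionary indices map back to values. `DictionaryDecodeSketch` and `decode` are hypothetical names chosen for illustration; they are not part of the Arrow Java API. The point it shows is that an all-null batch never looks anything up, which is why it can precede its DictionaryBatch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not an Arrow Java class.
final class DictionaryDecodeSketch {

    // Maps each index to its dictionary value; a null index stays null
    // and does not touch the dictionary at all.
    static List<String> decode(List<Integer> indices, List<String> dictionary) {
        List<String> values = new ArrayList<>();
        for (Integer index : indices) {
            values.add(index == null ? null : dictionary.get(index));
        }
        return values;
    }

    public static void main(String[] args) {
        // Message 2 of case i: all nulls, decoded against an empty dictionary.
        List<Integer> allNulls = Arrays.<Integer>asList(null, null, null, null);
        System.out.println(decode(allNulls, new ArrayList<String>()));
        // prints [null, null, null, null]

        // Message 4 of case i, decoded against the dictionary from message 3.
        List<String> dictionary = Arrays.asList("abc", "efg");
        System.out.println(decode(Arrays.asList(0, 1, 0, 1), dictionary));
        // prints [abc, efg, abc, efg]
    }
}
```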

      ii. A record batch with two dictionary-encoded columns S1 and S2

      1. Schema
      2. DictionaryBatch S1: ['ab', 'cd']
      3. RecordBatch: S1=[0, 1, 0, 1], S2=[null, null, null, null]
      4. DictionaryBatch S2: ['cc', 'dd']
      5. RecordBatch: S1=[0, 1, 0, 1], S2=[0, 1, 0, 1]
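
The ordering rule behind both cases can be sketched as a small check over a sequence of messages. This is a plain-Java model under stated assumptions, not the Arrow Java writer or reader: `InterleavedStreamSketch`, `Message`, `Kind`, and `isValidOrder` are all hypothetical names, and a record batch is assumed to "reference" a dictionary only when its column holds at least one non-null index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an IPC stream; these types are not Arrow Java classes.
final class InterleavedStreamSketch {

    enum Kind { SCHEMA, DICTIONARY_BATCH, RECORD_BATCH }

    static final class Message {
        final Kind kind;
        // For a DICTIONARY_BATCH: the dictionaries it defines.
        // For a RECORD_BATCH: the dictionaries its non-null indices refer to.
        final Set<String> dictionaries;

        Message(Kind kind, String... dictionaries) {
            this.kind = kind;
            this.dictionaries = new HashSet<>(Arrays.asList(dictionaries));
        }
    }

    // True if the schema comes first and every record batch only references
    // dictionaries defined by an earlier dictionary batch.
    static boolean isValidOrder(List<Message> stream) {
        if (stream.isEmpty() || stream.get(0).kind != Kind.SCHEMA) {
            return false;
        }
        Set<String> defined = new HashSet<>();
        for (Message m : stream.subList(1, stream.size())) {
            switch (m.kind) {
                case SCHEMA:
                    return false; // only one schema, at the start
                case DICTIONARY_BATCH:
                    defined.addAll(m.dictionaries);
                    break;
                case RECORD_BATCH:
                    if (!defined.containsAll(m.dictionaries)) {
                        return false;
                    }
                    break;
            }
        }
        return true;
    }

    // Case i: the all-null batch references nothing, so it may come first.
    static List<Message> caseOne() {
        return Arrays.asList(
                new Message(Kind.SCHEMA),
                new Message(Kind.RECORD_BATCH),
                new Message(Kind.DICTIONARY_BATCH, "S"),
                new Message(Kind.RECORD_BATCH, "S"));
    }

    // Case ii: S2 is all null in the first batch, so only S1 is needed there.
    static List<Message> caseTwo() {
        return Arrays.asList(
                new Message(Kind.SCHEMA),
                new Message(Kind.DICTIONARY_BATCH, "S1"),
                new Message(Kind.RECORD_BATCH, "S1"),
                new Message(Kind.DICTIONARY_BATCH, "S2"),
                new Message(Kind.RECORD_BATCH, "S1", "S2"));
    }

    // Counter-example: non-null indices for S arrive before its dictionary.
    static List<Message> usedBeforeDefined() {
        return Arrays.asList(
                new Message(Kind.SCHEMA),
                new Message(Kind.RECORD_BATCH, "S"),
                new Message(Kind.DICTIONARY_BATCH, "S"));
    }

    public static void main(String[] args) {
        System.out.println(isValidOrder(caseOne()));           // prints true
        System.out.println(isValidOrder(caseTwo()));           // prints true
        System.out.println(isValidOrder(usedBeforeDefined())); // prints false
    }
}
```

A writer that supports interleaving would need to emit messages in an order this kind of check accepts; how that maps onto the actual Java writer API is left open here.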

      This issue records the problem; the work should be done after a discussion on the mailing list.

          People

            Assignee: tianchen92 Ji Liu
            Reporter: tianchen92 Ji Liu