  1. Apache Hudi
  2. HUDI-7930

Flink Support for Array of Row and Map of Row value

    Description

      I have run into an issue with Flink tables that contain an array of rows. Writes succeed, but once compaction has run, reads fail with this exception:

      java.lang.RuntimeException: Unsupported type in the list: optional binary item1 (STRING)

      The error occurs only after compaction has run and produced Parquet files. I'm using Hudi 0.14.1 and Flink 1.17.2, writing to Azure ADLS. I have tried both 'Merge on Read' and 'Copy on Write' tables.

      Steps to reproduce the error.
      1. Create a table with an array of rows

      CREATE temporary TABLE TestTable (
      rowId STRING NOT NULL,
      myArray ARRAY< ROW< item1 STRING, item2 STRING > >
      ) WITH (
      'connector' = 'hudi',
      'path' = 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testtable',
      'table.type' = 'MERGE_ON_READ',
      'write.batch.size' = '1',
      'hoodie.compact.inline' = 'true',
      'hoodie.compact.inline.max.delta.commits' = '1',
      'compaction.async.enabled' = 'false',
      'compaction.delta_commits' = '1',
      'hoodie.datasource.write.recordkey.field' = 'rowId'
      );

      2. Insert some data
      insert into TestTable values
      ('1', ARRAY[ROW('1.item1', '1.item2')]),
      ('2', ARRAY[ROW('2.item1', '2.item2')]),
      ('3', ARRAY[ROW('3.item1', '3.item2')]),
      ('4', ARRAY[ROW('4.item1', '4.item2')]),
      ('5', ARRAY[ROW('5.item1', '5.item2')]),
      ('6', ARRAY[ROW('6.item1', '6.item2')]),
      ('7', ARRAY[ROW('7.item1', '7.item2')]),
      ('8', ARRAY[ROW('8.item1', '8.item2')]),
      ('9', ARRAY[ROW('9.item1', '9.item2')]),
      ('10', ARRAY[ROW('10.item1', '10.item2')])
      ;

      3. Query
      Select * from TestTable;
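
The issue title also mentions map-of-row values, which the steps above do not exercise. A minimal sketch of an analogous reproduction follows, assuming the same connector options as the array case. The table name `TestMapTable`, the column `myMap`, the map keys, and the `testmaptable` path suffix are hypothetical, and whether this variant fails with the same exception has not been verified here.

```sql
-- Hypothetical map-of-row variant of the reproduction above.
-- Table name, column name, keys, and path suffix are illustrative only.
CREATE TEMPORARY TABLE TestMapTable (
  rowId STRING NOT NULL,
  myMap MAP< STRING, ROW< item1 STRING, item2 STRING > >
) WITH (
  'connector' = 'hudi',
  'path' = 'abfs://<container>@<storage_account>.dfs.core.windows.net/hudi/testmaptable',
  'table.type' = 'MERGE_ON_READ',
  'write.batch.size' = '1',
  'hoodie.compact.inline' = 'true',
  'hoodie.compact.inline.max.delta.commits' = '1',
  'compaction.async.enabled' = 'false',
  'compaction.delta_commits' = '1',
  'hoodie.datasource.write.recordkey.field' = 'rowId'
);

-- Each map value is a ROW, mirroring the ARRAY[ROW(...)] inserts above.
INSERT INTO TestMapTable VALUES
  ('1', MAP['k1', ROW('1.item1', '1.item2')]),
  ('2', MAP['k2', ROW('2.item1', '2.item2')]);

-- Read back once compaction has produced Parquet files.
SELECT * FROM TestMapTable;
```

Keeping every option identical to the array case leaves the column type as the only difference between the two reproductions.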

          People

            danny0405 Danny Chen
            david.perkins David Perkins

              Time Tracking

                Original Estimate: 4h
                Remaining Estimate: 4h
                Time Spent: Not Specified