  1. Apache Hudi
  2. HUDI-5971

Full schema evolution: ALTER COLUMN TYPE does not support a nested column, but the type of a field inside the struct can be altered


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.2, 0.13.0
    • Fix Version/s: None
    • Component/s: spark-sql

    Description

      Steps to reproduce:

      • Create the test table:
      CREATE TABLE default.schema_evolution_test (
                                id bigint,
                                test_array array<int>,
                                test_array_struct ARRAY<STRUCT<name:STRING, age:INT>>)
                              USING hudi
                              TBLPROPERTIES (
                                type = 'cow',
                                primaryKey = 'id')
                              LOCATION 's3://trino-ci-test/schema_evolution_test' 
      • Enable full schema evolution:
      SET hoodie.schema.on.read.enable=true
      • Alter the type of the nested column as a whole. This throws an exception:
      SQL A: alter table default.schema_evolution_test alter column test_array_struct type ARRAY<STRUCT<name:STRING, age:FLOAT>>;
      Exception: Failed in [alter table default.schema_evolution_test alter column test_array_struct type ARRAY<STRUCT<name:STRING, age:FLOAT>>]
      java.lang.IllegalArgumentException: only support update primitive type but find nest column: test_array_struct
              at org.apache.hudi.internal.schema.action.TableChanges$ColumnUpdateChange.updateColumnType(TableChanges.java:89)
              at org.apache.spark.sql.hudi.command.AlterTableCommand.$anonfun$applyUpdateAction$1(AlterTableCommand.scala:149)
        
      • Alter the type of the field inside the struct. This executes successfully:
      SQL B: alter table default.schema_evolution_test alter column test_array_struct.element.age type FLOAT;
      • Describe the table schema after SQL B:
      desc default.schema_evolution_test;                              
      id                      bigint                                                            
      test_array              array<int>                                  
      test_array_struct       array<struct<name:string,age:float>>    
      • Questions:
        1. Is this a bug in ALTER TABLE column type changes for Hudi?
        2. Both SQL A and SQL B change the type from array<struct<name:string,age:int>> to array<struct<name:string,age:float>>. Why is SQL A not supported?
        3. Is SQL B working as expected? Will SQL A be supported in the next release or a future one?
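
      One possible reading of the exception message, sketched as a toy model in Python. This is not Hudi's code: the nested-dict schema layout and the names `resolve` and `update_column_type` are hypothetical, chosen only for illustration. The sketch assumes the check looks at the type of the column named in the statement, so SQL A targets an array (nested) while SQL B targets an int (primitive).

```python
# Toy model only: NOT Hudi's implementation. It mimics the check suggested by
# the message "only support update primitive type but find nest column".

# Schema of default.schema_evolution_test, written as nested dicts (assumed layout).
SCHEMA = {
    "id": "bigint",
    "test_array": {"array": "int"},
    "test_array_struct": {"array": {"struct": {"name": "string", "age": "int"}}},
}


def resolve(schema, path):
    """Return the type at a dotted path; the step 'element' enters an array."""
    node = {"struct": schema}
    for part in path.split("."):
        if isinstance(node, dict) and "struct" in node and part in node["struct"]:
            node = node["struct"][part]
        elif isinstance(node, dict) and "array" in node and part == "element":
            node = node["array"]
        else:
            raise KeyError(f"cannot resolve {part!r} in {path!r}")
    return node


def update_column_type(schema, path, new_type):
    """Hypothetical stand-in for the check: only primitive targets are accepted."""
    current = resolve(schema, path)
    if isinstance(current, dict):  # array or struct, i.e. a nested column
        raise ValueError(
            f"only support update primitive type but find nest column: {path}"
        )
    # Write the new primitive type into the parent node in place.
    parent_path, _, leaf = path.rpartition(".")
    parent = resolve(schema, parent_path) if parent_path else {"struct": schema}
    if leaf == "element":
        parent["array"] = new_type
    else:
        parent["struct"][leaf] = new_type


if __name__ == "__main__":
    try:
        # Mirrors SQL A: the target is the whole nested column.
        update_column_type(SCHEMA, "test_array_struct",
                           "array<struct<name:string,age:float>>")
    except ValueError as err:
        print("SQL A rejected:", err)
    # Mirrors SQL B: the target is the primitive leaf inside the struct.
    update_column_type(SCHEMA, "test_array_struct.element.age", "float")
    print(SCHEMA["test_array_struct"])
```

      Under this assumption both statements describe the same end state, but only the one addressed to the leaf field passes the check. Whether Hudi intends this restriction is exactly what the questions above ask.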

          People

            Assignee: Unassigned
            Reporter: Alberic Liu