HBase / HBASE-28583

Upgrade from 2.5.8 to 3.0.0 crashes with InvalidProtocolBufferException: Message missing required fields: old_table_schema

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 2.5.8
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

    Description

      When upgrading from an HBase 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 2 HDFS), the HMaster threw the following exception while becoming active, and the upgrade failed.

      2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
      org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
              at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
              at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
              at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
              at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
              at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
              at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
              at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
              at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
      2024-05-10T00:54:45,937 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hmaster,16000,1715302475720: Unhandled exception. Starting shutdown. *****
      org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
              ... (stack trace identical to the one above)
      

      Reproduce

      This bug can be reproduced deterministically with the following steps:

      Start an HBase 2.5.8 cluster (1 HM, 2 RS, 1 HDFS; Hadoop 2.10.2).

      Execute the following commands in the HBase shell:

      create 'tb1', {NAME => 'c0', VERSIONS => 1}
      snapshot 'tb1', 's1'
      disable 'tb1'
      restore_snapshot 's1' 

      Stop the 2.5.8 cluster, then start the 3.0.0 cluster (commit: 516c89e8597fb6).

      The upgrade fails with the exception shown above.

      Root Cause

      This incompatibility between 2.5.8 and 3.0.0 is caused by a required field, old_table_schema, that was newly added to the RestoreSnapshotStateData message in the proto file.

      2.5.8

      hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
      
      message RestoreSnapshotStateData {
        required UserInformation user_info = 1;
        required SnapshotDescription snapshot = 2;
        required TableSchema modified_table_schema = 3;
        repeated RegionInfo region_info_for_restore = 4;
        repeated RegionInfo region_info_for_remove = 5;
        repeated RegionInfo region_info_for_add = 6;
        repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
        optional bool restore_acl = 8;
      }

      3.0.0 (516c89e8597fb6)

      message RestoreSnapshotStateData {
        required UserInformation user_info = 1;
        required SnapshotDescription snapshot = 2;
        required TableSchema modified_table_schema = 3;
        repeated RegionInfo region_info_for_restore = 4;
        repeated RegionInfo region_info_for_remove = 5;
        repeated RegionInfo region_info_for_add = 6;
        repeated RestoreParentToChildRegionsPair parent_to_child_regions_pair_list = 7;
        optional bool restore_acl = 8;
        required TableSchema old_table_schema = 9;
      } 

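      The RestoreSnapshotProcedure state persisted by 2.5.8 is serialized without the old_table_schema field (field 9). When the 3.0.0 master loads the procedure store during startup, it deserializes that state, and the protobuf parser rejects the message because a required field is missing.

      I am wondering whether the old_table_schema field really needs to be declared as required.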
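
      To make the question above concrete, here is a minimal, self-contained Java sketch of the required-field check. It is not HBase code and does not use the protobuf library: it hand-parses the protobuf wire format just far enough to list which top-level field numbers are present. The class name, the helper names, and the sample bytes (empty stand-ins for fields 1, 2, 3 and a varint for field 8) are assumptions made for illustration, not real procedure state.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: scans top-level protobuf fields by hand.
final class RequiredFieldCheck {

    // Reads a base-128 varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns the top-level field numbers present in a serialized message.
    static Set<Integer> presentFields(byte[] buf) {
        Set<Integer> seen = new TreeSet<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int wireType = (int) (tag & 7);
            seen.add((int) (tag >>> 3));
            switch (wireType) {
                case 0: // varint payload
                    readVarint(buf, pos);
                    break;
                case 1: // fixed 64-bit payload
                    pos[0] += 8;
                    break;
                case 2: // length-delimited payload (nested message, string, bytes)
                    int len = (int) readVarint(buf, pos);
                    pos[0] += len;
                    break;
                case 5: // fixed 32-bit payload
                    pos[0] += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return seen;
    }

    // Returns the required field numbers that are absent from the message.
    static Set<Integer> missingRequired(byte[] buf, int... required) {
        Set<Integer> present = presentFields(buf);
        Set<Integer> missing = new TreeSet<>();
        for (int field : required) {
            if (!present.contains(field)) {
                missing.add(field);
            }
        }
        return missing;
    }

    // Stand-in for state written by 2.5.8: fields 1, 2, 3 (empty nested
    // messages) and field 8 (restore_acl = true). Field 9 is never written.
    static byte[] sampleStateFrom258() {
        return new byte[] {
            0x0A, 0x00, // field 1, length-delimited, length 0
            0x12, 0x00, // field 2, length-delimited, length 0
            0x1A, 0x00, // field 3, length-delimited, length 0
            0x40, 0x01  // field 8, varint, value 1
        };
    }

    public static void main(String[] args) {
        byte[] state = sampleStateFrom258();
        // Required set of the 2.5.8 schema: fields 1, 2, 3.
        System.out.println("2.5.8 schema, missing: " + missingRequired(state, 1, 2, 3));
        // Required set of the 3.0.0 schema: fields 1, 2, 3 and 9 (old_table_schema).
        System.out.println("3.0.0 schema, missing: " + missingRequired(state, 1, 2, 3, 9));
    }
}
```

      With these sample bytes the first check reports no missing fields and the second reports field 9, which mirrors the failure in the master log. If field 9 were declared optional instead, the required set would be the same as in 2.5.8 and the old state would parse; the reading code would then have to handle the field being absent.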

       

      I have attached (1) the master log file and (2) all log files in persistent.tar.gz.

      I am still investigating the root cause and would appreciate any suggestions. Thank you!

      Attachments

        1. hbase--master-033a47be7d1d.log
          161 kB
          Ke Han
        2. persistent.tar.gz
          80 kB
          Ke Han

          People

            Assignee: Unassigned
            Reporter: Ke Han (kehan5800)