Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-7402 Federation V2: Global Optimizations
  3. YARN-8337

[FederationStateStore - MySql] Deadlock In addApplicationHome

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsAdd voteVotersWatch issueWatchersConvert to IssueMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • None
    • None
    • federation, router
    • None

    Description

      We use mysql innodb as the state store engine,in router log we found dead lock error like below:

      [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : Unable to insert the newly generated application application_1526295230627_127402
      com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
              at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown Source)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
              at com.mysql.jdbc.Util.getInstance(Util.java:408)
              at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
              at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
              at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
              at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
              at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
              at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
              at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
              at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
              at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
              at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
              at com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
              at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
              at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
              at com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
              at org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
      

      Use "show engine innodb status;" command to find what happens 

      2018-05-21 15:41:40 7f4685870700
      *** (1) TRANSACTION:
      TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
      mysql tables in use 2, locked 2
      LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
      MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 192.168.1.138 federation executing
      INSERT INTO applicationsHomeSubCluster
      (applicationId,homeSubCluster)
      (SELECT applicationId_IN, homeSubCluster_IN
      FROM applicationsHomeSubCluster
      WHERE applicationId = applicationId_IN
      HAVING COUNT(*) = 0 )
      *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
      0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
      1: len 6; hex 00000ba5f32d; asc -;;
      2: len 7; hex dd000000280110; asc ( ;;
      3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
      
      *** (2) TRANSACTION:
      TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
      mysql tables in use 2, locked 2
      4 lock struct(s), heap size 1184, 2 row lock(s)
      MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 192.168.1.138 federation executing
      INSERT INTO applicationsHomeSubCluster
      (applicationId,homeSubCluster)
      (SELECT applicationId_IN, homeSubCluster_IN
      FROM applicationsHomeSubCluster
      WHERE applicationId = applicationId_IN
      HAVING COUNT(*) = 0 )
      *** (2) HOLDS THE LOCK(S):
      RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock mode S locks gap before rec
      Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
      0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
      1: len 6; hex 00000ba5f32d; asc -;;
      2: len 7; hex dd000000280110; asc ( ;;
      3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
      
      *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
      0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
      1: len 6; hex 00000ba5f32d; asc -;;
      2: len 7; hex dd000000280110; asc ( ;;
      3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
      
      *** WE ROLL BACK TRANSACTION (2)
      

      PROCEDURE sp_addApplicationHomeSubCluster will create gap lock .

      insert into select where clause,if the applicationId does not exists in table applicationsHomeSubCluster ,it will create a gap lock。

      At this moment ,if other threads want to insert new records of the gap,deaklock may happen.

       

      To reproduce the situation , we use 3 sessions to insert different applicationId :application_1526528662010_001201,application_1526528662010_001202,application_1526528662010_001203

       

      To fix this issue,we should use insert ignore into  instead of insert into select 

       

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            slfan1989 Shilun Fan Assign to me
            jianchao jia Jianchao Jia

            Dates

              Created:
              Updated:

              Slack

                Issue deployment