Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-7402 Federation V2: Global Optimizations
  3. YARN-8337

[FederationStateStore - MySql] Improve ApplicationHome Mysql Script.

    XMLWordPrintableJSON

Details

    Description

      We use mysql innodb as the state store engine,in router log we found dead lock error like below:

      [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : Unable to insert the newly generated application application_1526295230627_127402
      com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
              at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown Source)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
              at com.mysql.jdbc.Util.getInstance(Util.java:408)
              at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
              at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
              at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
              at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
              at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
              at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
              at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
              at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
              at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
              at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
              at com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
              at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
              at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
              at com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
              at org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
      

      Use "show engine innodb status;" command to find what happens 

      2018-05-21 15:41:40 7f4685870700
      *** (1) TRANSACTION:
      TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
      mysql tables in use 2, locked 2
      LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
      MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 192.168.1.138 federation executing
      INSERT INTO applicationsHomeSubCluster
      (applicationId,homeSubCluster)
      (SELECT applicationId_IN, homeSubCluster_IN
      FROM applicationsHomeSubCluster
      WHERE applicationId = applicationId_IN
      HAVING COUNT(*) = 0 )
      *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
      0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
      1: len 6; hex 00000ba5f32d; asc -;;
      2: len 7; hex dd000000280110; asc ( ;;
      3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
      
      *** (2) TRANSACTION:
      TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
      mysql tables in use 2, locked 2
      4 lock struct(s), heap size 1184, 2 row lock(s)
      MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 192.168.1.138 federation executing
      INSERT INTO applicationsHomeSubCluster
      (applicationId,homeSubCluster)
      (SELECT applicationId_IN, homeSubCluster_IN
      FROM applicationsHomeSubCluster
      WHERE applicationId = applicationId_IN
      HAVING COUNT(*) = 0 )
      *** (2) HOLDS THE LOCK(S):
      RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock mode S locks gap before rec
      Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
      0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
      1: len 6; hex 00000ba5f32d; asc -;;
      2: len 7; hex dd000000280110; asc ( ;;
      3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
      
      *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
      0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
      1: len 6; hex 00000ba5f32d; asc -;;
      2: len 7; hex dd000000280110; asc ( ;;
      3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
      
      *** WE ROLL BACK TRANSACTION (2)
      

      PROCEDURE sp_addApplicationHomeSubCluster will create gap lock .

      insert into select where clause,if the applicationId does not exists in table applicationsHomeSubCluster ,it will create a gap lock。

      At this moment ,if other threads want to insert new records of the gap,deaklock may happen.

       

      To reproduce the situation , we use 3 sessions to insert different applicationId :application_1526528662010_001201,application_1526528662010_001202,application_1526528662010_001203

       

      To fix this issue,we should use insert ignore into  instead of insert into select 

       

      Attachments

        1. YARN-8337.001.patch
          1 kB
          Jianchao Jia
        2. YARN-8337.002.patch
          4 kB
          Jianchao Jia

        Issue Links

          Activity

            People

              slfan1989 Shilun Fan
              jianchao jia Jianchao Jia
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: