Tajo / TAJO-1830

Fix race condition in HdfsServiceTracker

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0, 0.12.0
    • Component/s: TajoMaster, Unit Test
    • Labels:
      None

      Description

      If HdfsServiceTracker reads the active-master file before the writer has closed it, the file appears empty and the read fails with java.io.EOFException:

      {format}
      testAutoFailOver(org.apache.tajo.ha.TestHAServiceHDFSImpl) Time elapsed: 4.668 sec <<< ERROR!
      org.apache.tajo.exception.TajoRuntimeException: org.apache.tajo.client.v2.exception.ClientConnectionException: java.io.EOFException
      at org.apache.tajo.client.SessionConnection.getTajoMasterConnection(SessionConnection.java:141)
      at org.apache.tajo.client.SessionConnection.<init>(SessionConnection.java:113)
      at org.apache.tajo.client.TajoClientImpl.<init>(TajoClientImpl.java:62)
      at org.apache.tajo.client.TajoClientImpl.<init>(TajoClientImpl.java:86)
      at org.apache.tajo.client.TajoClientImpl.<init>(TajoClientImpl.java:82)
      at org.apache.tajo.ha.TestHAServiceHDFSImpl.verifyDataBaseAndTable(TestHAServiceHDFSImpl.java:152)
      at org.apache.tajo.ha.TestHAServiceHDFSImpl.testAutoFailOver(TestHAServiceHDFSImpl.java:82)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
      at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
      at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
      at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
      at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
      at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      Caused by: org.apache.tajo.client.v2.exception.ClientConnectionException: java.io.EOFException
      ... 30 more
      Caused by: org.apache.tajo.service.ServiceTrackerException: java.io.EOFException
      at org.apache.tajo.ha.HdfsServiceTracker.getAddressElements(HdfsServiceTracker.java:508)
      at org.apache.tajo.ha.HdfsServiceTracker.getClientServiceAddress(HdfsServiceTracker.java:409)
      at org.apache.tajo.client.SessionConnection.getTajoMasterAddr(SessionConnection.java:361)
      at org.apache.tajo.client.SessionConnection.getTajoMasterConnection(SessionConnection.java:130)
      ... 29 more
      Caused by: java.io.EOFException
      at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
      at java.io.DataInputStream.readUTF(DataInputStream.java:589)
      at java.io.DataInputStream.readUTF(DataInputStream.java:564)
      at org.apache.tajo.ha.HdfsServiceTracker.getAddressElements(HdfsServiceTracker.java:498)
      ... 32 more
      at org.apache.tajo.ha.HdfsServiceTracker.createMasterFile(HdfsServiceTracker.java:249)
      at org.apache.tajo.ha.HdfsServiceTracker.register(HdfsServiceTracker.java:155)
      at org.apache.tajo.ha.HdfsServiceTracker$PingChecker.run(HdfsServiceTracker.java:374)
      at java.lang.Thread.run(Thread.java:745){format}
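
      The innermost cause in the trace can be reproduced without HDFS. The sketch below is only an illustration, not Tajo code: the class EmptyMasterFileDemo, its helpers, and the address "localhost:26002" are hypothetical, and an in-memory byte array stands in for the active-master file. It shows that DataInputStream.readUTF throws EOFException when given zero bytes, which is what a reader would see if it opens the file before any data is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class EmptyMasterFileDemo {

  // Hypothetical helper: reads one UTF string, as the trace shows
  // getAddressElements doing, and returns null if the stream ends early.
  static String readAddressOrNull(byte[] fileContents) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileContents))) {
      return in.readUTF();
    } catch (EOFException e) {
      // Zero bytes available: readUTF cannot even read the 2-byte length prefix.
      return null;
    }
  }

  // Hypothetical helper: produces the bytes a fully written file would contain.
  static byte[] writeAddress(String address) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(address);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Empty "file": the read hits end-of-stream immediately.
    System.out.println(readAddressOrNull(new byte[0]));
    // Fully written "file": the address round-trips.
    System.out.println(readAddressOrNull(writeAddress("localhost:26002")));
  }
}
```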

        Activity

        githubbot ASF GitHub Bot added a comment -

        GitHub user jinossy opened a pull request:

        https://github.com/apache/tajo/pull/748

        TAJO-1830: Fix race condition in HdfsServiceTracker.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/jinossy/tajo TAJO-1830

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/748.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #748


        commit 02102c5506caddedc7b0637f2394a4585afbb546
        Author: Jinho Kim <jhkim@apache.org>
        Date: 2015-09-09T02:48:44Z

        TAJO-1830: Fix race condition in HdfsServiceTracker.


        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/748#discussion_r39004423

        --- Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java ---
        @@ -458,21 +454,37 @@ public InetSocketAddress getMasterHttpInfo() throws ServiceTrackerException
               throw new ServiceTrackerException("Active master base path must be a directory.");
             }
        -    FileStatus[] files = fs.listStatus(activeMasterBaseDir);
             /* wait for active master from HDFS */
             int pause = conf.getIntVar(ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_PAUSE_TIME);
             int maxRetry = conf.getIntVar(ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_MAX_NUM);
             int retry = 0;
        -    while (files.length < 2 && retry < maxRetry) {
        +    FileStatus[] files = fs.listStatus(activeMasterBaseDir);
        +    Path activeMasterEntry = null;
        +
        +    loop:while (retry < maxRetry) {
        +
        +      for (FileStatus eachFile : files) {
        +        //check if active file is written
        +        if (!eachFile.getPath().getName().equals(HAConstants.ACTIVE_LOCK_FILE) && eachFile.getLen() > 0) {
        +          activeMasterEntry = eachFile.getPath();
        +          break loop;
        +        }
        +      }
        +
               try {
                 this.wait(pause);
               } catch (InterruptedException e) {
                 throw new ServiceTrackerException(e);
               }
        +
               files = fs.listStatus(activeMasterBaseDir);
             }

        +    if (activeMasterEntry == null) {
        +      throw new ServiceTrackerException("No such active master in : " + activeMasterBaseDir);
        --- End diff --

        It would be better to change the exception message to 'Active master entry cannot be founded in '.

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/748#discussion_r39006028

        --- Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java ---
        @@ -458,21 +454,37 @@ public InetSocketAddress getMasterHttpInfo() throws ServiceTrackerException
               throw new ServiceTrackerException("Active master base path must be a directory.");
             }
        -    FileStatus[] files = fs.listStatus(activeMasterBaseDir);
             /* wait for active master from HDFS */
             int pause = conf.getIntVar(ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_PAUSE_TIME);
             int maxRetry = conf.getIntVar(ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_MAX_NUM);
             int retry = 0;
        -    while (files.length < 2 && retry < maxRetry) {
        +    FileStatus[] files = fs.listStatus(activeMasterBaseDir);
        --- End diff --

        It would be better to move this listStatus call into the loop. That way it also handles the case where the master entry is created while the client is retrying.
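
        The idea discussed in this review (re-list the directory on each attempt and accept only a non-empty, non-lock entry) can be sketched outside Tajo as follows. This is a rough illustration under stated assumptions, not the committed patch: it uses java.nio.file on a local directory in place of the HDFS FileSystem API, and the class ActiveMasterPoller, the method findActiveMasterEntry, and the lock file name "active.lock" are hypothetical stand-ins (the real name comes from HAConstants.ACTIVE_LOCK_FILE).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ActiveMasterPoller {

  // Placeholder lock file name, assumed for this sketch only.
  static final String LOCK_FILE_NAME = "active.lock";

  // Returns the first non-empty, non-lock entry in baseDir, or null if none
  // appears within maxRetry attempts spaced pauseMillis apart.
  static Path findActiveMasterEntry(Path baseDir, int maxRetry, long pauseMillis)
      throws IOException, InterruptedException {
    for (int retry = 0; retry < maxRetry; retry++) {
      // Re-list on every attempt so entries created while waiting are seen.
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(baseDir)) {
        for (Path entry : entries) {
          boolean isLock = entry.getFileName().toString().equals(LOCK_FILE_NAME);
          // A zero-length file may not have been written yet, so skip it.
          if (!isLock && Files.size(entry) > 0) {
            return entry;
          }
        }
      }
      Thread.sleep(pauseMillis);
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("active-master");
    Files.createFile(dir.resolve(LOCK_FILE_NAME));
    Path master = Files.createFile(dir.resolve("master-entry")); // still empty
    System.out.println(findActiveMasterEntry(dir, 2, 10L));      // no usable entry yet
    Files.write(master, "host:26001".getBytes(StandardCharsets.UTF_8));
    System.out.println(findActiveMasterEntry(dir, 2, 10L));      // entry is now returned
  }
}
```

        Returning null leaves the error handling to the caller, which corresponds to the null check on activeMasterEntry in the diff above.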

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/748#issuecomment-138789635

        Thanks for the review.
        I've updated the patch to reflect your comments.

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/748#issuecomment-138832282

        Could you trigger it?

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/748#issuecomment-138886350

        triggered it

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/748#issuecomment-139237793

        +1

        The patch looks good to me.

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/748

        jhkim Jinho Kim added a comment -

        committed it
        Thanks

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #860 (See https://builds.apache.org/job/Tajo-master-build/860/)
        TAJO-1830: Fix race condition in HdfsServiceTracker. (jhkim: rev cea832acaf36398ddf32392d7b13493911ba6014)

        • CHANGES
        • tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java
        • tajo-core-tests/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #502 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/502/)
        TAJO-1830: Fix race condition in HdfsServiceTracker. (jhkim: rev cea832acaf36398ddf32392d7b13493911ba6014)

        • CHANGES
        • tajo-core-tests/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java
        • tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-0.11.0-build #37 (See https://builds.apache.org/job/Tajo-0.11.0-build/37/)
        TAJO-1830: Fix race condition in HdfsServiceTracker. (jhkim: rev 26400bd2fb27211d25c4ec515d9a35b1a2e9deb8)

        • CHANGES
        • tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java
        • tajo-core-tests/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java

          People

          • Assignee:
            jhkim Jinho Kim
            Reporter:
            jhkim Jinho Kim