Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-12107

long running apps may have a huge number of StatisticsData instances under FileSystem

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.4, 3.0.0-alpha1
    • Component/s: fs
    • Labels:
      None

      Description

      We observed with some of our apps (non-mapreduce apps that use filesystems) that they end up accumulating a huge memory footprint coming from FileSystem$Statistics$StatisticsData (in the allData list of Statistics).

      Although the thread reference from StatisticsData is a weak reference, and thus can get cleared once a thread goes away, the actual StatisticsData instances in the list won't get cleared until any of these following methods is called on Statistics:

      • getBytesRead()
      • getBytesWritten()
      • getReadOps()
      • getLargeReadOps()
      • getWriteOps()
      • toString()

      It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the Statistics. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly.

      The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with FileSystem$Statistics itself in that the memory is controlled only as a side effect of those operations.

      1. HADOOP-12107.001.patch
        7 kB
        Sangjin Lee
      2. HADOOP-12107.002.patch
        11 kB
        Sangjin Lee
      3. HADOOP-12107.003.patch
        11 kB
        Sangjin Lee
      4. HADOOP-12107.004.patch
        11 kB
        Sangjin Lee
      5. HADOOP-12107.005.patch
        11 kB
        Sangjin Lee

        Issue Links

          Activity

          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Closing the JIRA as part of 2.7.3 release.
          Hide
          jlowe Jason Lowe added a comment -

          This appears to be triggering OOM errors in Tez during task transitions in container reuse for some scenarios, see HADOOP-12958 for details. We had at least one job that would fail with an OOM every time during container reuse, and it passes once this change was reverted.

          Show
          jlowe Jason Lowe added a comment - This appears to be triggering OOM errors in Tez during task transitions in container reuse for some scenarios, see HADOOP-12958 for details. We had at least one job that would fail with an OOM every time during container reuse, and it passes once this change was reverted.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9116 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9116/)
          Update CHANGES.txt for commit of HADOOP-12107 to branch-2.7 and (jlowe: rev 651c23e8ef8aeafd999249ce57b31e689bd2ece6)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9116 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9116/ ) Update CHANGES.txt for commit of HADOOP-12107 to branch-2.7 and (jlowe: rev 651c23e8ef8aeafd999249ce57b31e689bd2ece6) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          jlowe Jason Lowe added a comment -

          I committed this to branch-2.7 and branch-2.6.

          Show
          jlowe Jason Lowe added a comment - I committed this to branch-2.7 and branch-2.6.
          Hide
          jlowe Jason Lowe added a comment -

          I tried pulling this into branch-2.7. It comes in cleanly except for CHANGES.txt, but the unit test failed. Subsequent runs showed it was flaky, sometimes passing and sometimes not. Looking around I noticed a number of precommit builds have been complaining about this test. I can reproduce the flaky test on trunk, so I filed HADOOP-12706. I'd really like this fix backported, but I'd also rather not add another flaky test to 2.7 and 2.6.

          Sangjin Lee or Ming Ma could you take a look into the unit test?

          Show
          jlowe Jason Lowe added a comment - I tried pulling this into branch-2.7. It comes in cleanly except for CHANGES.txt, but the unit test failed. Subsequent runs showed it was flaky, sometimes passing and sometimes not. Looking around I noticed a number of precommit builds have been complaining about this test. I can reproduce the flaky test on trunk, so I filed HADOOP-12706 . I'd really like this fix backported, but I'd also rather not add another flaky test to 2.7 and 2.6. Sangjin Lee or Ming Ma could you take a look into the unit test?
          Hide
          sjlee0 Sangjin Lee added a comment -

          +1

          Show
          sjlee0 Sangjin Lee added a comment - +1
          Hide
          jlowe Jason Lowe added a comment -

          I recently ran across this on a NodeManager running 2.6 that had been up for a while. Any objections to this being picked back to 2.6 and 2.7?

          Show
          jlowe Jason Lowe added a comment - I recently ran across this on a NodeManager running 2.6 that had been up for a while. Any objections to this being picked back to 2.6 and 2.7?
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/)
          HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #233 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/233/ ) HADOOP-12107 . long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/)
          HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2172 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2172/ ) HADOOP-12107 . long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks, guys.

          Show
          cmccabe Colin P. McCabe added a comment - Thanks, guys.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2190 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2190/)
          HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2190 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2190/ ) HADOOP-12107 . long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #242 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/242/)
          HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #242 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/242/ ) HADOOP-12107 . long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java
          Hide
          sjlee0 Sangjin Lee added a comment -

          Thanks Ming Ma for the commit! Many thanks to Gera Shegalov and Colin P. McCabe for the invaluable review.

          Show
          sjlee0 Sangjin Lee added a comment - Thanks Ming Ma for the commit! Many thanks to Gera Shegalov and Colin P. McCabe for the invaluable review.
          Hide
          mingma Ming Ma added a comment -

          Also thanks for Walter Su and Sandy Ryza for the review and suggestion.

          Show
          mingma Ming Ma added a comment - Also thanks for Walter Su and Sandy Ryza for the review and suggestion.
          Hide
          mingma Ming Ma added a comment -

          I have committed this to trunk and branch-2. Thanks Sangjin Lee for the contribution and Gera Shegalov and Colin P. McCabe for the code review!

          Show
          mingma Ming Ma added a comment - I have committed this to trunk and branch-2. Thanks Sangjin Lee for the contribution and Gera Shegalov and Colin P. McCabe for the code review!
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8089 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8089/)
          HADOOP-12107. long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8089 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8089/ ) HADOOP-12107 . long running apps may have a huge number of StatisticsData instances under FileSystem (Sangjin Lee via Ming Ma) (mingma: rev 8e1bdc17d9134e01115ae7c929503d8ac0325207) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FCStatisticsBaseTest.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          jira.shegalov Gera Shegalov added a comment -

          You could make the same argument to stop development on almost any patch.

          I disagree with such a strong statement. It's not the case in my experience. Thanks for pointing out the compatibility document. It gives us a formal basis to go on, and not delay Sangjin Lee's important fix. Maybe one day we'll have a compatibility test suite based on that doc.

          It's simply unreasonable to try to support users who are putting their code inside the org.apache.hadoop.fs

          We develop new Hadoop features and often they do not make it upstream immediately. It happens that we have classes in their intended packages but we can deal with this. We are not affected by this particular change, either.

          +1 for both trunk and branch 2. Ming Ma, do you want to exercise your committer rights ?

          Show
          jira.shegalov Gera Shegalov added a comment - You could make the same argument to stop development on almost any patch. I disagree with such a strong statement. It's not the case in my experience. Thanks for pointing out the compatibility document. It gives us a formal basis to go on, and not delay Sangjin Lee 's important fix. Maybe one day we'll have a compatibility test suite based on that doc. It's simply unreasonable to try to support users who are putting their code inside the org.apache.hadoop.fs We develop new Hadoop features and often they do not make it upstream immediately. It happens that we have classes in their intended packages but we can deal with this. We are not affected by this particular change, either. +1 for both trunk and branch 2. Ming Ma , do you want to exercise your committer rights ?
          Hide
          cmccabe Colin P. McCabe added a comment -

          Guys, we clearly define the API contract for the project. See http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html

          You have to remember that:
          1. The function that you are talking about changing (the constructor) is not public from Java's point of view. It is package-private.
          2. The function that you are talking about changing is not public from Hadoop's point of view (there is no @Public or @LimitedPrivate annotation on it)

          There is simply no reason to treat this as public.

          However, at the expense of being too defensive, the only test I apply here: is there hypothetically a scenario where an API user can be broken? My answer is yes if you have some org.apache.hadoop.fs.Foo calling the constructor even though the user absolutely should not do it.

          You could make the same argument to stop development on almost any patch. Almost every patch changes things which are private or package-private inside Hadoop. It's simply unreasonable to try to support users who are putting their code inside the org.apache.hadoop.fs namespace (or any other internal project namespace)

          Show
          cmccabe Colin P. McCabe added a comment - Guys, we clearly define the API contract for the project. See http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html You have to remember that: 1. The function that you are talking about changing (the constructor) is not public from Java's point of view. It is package-private. 2. The function that you are talking about changing is not public from Hadoop's point of view (there is no @Public or @LimitedPrivate annotation on it) There is simply no reason to treat this as public. However, at the expense of being too defensive, the only test I apply here: is there hypothetically a scenario where an API user can be broken? My answer is yes if you have some org.apache.hadoop.fs.Foo calling the constructor even though the user absolutely should not do it. You could make the same argument to stop development on almost any patch. Almost every patch changes things which are private or package-private inside Hadoop. It's simply unreasonable to try to support users who are putting their code inside the org.apache.hadoop.fs namespace (or any other internal project namespace)
          Hide
          jira.shegalov Gera Shegalov added a comment -

          I agree with the sentiment what API isolation contract should be.

          However, at the expense of being too defensive, the only test I apply here: is there hypothetically a scenario where an API user can be broken? My answer is yes if you have some org.apache.hadoop.fs.Foo calling the constructor even though the user absolutely should not do it. Regarding freezing the API, given that the question was only about branch-2, it does not sound negative to me.

          Show
          jira.shegalov Gera Shegalov added a comment - I agree with the sentiment what API isolation contract should be. However, at the expense of being too defensive, the only test I apply here: is there hypothetically a scenario where an API user can be broken? My answer is yes if you have some org.apache.hadoop.fs.Foo calling the constructor even though the user absolutely should not do it. Regarding freezing the API, given that the question was only about branch-2, it does not sound negative to me.
          Hide
          cmccabe Colin P. McCabe added a comment -

          OK, at the risk of being pedantic, here is my rundown. While the StatisticsData class itself is public, the StatisticsData constructor is not. It is "package-private" (the access class which things get in Java if there is no public, private, or protected keyword on them.) This means that a StatisticsData object can only be created by code in the org.apache.hadoop.fs package. You can try this for yourself-- write a program external to hadoop that tries to create a StatisticsData object via this constructor. It will not compile. This constructor is safe to remove, so let's do that.

          Colin Patrick McCabe, good point on the one hand but on the other hand this constructor is package-scope, and technically usable if an creates a class with the same package name, regardless how unlikely or illegal (in terms of specified audience) it is. How about we defensively keep that constructor for branch-2 at least?

          No. Users simply can't add code to the org.apache.hadoop.fs package. If they do, things are not going to work-- there are going to be naming conflicts, class resolution issues, etc. etc. There is no possible way we can support users doing this and no reason to support it. If we tried, we would have to essentially freeze the API of every single class in Hadoop-- we would have to re-have this discussion each time we changed some package-private variable or function. Private and package-private stuff is private-- it's even enforced by the compiler, you can't get much more private than that.

          Show
          cmccabe Colin P. McCabe added a comment - OK, at the risk of being pedantic, here is my rundown. While the StatisticsData class itself is public, the StatisticsData constructor is not. It is "package-private" (the access class which things get in Java if there is no public, private, or protected keyword on them.) This means that a StatisticsData object can only be created by code in the org.apache.hadoop.fs package. You can try this for yourself-- write a program external to hadoop that tries to create a StatisticsData object via this constructor. It will not compile. This constructor is safe to remove, so let's do that. Colin Patrick McCabe, good point on the one hand but on the other hand this constructor is package-scope, and technically usable if an creates a class with the same package name, regardless how unlikely or illegal (in terms of specified audience) it is. How about we defensively keep that constructor for branch-2 at least? No. Users simply can't add code to the org.apache.hadoop.fs package. If they do, things are not going to work-- there are going to be naming conflicts, class resolution issues, etc. etc. There is no possible way we can support users doing this and no reason to support it. If we tried, we would have to essentially freeze the API of every single class in Hadoop-- we would have to re-have this discussion each time we changed some package-private variable or function. Private and package-private stuff is private-- it's even enforced by the compiler, you can't get much more private than that.
          Hide
          sandyr Sandy Ryza added a comment -

          I don't know of anyone using the StatisticsData constructor. I originally made the class public for Spark's needs and it shouldn't ever need that constructor.

          This is neither an endorsement nor a -1 against removing the constructor.

          Show
          sandyr Sandy Ryza added a comment - I don't know of anyone using the StatisticsData constructor. I originally made the class public for Spark's needs and it shouldn't ever need that constructor. This is neither an endorsement nor a -1 against removing the constructor.
          Hide
          sjlee0 Sangjin Lee added a comment -

          FWIW, when I look at the spark code (which may have been the main reason for changes in HADOOP-10688), it doesn't look like it's using the constructor or the owner member variable: https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala

          Show
          sjlee0 Sangjin Lee added a comment - FWIW, when I look at the spark code (which may have been the main reason for changes in HADOOP-10688 ), it doesn't look like it's using the constructor or the owner member variable: https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
          Hide
          sjlee0 Sangjin Lee added a comment -

          Colin P. McCabe, your thoughts on this?

          StatisticsData was made public in 2.5 (HADOOP-10688), and I doubt there would be an actual case of using the constructor. IMO, removing the constructor (and the member) seems pretty safe, in theory and practice. Sandy Ryza, what do you think?

          Show
          sjlee0 Sangjin Lee added a comment - Colin P. McCabe , your thoughts on this? StatisticsData was made public in 2.5 ( HADOOP-10688 ), and I doubt there would be an actual case of using the constructor. IMO, removing the constructor (and the member) seems pretty safe, in theory and practice. Sandy Ryza , what do you think?
          Hide
          jira.shegalov Gera Shegalov added a comment -

          bq StatisticsData is a public class, but its constructor is not public.

          Colin P. McCabe, good point on the one hand but on the other hand this constructor is package-scope, and technically usable if an creates a class with the same package name, regardless how unlikely or illegal (in terms of specified audience) it is. How about we defensively keep that constructor for branch-2 at least?

          Show
          jira.shegalov Gera Shegalov added a comment - bq StatisticsData is a public class, but its constructor is not public. Colin P. McCabe , good point on the one hand but on the other hand this constructor is package-scope, and technically usable if an creates a class with the same package name, regardless how unlikely or illegal (in terms of specified audience) it is. How about we defensively keep that constructor for branch-2 at least?
          Hide
          sjlee0 Sangjin Lee added a comment -

          I believe the test failure is unrelated. Also, the said test passes fine locally.

          Show
          sjlee0 Sangjin Lee added a comment - I believe the test failure is unrelated. Also, the said test passes fine locally.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 19s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 30s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 6s The applied patch generated 1 new checkstyle issues (total was 142, now 140).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 37s mvn install still works.
          +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
          +1 findbugs 1m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 common tests 21m 33s Tests failed in hadoop-common.
              60m 29s  



          Reason Tests
          Failed unit tests hadoop.security.ssl.TestReloadingX509TrustManager



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12741705/HADOOP-12107.005.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / afe9ea3
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/artifact/patchprocess/diffcheckstylehadoop-common.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/artifact/patchprocess/testrun_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 19s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 30s There were no new javac warning messages. +1 javadoc 9m 37s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 6s The applied patch generated 1 new checkstyle issues (total was 142, now 140). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 37s mvn install still works. +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse. +1 findbugs 1m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings. -1 common tests 21m 33s Tests failed in hadoop-common.     60m 29s   Reason Tests Failed unit tests hadoop.security.ssl.TestReloadingX509TrustManager Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12741705/HADOOP-12107.005.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / afe9ea3 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/artifact/patchprocess/diffcheckstylehadoop-common.txt hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/artifact/patchprocess/testrun_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7033/console This message was automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          patch v.5 posted

          • removed the weak reference
          • added @Override to StatisticsDataReferenceCleaner.run()
          • used GenericTestingUtils.waitFor() for the unit test

          Good catch about the weak reference not being public. Somehow I thought it was public.

          Actually StatisticsDataReference does not override any methods from PhantomReference (does not need to). I still marked StatisticsDataReferenceCleaner.run() with @Override.

          I also adapted the unit test to use GenericTestingUtils. Thanks!

          Show
          sjlee0 Sangjin Lee added a comment - patch v.5 posted removed the weak reference added @Override to StatisticsDataReferenceCleaner.run() used GenericTestingUtils.waitFor() for the unit test Good catch about the weak reference not being public. Somehow I thought it was public. Actually StatisticsDataReference does not override any methods from PhantomReference (does not need to). I still marked StatisticsDataReferenceCleaner.run() with @Override . I also adapted the unit test to use GenericTestingUtils . Thanks!
          Hide
          cmccabe Colin P. McCabe added a comment -

          003 patch is good. I just don't understand why 001 patch creates one additional thread per FileSystem object? I look at FileSystem#getStaticstics(..). I think it creates one Staticstics object per FileSystem class, so 001 patch creates one additional thread per FileSystem class? I'll be grateful if somebody can guide me through this.

          D'oh. You're absolutely right... there is only one thread per FileSystem class. Forget what I said earlier.

          One other point of discussion: I'm removing the StatisticsData constructor that takes a Thread as the argument (along with the owner member variable) as it no longer needs the thread inside StatisticsData. But since StatisticsData is technically a public class, it could be considered an API incompatible change. Thoughts on this? No one outside FileSystem is using the constructor or the member variable, and I cannot think of why anyone would. But if we need to absolutely maintain the API compatibility, I cannot remove them. Let me know what you think.

          StatisticsData is a public class, but its constructor is not public. So there is no possible API breakage here. (As a side note, even if StatisticsData were public, it doesn't have an interface annotation specifying that it is safe to use outside of Hadoop, so we're doubly safe here.)

          2921              /**
          2922	       * This constructor is deprecated and is no longer used. One should remove
          2923	       * any use of this constructor.
          2924	       */
          2925	      @Deprecated
          2926	      StatisticsData(WeakReference<Thread> owner) {
          

          Like I explained above, we don't need this. Let's get rid of it.

          StatisticsDataReference: let's put an @Override annotation on the functions which implement PhantomReference APIs.

          142	    final int maxSeconds = 10;
          143	    for (int i = 0; i < maxSeconds; i++) {
          144	      Thread.sleep(1000L);
          145	      allDataSize = stats.getAllThreadLocalDataSize();
          146	      if (allDataSize == 0) {
          147	        LOG.info("cleaned up after " + (i+1) + " seconds");
          148	        break;
          149	      } else {
          150	        LOG.info("not cleaned up after " + (i+1) + " seconds; waiting...");
          151	      }
          152	    }
          

          Let's use GenericTestUtils#waitFor here.

          +1 once those are addressed. Thanks, Sangjin Lee.

          Show
          cmccabe Colin P. McCabe added a comment - 003 patch is good. I just don't understand why 001 patch creates one additional thread per FileSystem object? I look at FileSystem#getStaticstics(..). I think it creates one Staticstics object per FileSystem class, so 001 patch creates one additional thread per FileSystem class? I'll be grateful if somebody can guide me through this. D'oh. You're absolutely right... there is only one thread per FileSystem class. Forget what I said earlier. One other point of discussion: I'm removing the StatisticsData constructor that takes a Thread as the argument (along with the owner member variable) as it no longer needs the thread inside StatisticsData. But since StatisticsData is technically a public class, it could be considered an API incompatible change. Thoughts on this? No one outside FileSystem is using the constructor or the member variable, and I cannot think of why anyone would. But if we need to absolutely maintain the API compatibility, I cannot remove them. Let me know what you think. StatisticsData is a public class, but its constructor is not public. So there is no possible API breakage here. (As a side note, even if StatisticsData were public, it doesn't have an interface annotation specifying that it is safe to use outside of Hadoop, so we're doubly safe here.) 2921 /** 2922 * This constructor is deprecated and is no longer used. One should remove 2923 * any use of this constructor. 2924 */ 2925 @Deprecated 2926 StatisticsData(WeakReference< Thread > owner) { Like I explained above, we don't need this. Let's get rid of it. StatisticsDataReference: let's put an @Override annotation on the functions which implement PhantomReference APIs. 142 final int maxSeconds = 10; 143 for ( int i = 0; i < maxSeconds; i++) { 144 Thread .sleep(1000L); 145 allDataSize = stats.getAllThreadLocalDataSize(); 146 if (allDataSize == 0) { 147 LOG.info( "cleaned up after " + (i+1) + " seconds" ); 148 break ; 149 } else { 150 LOG.info( "not cleaned up after " + (i+1) + " seconds; waiting..." ); 151 } 152 } Let's use GenericTestUtils#waitFor here. +1 once those are addressed. Thanks, Sangjin Lee .
          Hide
          jira.shegalov Gera Shegalov added a comment -

          +1, 004 LGTM

          Show
          jira.shegalov Gera Shegalov added a comment - +1, 004 LGTM
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 18s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 29s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 4s The applied patch generated 1 new checkstyle issues (total was 142, now 141).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
          +1 findbugs 1m 51s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 21m 41s Tests passed in hadoop-common.
              60m 38s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12741393/HADOOP-12107.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 122cad6
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/artifact/patchprocess/diffcheckstylehadoop-common.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/artifact/patchprocess/testrun_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 18s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 29s There were no new javac warning messages. +1 javadoc 9m 38s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 4s The applied patch generated 1 new checkstyle issues (total was 142, now 141). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 35s mvn install still works. +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse. +1 findbugs 1m 51s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 common tests 21m 41s Tests passed in hadoop-common.     60m 38s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12741393/HADOOP-12107.004.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 122cad6 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/artifact/patchprocess/diffcheckstylehadoop-common.txt hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/artifact/patchprocess/testrun_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7027/console This message was automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          patch v.4 posted

          Gera Shegalov, those are good suggestions.

          Changes:

          • made the cleaner thread uninterruptible
          • reduced the locking scope in getThreadStatistics()
          • renamed variables and a class to be more specific to stats data
          • added a timeout to the test method
          • made size and maxSeconds final
          Show
          sjlee0 Sangjin Lee added a comment - patch v.4 posted Gera Shegalov , those are good suggestions. Changes: made the cleaner thread uninterruptible reduced the locking scope in getThreadStatistics() renamed variables and a class to be more specific to stats data added a timeout to the test method made size and maxSeconds final
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Thanks for v3 Sangjin Lee!

          FileSystem.java:

          getThreadStatistics:
          Minimize the code executed under the monitor. Pull reference creation out of synchronized similar to what it was before. Note that currentThread is a native call.

          Cleaner#run
          Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread.

          Nits:
          can we be more specific in the naming, to the tune of: STATS_DATA_CLEANER, STATS_DATA_REFQUEUE, StatsDataCleaner.

          testStatisticsThreadLocalDataCleanUp
          Since the test uses waits, pass some reasonable timeout to @Test

          make 'int size' and 'int maxSeconds' final.

          Show
          jira.shegalov Gera Shegalov added a comment - Thanks for v3 Sangjin Lee ! FileSystem.java: getThreadStatistics : Minimize the code executed under the monitor. Pull reference creation out of synchronized similar to what it was before. Note that currentThread is a native call. Cleaner#run Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread. Nits: can we be more specific in the naming, to the tune of: STATS_DATA_CLEANER, STATS_DATA_REFQUEUE, StatsDataCleaner. testStatisticsThreadLocalDataCleanUp Since the test uses waits, pass some reasonable timeout to @Test make 'int size' and 'int maxSeconds' final.
          Hide
          mingma Ming Ma added a comment -

          Thanks Sangjin Lee. Latest patch LGTM. Colin P. McCabe, Gera Shegalov any additional comments?

          Show
          mingma Ming Ma added a comment - Thanks Sangjin Lee . Latest patch LGTM. Colin P. McCabe , Gera Shegalov any additional comments?
          Hide
          sjlee0 Sangjin Lee added a comment -

          Thanks for pointing that out Walter Su. I think you're right in that normally one Statistics instance is created per FileSystem class. That said, it is possible to create Statistics instances in other manners simply via invoking the public constructor.

          Show
          sjlee0 Sangjin Lee added a comment - Thanks for pointing that out Walter Su . I think you're right in that normally one Statistics instance is created per FileSystem class. That said, it is possible to create Statistics instances in other manners simply via invoking the public constructor.
          Hide
          walter.k.su Walter Su added a comment -

          This will create one additional thread per FileSystem object.

          003 patch is good. I just don't understand why 001 patch creates one additional thread per FileSystem object?
          I look at FileSystem#getStaticstics(..). I think it creates one Staticstics object per FileSystem class, so 001 patch creates one additional thread per FileSystem class? I'll be grateful if somebody can guide me through this.

          Show
          walter.k.su Walter Su added a comment - This will create one additional thread per FileSystem object. 003 patch is good. I just don't understand why 001 patch creates one additional thread per FileSystem object ? I look at FileSystem#getStaticstics(..) . I think it creates one Staticstics object per FileSystem class , so 001 patch creates one additional thread per FileSystem class ? I'll be grateful if somebody can guide me through this.
          Hide
          sjlee0 Sangjin Lee added a comment -

          The checkstyle warning:

          ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:1: File length is 3,415 lines (max allowed is 2,000).

          Nothing I can do about that.

          Your review is greatly appreciated. Thanks!

          Show
          sjlee0 Sangjin Lee added a comment - The checkstyle warning: ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:1: File length is 3,415 lines (max allowed is 2,000). Nothing I can do about that. Your review is greatly appreciated. Thanks!
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 19s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 28s There were no new javac warning messages.
          +1 javadoc 9m 35s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 5s The applied patch generated 1 new checkstyle issues (total was 142, now 141).
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 35s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 21m 48s Tests passed in hadoop-common.
              60m 40s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12741209/HADOOP-12107.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 99271b7
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/artifact/patchprocess/diffcheckstylehadoop-common.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/artifact/patchprocess/testrun_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 19s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 28s There were no new javac warning messages. +1 javadoc 9m 35s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 5s The applied patch generated 1 new checkstyle issues (total was 142, now 141). +1 whitespace 0m 1s The patch has no lines that end in whitespace. +1 install 1m 35s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 1m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 common tests 21m 48s Tests passed in hadoop-common.     60m 40s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12741209/HADOOP-12107.003.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 99271b7 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/artifact/patchprocess/diffcheckstylehadoop-common.txt hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/artifact/patchprocess/testrun_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7020/console This message was automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          v.3 patch posted.

          • addressed the checkstyle issue
          Show
          sjlee0 Sangjin Lee added a comment - v.3 patch posted. addressed the checkstyle issue
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 18s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 28s There were no new javac warning messages.
          +1 javadoc 9m 32s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 5s The applied patch generated 3 new checkstyle issues (total was 142, now 143).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 36s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 1m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 21m 56s Tests passed in hadoop-common.
              60m 43s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12741198/HADOOP-12107.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 99271b7
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/artifact/patchprocess/diffcheckstylehadoop-common.txt
          hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/artifact/patchprocess/testrun_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 18s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 1 new or modified test files. +1 javac 7m 28s There were no new javac warning messages. +1 javadoc 9m 32s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 5s The applied patch generated 3 new checkstyle issues (total was 142, now 143). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 36s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 1m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 common tests 21m 56s Tests passed in hadoop-common.     60m 43s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12741198/HADOOP-12107.002.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 99271b7 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/artifact/patchprocess/diffcheckstylehadoop-common.txt hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/artifact/patchprocess/testrun_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/7019/console This message was automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          v.2 patch posted.

          Changes:

          • restored the weak reference in StatisticsData and deprecated it instead
          • changed the cleaner thread (and the reference queue) to be static (global)
          • fixed a bug in cleanUp() where a wrong type was being removed
          • changed the reference list from a LinkedList to a HashSet for faster removal
          • wrote a new unit test for testing this
          • removed unnecessary null check in visitAll()

          This should address most of the review comments.

          As for the cleaner thread, I was able to convince myself that a single global cleaner thread should be adequate. I do not see thread safety issues, and it just needs to keep up with the rate of (threads being discarded) * (filesystem instances). Even if it could fall behind at times, it should be able to catch up barring the most pathological situation. Let me know if you guys are OK with that reasoning. Thanks!

          Show
          sjlee0 Sangjin Lee added a comment - v.2 patch posted. Changes: restored the weak reference in StatisticsData and deprecated it instead changed the cleaner thread (and the reference queue) to be static (global) fixed a bug in cleanUp() where a wrong type was being removed changed the reference list from a LinkedList to a HashSet for faster removal wrote a new unit test for testing this removed unnecessary null check in visitAll() This should address most of the review comments. As for the cleaner thread, I was able to convince myself that a single global cleaner thread should be adequate. I do not see thread safety issues, and it just needs to keep up with the rate of (threads being discarded) * (filesystem instances). Even if it could fall behind at times, it should be able to catch up barring the most pathological situation. Let me know if you guys are OK with that reasoning. Thanks!
          Hide
          jira.shegalov Gera Shegalov added a comment -

          Between a single thread and a thread per FileSystem object, we can compromise on having a configurable single thread pool for the class FileSystem.

          Let us maintain fool-proof compatibility to make sure that we can safely backport the patch, and just deprecate the old constructor.

          I don't see a test covering the scenario in this JIRA. It would be good to add a test to fs.FCStatisticsBaseTest

          Show
          jira.shegalov Gera Shegalov added a comment - Between a single thread and a thread per FileSystem object, we can compromise on having a configurable single thread pool for the class FileSystem. Let us maintain fool-proof compatibility to make sure that we can safely backport the patch, and just deprecate the old constructor. I don't see a test covering the scenario in this JIRA. It would be good to add a test to fs.FCStatisticsBaseTest
          Hide
          sjlee0 Sangjin Lee added a comment -

          Thanks Ming Ma and Colin P. McCabe for your comments!

          Is there a good way to write any unit test for this?

          I was relying on existing unit tests on FileSystem.Statistics (there are several). I'm still hoping that is good enough for this, but let me know if you want me to look into creating more unit tests around this.

          Regarding the number of threads for the cleaner, yes, it definitely occurred to me that this might create an issue in terms of potentially a large number of additional threads. The only reason I shied away from a global cleaner thread initially is I wasn't entirely sure whether it would be completely safe and perform well in terms of locking (note that the cleaner needs to acquire a lock for each Statistics instance). Let me explore that still.

          The null check isn't needed any more since you changed allData to be initialized in the constructor.

          Thanks for pointing that out. I remembered it at one point, but forgot to include that change in the patch.

          One other point of discussion: I'm removing the StatisticsData constructor that takes a Thread as the argument (along with the owner member variable) as it no longer needs the thread inside StatisticsData. But since StatisticsData is technically a public class, it could be considered an API incompatible change. Thoughts on this? No one outside FileSystem is using the constructor or the member variable, and I cannot think of why anyone would. But if we need to absolutely maintain the API compatibility, I cannot remove them. Let me know what you think.

          Show
          sjlee0 Sangjin Lee added a comment - Thanks Ming Ma and Colin P. McCabe for your comments! Is there a good way to write any unit test for this? I was relying on existing unit tests on FileSystem.Statistics (there are several). I'm still hoping that is good enough for this, but let me know if you want me to look into creating more unit tests around this. Regarding the number of threads for the cleaner, yes, it definitely occurred to me that this might create an issue in terms of potentially a large number of additional threads. The only reason I shied away from a global cleaner thread initially is I wasn't entirely sure whether it would be completely safe and perform well in terms of locking (note that the cleaner needs to acquire a lock for each Statistics instance). Let me explore that still. The null check isn't needed any more since you changed allData to be initialized in the constructor. Thanks for pointing that out. I remembered it at one point, but forgot to include that change in the patch. One other point of discussion: I'm removing the StatisticsData constructor that takes a Thread as the argument (along with the owner member variable) as it no longer needs the thread inside StatisticsData . But since StatisticsData is technically a public class, it could be considered an API incompatible change. Thoughts on this? No one outside FileSystem is using the constructor or the member variable, and I cannot think of why anyone would. But if we need to absolutely maintain the API compatibility, I cannot remove them. Let me know what you think.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks for looking at this, Sangjin Lee. It looks like a good idea to fix this.

          As Ming Ma commented, I think we need to have a single thread per JVM to do this, rather than one thread per FileSystem object. We have a number of Hadoop applications which create lots of FileSystem objects, and creating a thread per FS object would just be too many threads. We could end up with a thread leak which was worse than the memory leak described here. (I agree that well-written applications should strive to avoid creating too many FileSystem objects, but that's a separate issue...)

              private synchronized <T> T visitAll(StatisticsAggregator<T> visitor) {
                visitor.accept(rootData);
                if (allData != null) {
          

          The null check isn't needed any more since you changed allData to be initialized in the constructor.

          thanks

          Show
          cmccabe Colin P. McCabe added a comment - Thanks for looking at this, Sangjin Lee . It looks like a good idea to fix this. As Ming Ma commented, I think we need to have a single thread per JVM to do this, rather than one thread per FileSystem object. We have a number of Hadoop applications which create lots of FileSystem objects, and creating a thread per FS object would just be too many threads. We could end up with a thread leak which was worse than the memory leak described here. (I agree that well-written applications should strive to avoid creating too many FileSystem objects, but that's a separate issue...) private synchronized <T> T visitAll(StatisticsAggregator<T> visitor) { visitor.accept(rootData); if (allData != null ) { The null check isn't needed any more since you changed allData to be initialized in the constructor. thanks
          Hide
          mingma Ming Ma added a comment -

          Thanks, Sangjin Lee, great find. It overall looks good to me.

          • Is there a good way to write any unit test for this?
          • This will create one additional thread per FileSystem object. It is fine for most scenarios where one FileSystem object is shared among different threads in the same JVM. If the application creates many FileSystem objects in same JVM, this will create lots of additional threads. But it doesn't seem like something worthwhile optimizing for. Just bring it up in case.
          Show
          mingma Ming Ma added a comment - Thanks, Sangjin Lee , great find. It overall looks good to me. Is there a good way to write any unit test for this? This will create one additional thread per FileSystem object. It is fine for most scenarios where one FileSystem object is shared among different threads in the same JVM. If the application creates many FileSystem objects in same JVM, this will create lots of additional threads. But it doesn't seem like something worthwhile optimizing for. Just bring it up in case.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 26s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 29s There were no new javac warning messages.
          +1 javadoc 9m 39s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 1m 4s The applied patch generated 1 new checkstyle issues (total was 142, now 140).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 34s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          -1 findbugs 1m 56s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings.
          +1 common tests 21m 55s Tests passed in hadoop-common.
              61m 3s  



          Reason Tests
          FindBugs module:hadoop-common



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12740755/HADOOP-12107.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 20c03c9
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/diffcheckstylehadoop-common.txt
          Findbugs warnings https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
          hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/testrun_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 26s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac 7m 29s There were no new javac warning messages. +1 javadoc 9m 39s There were no new javadoc warning messages. +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 1m 4s The applied patch generated 1 new checkstyle issues (total was 142, now 140). +1 whitespace 0m 0s The patch has no lines that end in whitespace. +1 install 1m 34s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. -1 findbugs 1m 56s The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. +1 common tests 21m 55s Tests passed in hadoop-common.     61m 3s   Reason Tests FindBugs module:hadoop-common Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12740755/HADOOP-12107.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 20c03c9 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/diffcheckstylehadoop-common.txt Findbugs warnings https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html hadoop-common test log https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/artifact/patchprocess/testrun_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/testReport/ Java 1.7.0_55 uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/6997/console This message was automatically generated.
          Hide
          sjlee0 Sangjin Lee added a comment -

          Proposed patch to solicit feedback on the fix.

          Show
          sjlee0 Sangjin Lee added a comment - Proposed patch to solicit feedback on the fix.
          Hide
          sjlee0 Sangjin Lee added a comment -

          The key method in clearing this memory is in Statistics.visitAll():

              private synchronized <T> T visitAll(StatisticsAggregator<T> visitor) {
                visitor.accept(rootData);
                if (allData != null) {
                  for (Iterator<StatisticsData> iter = allData.iterator();
                      iter.hasNext(); ) {
                    StatisticsData data = iter.next();
                    visitor.accept(data);
                    if (data.owner.get() == null) {
                      /*
                       * If the thread that created this thread-local data no
                       * longer exists, remove the StatisticsData from our list
                       * and fold the values into rootData.
                       */
                      rootData.add(data);
                      iter.remove();
                    }
                  }
                }
                return visitor.aggregate();
              }
          

          As part of running the visitor, it checks to see if the underlying thread is gone, and if so, adds the data for that thread to rootData and removes the instance from the list.

          This pattern almost literally cries out for using a PhantomReference. That way, we can perform this operation as soon as the garbage collector clears up the threads. I'll draw up a patch based on that idea soon.

          Show
          sjlee0 Sangjin Lee added a comment - The key method in clearing this memory is in Statistics.visitAll() : private synchronized <T> T visitAll(StatisticsAggregator<T> visitor) { visitor.accept(rootData); if (allData != null ) { for (Iterator<StatisticsData> iter = allData.iterator(); iter.hasNext(); ) { StatisticsData data = iter.next(); visitor.accept(data); if (data.owner.get() == null ) { /* * If the thread that created this thread-local data no * longer exists, remove the StatisticsData from our list * and fold the values into rootData. */ rootData.add(data); iter.remove(); } } } return visitor.aggregate(); } As part of running the visitor, it checks to see if the underlying thread is gone, and if so, adds the data for that thread to rootData and removes the instance from the list. This pattern almost literally cries out for using a PhantomReference . That way, we can perform this operation as soon as the garbage collector clears up the threads. I'll draw up a patch based on that idea soon.

            People

            • Assignee:
              sjlee0 Sangjin Lee
              Reporter:
              sjlee0 Sangjin Lee
            • Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development