Details

    • Type: Test
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: backlog
    • Component/s: Tests
    • Labels:
      None

      Description

      There have been issues in the past where the JobTracker failed because of so-called "memory leaks": objects that grow without bound, tmp/ folders that jobs leave uncleaned, and so on.

      Thus, the proposal here is a lightweight job that can be run rapidly, several thousand times, to confirm that JobTracker state does not grow without bound as the number of submitted and completed tasks/jobs increases.
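      One way to make "does not grow out of bounds" testable is to sample a state metric (heap usage, retained job objects, entries under a tmp/ dir) after every iteration and fit a least-squares slope to the samples: a slope near zero suggests bounded state, while a persistently positive slope points to a leak. A minimal Python sketch, assuming hypothetical hooks `run_iteration` and `sample_metric` that a real test would wire to job submission and measurement; the `max_slope` threshold of 0.01 units per iteration is an arbitrary placeholder, not a tuned value:

```python
from typing import Callable, List


def least_squares_slope(samples: List[float]) -> float:
    """Slope of the least-squares line through the points (i, samples[i])."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2  # mean of 0, 1, ..., n-1
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def check_bounded_growth(
    run_iteration: Callable[[int], None],
    sample_metric: Callable[[], float],
    iterations: int = 1000,
    max_slope: float = 0.01,  # placeholder threshold, not a tuned value
) -> float:
    """Run the workload repeatedly and fail if the sampled metric trends upward.

    run_iteration and sample_metric are hypothetical hooks, e.g. "submit one
    sleep job" and "count entries under a tmp/ dir".
    """
    samples: List[float] = []
    for i in range(iterations):
        run_iteration(i)
        samples.append(sample_metric())
    slope = least_squares_slope(samples)
    if slope > max_slope:
        raise AssertionError(
            f"metric grew ~{slope:.4f} units per iteration over {iterations} runs"
        )
    return slope
```

      For example, a workload that leaves one stray file behind per run produces samples 1, 2, 3, ... and a slope of 1.0, so it fails; one that cleans up after itself samples a constant 0 and passes with a slope of 0.0. Fitting a slope rather than comparing the first and last readings makes the check less sensitive to a single noisy sample.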

      IMPLEMENTATION PROPOSAL:

      Some simple starting points would be to:

      • run the word count job or the sleep job 100, 1,000, or 10,000 times.
      • create and delete the same file over and over, several thousand times, to check that filesystem consistency is maintained.
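      Both ideas above reduce to "repeat an operation N times and stop at the first inconsistency". The Python sketch below uses a local directory as a stand-in for HDFS and leaves the job command as a parameter: the exact `hadoop jar ... sleep` or word count invocation (jar path, arguments) depends on the Hadoop version and is not specified here. `run_command_repeatedly` and `churn_same_file` are hypothetical helper names.

```python
import os
import subprocess
from typing import Sequence


def run_command_repeatedly(cmd: Sequence[str], times: int) -> int:
    """Run cmd `times` times, raising on the first non-zero exit code.

    cmd is an assumption supplied by the caller, e.g. a sleep or word count job.
    Returns the number of successful runs.
    """
    for i in range(times):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
        if result.returncode != 0:
            raise RuntimeError(
                f"run {i} failed with exit code {result.returncode}: "
                f"{result.stderr.decode(errors='replace')[:200]}"
            )
    return times


def churn_same_file(path: str, times: int) -> None:
    """Create and delete the same file repeatedly, checking visibility each time."""
    for i in range(times):
        with open(path, "w") as f:
            f.write(str(i))
        # The file must be readable with exactly the content just written.
        with open(path) as f:
            if f.read() != str(i):
                raise AssertionError(f"iteration {i}: stale or missing content")
        os.remove(path)
        # After the delete, the file must no longer be visible.
        if os.path.exists(path):
            raise AssertionError(f"iteration {i}: file still visible after delete")
```

      Against a real cluster, `churn_same_file` would swap the local `open`/`os.remove` calls for HDFS client operations (for instance `hdfs dfs -put`, `-cat` and `-rm`).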

      To start, I'd like to add all these tests in a single module, under test-executions/stress/. Later we could split them up differently.

      UPDATE:

      As per the comments below: although this ticket is phrased in terms of the "JobTracker", its spirit applies to both MR1 and MR2. In either case, the purpose is to test the impact that hundreds or thousands of MapReduce job runs have over time, and to confirm that tmp dirs, in-memory objects, etc. are all managed and lifecycled properly.
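      For the tmp-dir part, one simple check that applies to MR1 and MR2 alike is to snapshot the relevant directories before a batch of runs and report every path that appeared and was never removed. Which roots to watch depends on the cluster configuration (for example the directories behind `hadoop.tmp.dir`, `mapred.local.dir` in MR1, or `yarn.nodemanager.local-dirs` in MR2); the Python sketch below takes them as a parameter, and `run_job` is a hypothetical hook:

```python
import os
from typing import Callable, Iterable, Set


def snapshot(roots: Iterable[str]) -> Set[str]:
    """Every file and directory path currently under the given roots."""
    paths: Set[str] = set()
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                paths.add(os.path.join(dirpath, name))
    return paths


def leftover_paths(
    roots: Iterable[str], run_job: Callable[[int], None], runs: int
) -> Set[str]:
    """Paths that exist under roots after `runs` jobs but did not exist before."""
    root_list = list(roots)  # iterated twice, so materialize any generator
    before = snapshot(root_list)
    for i in range(runs):
        run_job(i)  # hypothetical hook: run one MapReduce job
    return snapshot(root_list) - before
```

      Real clusters legitimately keep some files around (logs, caches), so a practical version would probably filter the result through an allow-list before asserting that it is empty.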


          Activity

          jay vyas added a comment -

          Okay, we will press forward with this "MapReduce-centric" stress test.

          Konstantin Boudnik added a comment -

          Yes, I agree. Perhaps this JIRA can stay open and add a different angle to the cluster load. BIGTOP-1198 actually stresses the underlying hardware, essentially limiting the amount of RAM, CPU, and disk bandwidth available to the Hadoop software. Hence, it isn't exactly a Hadoop stress test.

          Mikhail Antonov added a comment -

          Technically, stress tests may use different approaches: they can impose a purely OS-level load, as I'm doing now, or they can run tons of MR jobs. This may be a supplemental approach.

          jay vyas added a comment -

          The activity on BIGTOP-1198 seems to have supplanted this JIRA, which is now redundant. Let's close this one. Yay for stress tests!

          jay vyas added a comment -

          Here is another example of how these tests would be beneficial:

          See HDFS-5546: race condition crashes "hadoop ls -R" when directories are moved/removed.
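          A local analogue of that HDFS-5546 scenario can be sketched in Python: one thread repeatedly creates, renames and deletes directories while the main thread recursively lists the same tree. This does not exercise HDFS or "hadoop ls -R" itself; it only illustrates the shape of such a test, and the helper names are hypothetical. The lister skips entries that vanish mid-walk, which is the case a naive recursive listing crashes on:

```python
import os
import shutil
import threading
from typing import List


def tolerant_listing(root: str) -> List[str]:
    """Recursive listing that skips entries removed mid-walk instead of crashing."""
    results: List[str] = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except (FileNotFoundError, NotADirectoryError):
            continue  # moved or removed after it was queued: skip, don't crash
        for entry in entries:
            results.append(entry.path)
            try:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
            except FileNotFoundError:
                continue
    return results


def race_listing_against_churn(root: str, churn_rounds: int = 200) -> int:
    """List `root` repeatedly while another thread creates, moves and removes dirs.

    Returns the number of complete listings; a crash in the lister propagates.
    """
    stop = threading.Event()
    errors: List[BaseException] = []

    def churn() -> None:
        try:
            for i in range(churn_rounds):
                d = os.path.join(root, f"d{i % 5}")
                os.makedirs(os.path.join(d, "sub"), exist_ok=True)
                os.rename(d, d + "_moved")  # "move" ...
                shutil.rmtree(d + "_moved")  # ... then "remove"
        except OSError as e:
            errors.append(e)
        finally:
            stop.set()  # always release the lister loop

    worker = threading.Thread(target=churn)
    worker.start()
    listings = 0
    try:
        while not stop.is_set():
            tolerant_listing(root)
            listings += 1
    finally:
        worker.join()
    if errors:
        raise errors[0]
    return listings
```

          Swapping `tolerant_listing` for a naive recursion over `os.listdir` would typically surface the race as an uncaught `FileNotFoundError`, which is the kind of failure such a stress test is meant to catch.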

          jay vyas added a comment -

          Actually, looking deeper, I see the purpose of the MR2 comment: the JobTracker is actually ephemeral, so any state issues that might arise should go away with it. Still, I guess the JobHistoryServer may have such state issues. But it's an important thing to note: stress tests in MR2 will have to test different components, probably the MR2 components that are more likely to persist for a long time.
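          For long-lived MR2 daemons such as the JobHistoryServer, one possible signal is heap usage sampled between job runs. Hadoop daemons generally expose JVM metrics as JSON through a `/jmx` servlet; the Python sketch below assumes the JobHistoryServer serves it on its default web port 19888 at localhost, which may differ on a given cluster. Raw heap usage moves with garbage collection, so individual readings are noisy and the trend over many runs matters more than any single value.

```python
import json
import urllib.request

# Assumption: JobHistoryServer web UI on its default port, exposing /jmx.
DEFAULT_JMX_URL = "http://localhost:19888/jmx?qry=java.lang:type=Memory"


def heap_used_bytes(jmx_json: str) -> int:
    """Extract HeapMemoryUsage.used from a /jmx JSON response."""
    data = json.loads(jmx_json)
    for bean in data.get("beans", []):
        if bean.get("name") == "java.lang:type=Memory":
            return int(bean["HeapMemoryUsage"]["used"])
    raise ValueError("java.lang:type=Memory bean not found")


def fetch_heap_used(url: str = DEFAULT_JMX_URL, timeout: float = 10.0) -> int:
    """Fetch one heap-usage sample from a running daemon."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return heap_used_bytes(resp.read().decode("utf-8"))
```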

          jay vyas added a comment -

          Sure, but the motivation and logic behind stress testing remain the same.

          Konstantin Boudnik added a comment -

          As you know, the JT doesn't exist in Hadoop 2.


            People

            • Assignee:
              Unassigned
              Reporter:
              jay vyas
            • Votes:
              0
              Watchers:
              5

              Dates

              • Created:
                Updated:
