BIGTOP-1066

multifilewc and possibly other mapreduce smoke tests rely on mapred: should they be separated or replaced?

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: backlog
    • Component/s: tests
    • Labels:
      None

      Description

      I've updated this ticket to be more focused on a solution rather than brainstorming the problem.

      Problem: Old tests (i.e. those that use mapred) can cause issues. Bigtop gets its Hadoop MapReduce tests from the underlying MapReduce library. In some ways that is good, since the tests are expected to match the distro, but in other ways it is bad, because it takes control away from the person running the Bigtop smokes.

      Solution: Should Bigtop have a specific version of the MapReduce smokes, which it runs by linking to hadoop-examples as a Maven dependency instead of referencing the hadoop-examples jars in the distribution?

      Any thoughts? A patch that implemented this would simply add a pom file dependency snippet referencing hadoop-examples, or else pull the jars down some other way, and run them directly rather than referencing HADOOP_HOME/examples*jar.
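      The "pull the jars down some other way" option could be pictured as a small locator that prefers an explicitly pinned examples jar and only falls back to whatever the distribution ships. This is a minimal sketch under stated assumptions, not Bigtop code: the class name `ExamplesJarLocator`, the system property `bigtop.smoke.examples.jar`, and the glob `*examples*.jar` are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class ExamplesJarLocator {
    // Hypothetical property name; not an existing Bigtop setting.
    static final String PINNED_JAR_PROPERTY = "bigtop.smoke.examples.jar";

    /**
     * Returns the pinned jar if one is configured, otherwise the first
     * (alphabetically) jar in hadoopHome whose name matches *examples*.jar.
     */
    static Optional<Path> locate(String pinnedJar, String hadoopHome) throws IOException {
        if (pinnedJar != null && !pinnedJar.isEmpty()) {
            Path pinned = Paths.get(pinnedJar);
            if (!Files.isRegularFile(pinned)) {
                // A pinned jar that is missing should fail loudly rather than
                // silently fall back to the distribution's copy.
                throw new NoSuchFileException(pinnedJar);
            }
            return Optional.of(pinned);
        }
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return Optional.empty();
        }
        Path home = Paths.get(hadoopHome);
        if (!Files.isDirectory(home)) {
            return Optional.empty();
        }
        List<Path> matches = new ArrayList<>();
        // The glob is an assumption about how the distribution names its jar.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(home, "*examples*.jar")) {
            for (Path jar : jars) {
                matches.add(jar);
            }
        }
        Collections.sort(matches); // directory order is unspecified, so sort for determinism
        return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
    }

    /** Convenience wrapper reading the hypothetical property and HADOOP_HOME. */
    static Optional<Path> locateFromEnvironment() throws IOException {
        return locate(System.getProperty(PINNED_JAR_PROPERTY), System.getenv("HADOOP_HOME"));
    }
}
```

      With something like this, the person running the smokes would decide which examples jar is exercised by setting one property, and the distribution's jar would be used only when nothing is pinned.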

        Activity

        Konstantin Boudnik made changes -
        Fix Version/s backlog [ 12324373 ]
        Konstantin Boudnik made changes -
        Affects Version/s 0.7.0 [ 12324362 ]
        Konstantin Boudnik made changes -
        Component/s Tests [ 12315617 ]
        jay vyas made changes -
        Summary
          Original Value: multifilewc and possibly other tests rely on mapred: should they be separated or replaced?
          New Value: multifilewc and possibly other mapreduce smoke tests rely on mapred: should they be separated or replaced?
        jay vyas made changes -
        Description (Original Value): The use of old tests (i.e. those that use mapred) can cause issues. Is it considered standard practice to still support the mapred.* classes?

        I ask because in my Bigtop smokes we fail the multifilewc test: its RecordReader seems to call getPos() after the underlying stream has been closed in the FileSystem implementation. If we update to the new mapreduce.* implementation of multifilewc, which uses CombineFileSplit instead of the older MultiFileInputFormat, the job works fine.

        So this is partly a question: should Bigtop tests which rely on mapred.* classes be

        1) separated out into a different test suite, or

        2) deleted entirely for 2.0 (forward), or

        3) simply ignored on a case-by-case/cluster-by-cluster basis?

        I favor (1): it would be nice if we could decide declaratively, at runtime, whether or not we want to run tests that exercise the old (2009) mapred classes.
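        Option (1) could be sketched as a selector that tags each smoke test with the API it exercises and consults a flag at runtime. This is only an illustration under assumptions: the class `SmokeTestSelector`, the property `bigtop.smoke.include.mapred`, and the `wordcount` entry with its API tag are hypothetical; only multifilewc and its reliance on mapred come from this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SmokeTestSelector {
    // Hypothetical property name; not an existing Bigtop setting.
    static final String INCLUDE_LEGACY_PROPERTY = "bigtop.smoke.include.mapred";

    /** Which MapReduce API a smoke test exercises. */
    enum Api { MAPRED, MAPREDUCE }

    static final class SmokeTest {
        final String name;
        final Api api;

        SmokeTest(String name, Api api) {
            this.name = name;
            this.api = api;
        }
    }

    /** Keeps every mapreduce.* test, and mapred.* tests only when includeLegacy is true. */
    static List<String> select(List<SmokeTest> tests, boolean includeLegacy) {
        List<String> selected = new ArrayList<>();
        for (SmokeTest test : tests) {
            if (includeLegacy || test.api == Api.MAPREDUCE) {
                selected.add(test.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<SmokeTest> tests = Arrays.asList(
                new SmokeTest("multifilewc", Api.MAPRED),    // from this ticket
                new SmokeTest("wordcount", Api.MAPREDUCE));  // illustrative entry only
        // Boolean.getBoolean reads the system property; it is false when unset.
        boolean includeLegacy = Boolean.getBoolean(INCLUDE_LEGACY_PROPERTY);
        System.out.println(select(tests, includeLegacy));
    }
}
```

        Running with `-Dbigtop.smoke.include.mapred=true` would include the legacy tests; leaving the property unset would skip them, so the choice stays with whoever launches the smokes.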





        jay vyas created issue -

          People

          • Assignee: Unassigned
          • Reporter: jay vyas
          • Votes: 0
          • Watchers: 3
