Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Labels:
      None

      Description

      Created as MRUNIT-97 became a patch for InputSplits.

      1. testfile
        0.0 kB
        Dave Beech
      2. testarchive.tar.abc
        7 kB
        Dave Beech
      3. testarchive.tar
        7 kB
        Dave Beech
      4. MRUNIT-98.patch
        29 kB
        Dave Beech

        Issue Links

          Activity

          Hide
          Jim Donofrio added a comment -

          Dave,

          I saw you commented on this MRUNIT-97, any thoughts on this?

          I was thinking all new really need to do is copy some jobconf parameters and files around. We could just generate random dir's in the temp dir such as:

          File outputPath = new File(System.getProperty("java.io.tmpdir"), "mrunit-" + Math.random());

          and then copy the files there and unzip them if necessary and then set the destination conf parameter

          Thoughts?

          Show
          Jim Donofrio added a comment - Dave, I saw you commented on this MRUNIT-97 , any thoughts on this? I was thinking all new really need to do is copy some jobconf parameters and files around. We could just generate random dir's in the temp dir such as: File outputPath = new File(System.getProperty("java.io.tmpdir"), "mrunit-" + Math.random()); and then copy the files there and unzip them if necessary and then set the destination conf parameter Thoughts?
          Hide
          Brock Noland added a comment -

          Some people use fact that in hadoop you have symlink to the file in the tasks CWD (I think Tom White even says this one edition of his book) and do the following in the configure method:

          File file = new File("relative-path-to-file");

          I am not sure how we would support that without creating symlinks in the jvms CWD which could be ugly if not cleaned up. Maybe we don't need to support that use.

          Show
          Brock Noland added a comment - Some people use fact that in hadoop you have symlink to the file in the tasks CWD (I think Tom White even says this one edition of his book) and do the following in the configure method: File file = new File("relative-path-to-file"); I am not sure how we would support that without creating symlinks in the jvms CWD which could be ugly if not cleaned up. Maybe we don't need to support that use.
          Hide
          Jim Donofrio added a comment -

          Yeah symlinks would be a mess because we would create a bunch of shortcuts in the current directory which in eclipse would be the root of their project

          Show
          Jim Donofrio added a comment - Yeah symlinks would be a mess because we would create a bunch of shortcuts in the current directory which in eclipse would be the root of their project
          Hide
          Dave Beech added a comment -

          My first plan was to try and simulate the cache without reading/writing any actual files, by mocking the calls to DistributedCache and backing it with data preloaded into memory by the test. Disk I/O in unit tests seems wrong, I always try and avoid it wherever possible! But, I'm probably just over-complicating things - Jim, your suggestion might be the way forward for a first stab at this. I'll try and put something together later and get it over to you guys to check out.

          Agree the symlinks thing would be a mess, and would also cause a problem for Windows users. It's annoying as that's the method I prefer using to get files from the cache myself.

          Show
          Dave Beech added a comment - My first plan was to try and simulate the cache without reading/writing any actual files, by mocking the calls to DistributedCache and backing it with data preloaded into memory by the test. Disk I/O in unit tests seems wrong, I always try and avoid it wherever possible! But, I'm probably just over-complicating things - Jim, your suggestion might be the way forward for a first stab at this. I'll try and put something together later and get it over to you guys to check out. Agree the symlinks thing would be a mess, and would also cause a problem for Windows users. It's annoying as that's the method I prefer using to get files from the cache myself.
          Hide
          Jim Donofrio added a comment -

          Yes I am finishing up MRUNIT-101 and originally I had planned to extend FileSystem with a MemoryFileSystem class. Brock and Joey Echeverria convinced me otherwise in MRUNIT-102. It was much easier to just create random directories in temp. I am not too familar with these mock frameworks but would there be a way to make File even work with any filesystem besides the local one?

          Show
          Jim Donofrio added a comment - Yes I am finishing up MRUNIT-101 and originally I had planned to extend FileSystem with a MemoryFileSystem class. Brock and Joey Echeverria convinced me otherwise in MRUNIT-102 . It was much easier to just create random directories in temp. I am not too familar with these mock frameworks but would there be a way to make File even work with any filesystem besides the local one?
          Hide
          Dave Beech added a comment -

          Well, it wouldn't be the File object you'd need to mock as such, it'd be whatever reader or stream you open to consume that file.

          Having thought about it some more, it's probably not worth the effort. I'm sure it's possible, but the user's mapper or configure method code would need to be written in a way which allows the mock objects to be injected in the right places, and would require a lot of non-mrunit test fixture setup code that I doubt anyone would want to write.

          Using temp files would definitely be a lot easier. Let's go with that.

          Show
          Dave Beech added a comment - Well, it wouldn't be the File object you'd need to mock as such, it'd be whatever reader or stream you open to consume that file. Having thought about it some more, it's probably not worth the effort. I'm sure it's possible, but the user's mapper or configure method code would need to be written in a way which allows the mock objects to be injected in the right places, and would require a lot of non-mrunit test fixture setup code that I doubt anyone would want to write. Using temp files would definitely be a lot easier. Let's go with that.
          Hide
          Jim Donofrio added a comment -

          Yes I agree that our goal should always be to run the same code that will run on the cluster, otherwise we are not really testing anything, we are testing test code.

          Show
          Jim Donofrio added a comment - Yes I agree that our goal should always be to run the same code that will run on the cluster, otherwise we are not really testing anything, we are testing test code.
          Hide
          Jim Donofrio added a comment -

          Dave,

          Have you had time to work on this at all?

          Show
          Jim Donofrio added a comment - Dave, Have you had time to work on this at all?
          Hide
          Dave Beech added a comment -

          I have, I think I'm almost done actually. Sorry for the slow progress, I haven't had much free time of late. I'll update this jira either this evening or tomorrow and get a review from you guys.

          Show
          Dave Beech added a comment - I have, I think I'm almost done actually. Sorry for the slow progress, I haven't had much free time of late. I'll update this jira either this evening or tomorrow and get a review from you guys.
          Hide
          Dave Beech added a comment -

          This will probably need its own JIRA but I'll post here as I was working on this when I noticed it...

          I'm a little concerned about the amount of duplicated code between the various driver classes, specifically the run methods in MapDriver and ReduceDriver, the runTest methods in MapDriverBase and ReduceDriverBase and the various set/add/with methods.

          It feels like a lot more stuff should be pulled up into a superclass. Any thoughts?

          Show
          Dave Beech added a comment - This will probably need its own JIRA but I'll post here as I was working on this when I noticed it... I'm a little concerned about the amount of duplicated code between the various driver classes, specifically the run methods in MapDriver and ReduceDriver, the runTest methods in MapDriverBase and ReduceDriverBase and the various set/add/with methods. It feels like a lot more stuff should be pulled up into a superclass. Any thoughts?
          Hide
          Jim Donofrio added a comment -

          I agree there is too much duplicated code, some of it is difficult to abstract due to similar map driver code but different api's, mapred vs mapreduce. Other parts of the code are difficult to abstract because of the fluent with methods that have to have a return type of the actual class, there is probably some good way to do that with generics

          Show
          Jim Donofrio added a comment - I agree there is too much duplicated code, some of it is difficult to abstract due to similar map driver code but different api's, mapred vs mapreduce. Other parts of the code are difficult to abstract because of the fluent with methods that have to have a return type of the actual class, there is probably some good way to do that with generics
          Hide
          Dave Beech added a comment -

          Here's an almost complete patch for this feature in case anybody wants to take a look. Javadoc is patchy and I haven't done unit tests for the new mapreduce api yet, but it's nearly there.

          Show
          Dave Beech added a comment - Here's an almost complete patch for this feature in case anybody wants to take a look. Javadoc is patchy and I haven't done unit tests for the new mapreduce api yet, but it's nearly there.
          Hide
          Dave Beech added a comment -

          This work will also resolve MRUNIT-124

          Show
          Dave Beech added a comment - This work will also resolve MRUNIT-124
          Hide
          Brock Noland added a comment -

          Nice!

          -DistCacheUtils.findResource I'd like to wrap the URISyntaxException in an exception with a good error message.
          -TestDriver.cleanupDistributedCache does not delete the directory recursively. Note we cannot use guava for that because it is removed in future versions. There is a class FileUtils in common which does this.

          Show
          Brock Noland added a comment - Nice! -DistCacheUtils.findResource I'd like to wrap the URISyntaxException in an exception with a good error message. -TestDriver.cleanupDistributedCache does not delete the directory recursively. Note we cannot use guava for that because it is removed in future versions. There is a class FileUtils in common which does this.
          Hide
          Dave Beech added a comment -

          Thanks Brock - I will sort those things out. I'll implement the recursive delete once we've settled on a method to do it on the dev list!

          Show
          Dave Beech added a comment - Thanks Brock - I will sort those things out. I'll implement the recursive delete once we've settled on a method to do it on the dev list!
          Hide
          Dave Beech added a comment -

          I've committed in 1371380. Now resolving this and MRUNIT-124

          Show
          Dave Beech added a comment - I've committed in 1371380. Now resolving this and MRUNIT-124
          Hide
          Brock Noland added a comment -

          I am getting test failures. testAddCacheArchiveToMapperUsingDriverMethod:
          12/08/09 14:18:25 ERROR mrunit.TestDriver: Matched expected output (testarchive.tar/a, file) but at incorrect position 4 (expected position 1)
          12/08/09 14:18:25 ERROR mrunit.TestDriver: Matched expected output (testarchive.tar/d, file) but at incorrect position 1 (expected position 4)

          Show
          Brock Noland added a comment - I am getting test failures. testAddCacheArchiveToMapperUsingDriverMethod: 12/08/09 14:18:25 ERROR mrunit.TestDriver: Matched expected output (testarchive.tar/a, file) but at incorrect position 4 (expected position 1) 12/08/09 14:18:25 ERROR mrunit.TestDriver: Matched expected output (testarchive.tar/d, file) but at incorrect position 1 (expected position 4)
          Hide
          Dave Beech added a comment -

          I know, I know! Sorry All green for me but failed on Jenkins too. I've committed a fix which turns off order matching, since it's not important anyway

          Show
          Dave Beech added a comment - I know, I know! Sorry All green for me but failed on Jenkins too. I've committed a fix which turns off order matching, since it's not important anyway

            People

            • Assignee:
              Dave Beech
              Reporter:
              Ajay Srivastava
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development