Hadoop Map/Reduce
MAPREDUCE-2095

Gridmix unable to run with compressed traces (.gz format).

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.22.0
    • Component/s: contrib/gridmix
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I was trying to run Gridmix with a compressed trace file. However, it throws a JsonParseException and exits.

      exception details:
      ==================
      org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n,
      \t) is allowed between tokens
      at [Source: org.apache.hadoop.fs.FSDataInputStream@17ba38f; line: 1, column: 2]
      at org.codehaus.jackson.impl.JsonParserBase._constructError(JsonParserBase.java:651)
      at org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.java:635)
      at org.codehaus.jackson.impl.JsonParserBase._throwInvalidSpace(JsonParserBase.java:596)
      at org.codehaus.jackson.impl.Utf8StreamParser._skipWSOrEnd(Utf8StreamParser.java:981)
      at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:77)
      at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:688)
      at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:624)
      at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:275)
      at org.apache.hadoop.tools.rumen.JsonObjectMapperParser.getNext(JsonObjectMapperParser.java:84)
      at org.apache.hadoop.tools.rumen.ZombieJobProducer.getNextJob(ZombieJobProducer.java:117)
      at org.apache.hadoop.tools.rumen.ZombieJobProducer.getNextJob(ZombieJobProducer.java:29)
      at org.apache.hadoop.mapred.gridmix.JobFactory.getNextJobFiltered(JobFactory.java:174)
      at org.apache.hadoop.mapred.gridmix.StressJobFactory$StressReaderThread.run(StressJobFactory.java:166)
      10/09/23 09:43:17 ERROR gridmix.Gridmix: Error in trace
      org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n,
      \t) is allowed between tokens
      at [Source: org.apache.hadoop.fs.FSDataInputStream@17ba38f; line: 1, column: 2]
      at org.codehaus.jackson.impl.JsonParserBase._constructError(JsonParserBase.java:651)
      at org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.java:635)
      at org.codehaus.jackson.impl.JsonParserBase._throwInvalidSpace(JsonParserBase.java:596)
      at org.codehaus.jackson.impl.Utf8StreamParser._skipWSOrEnd(Utf8StreamParser.java:981)
      at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:77)
      at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:688)
      at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:624)
      at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:275)
      at org.apache.hadoop.tools.rumen.JsonObjectMapperParser.getNext(JsonObjectMapperParser.java:84)
      at org.apache.hadoop.tools.rumen.ZombieJobProducer.getNextJob(ZombieJobProducer.java:117)
      at org.apache.hadoop.tools.rumen.ZombieJobProducer.getNextJob(ZombieJobProducer.java:29)
      at org.apache.hadoop.mapred.gridmix.JobFactory.getNextJobFiltered(JobFactory.java:174)
      at org.apache.hadoop.mapred.gridmix.StressJobFactory$StressReaderThread.run(StressJobFactory.java:166)
      10/09/23 09:43:17 INFO gridmix.Gridmix: Exiting...
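
      The control character reported in the exception (code 31, i.e. 0x1f) matches the first byte of the gzip magic number (0x1f 0x8b), which suggests the JSON parser was handed the raw compressed bytes. The following is a minimal sketch of detecting that signature and decompressing before parsing. It is not the patch attached to this ticket and does not use the Rumen API; the class and method names (TraceStreams, maybeGunzip, readAll, gzip) are hypothetical, and only java.util.zip from the JDK is used.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only (not part of Gridmix or Rumen).
class TraceStreams {

    /** Wraps the stream in a GZIPInputStream if it starts with the gzip magic bytes. */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);                 // remember the start so the peek can be undone
        int b0 = in.read();
        int b1 = in.read();
        in.reset();                 // rewind to the first byte
        boolean gzipped = (b0 == 0x1f && b1 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    /** Reads the whole stream as UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Compresses a string with gzip; used here only to build sample input. */
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bytes);
        gz.write(s.getBytes(StandardCharsets.UTF_8));
        gz.close();                 // finishes the gzip trailer
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"jobID\":\"job_1\"}";   // made-up sample record
        byte[] compressed = gzip(json);

        // The first byte of any gzip stream is 31 (0x1f), the character in the exception.
        System.out.println(compressed[0] & 0xff);

        // Both the compressed and the plain form yield the same text.
        System.out.println(readAll(maybeGunzip(new ByteArrayInputStream(compressed))));
        System.out.println(readAll(maybeGunzip(
            new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8)))));
    }
}
```

      Sniffing the magic bytes is one option; choosing the decompressor from the file extension (.gz) is another, and this sketch makes no claim about which approach the fix uses.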

      1. MAPREDUCE-2095_v3.1.patch
        9 kB
        Ravi Gummadi
      2. MAPREDUCE-2095_v3.patch
        7 kB
        Ranjit Mathew
      3. MAPREDUCE-2095_v2.patch
        6 kB
        Ranjit Mathew
      4. wordcount.json.gz
        1 kB
        Ranjit Mathew
      5. MAPREDUCE-2095.patch
        2 kB
        Ranjit Mathew

          Activity

          Ranjit Mathew added a comment -

          Here's a patch that uses the existing capability of Rumen to read compressed traces.

          Ranjit Mathew added a comment -

          Compressed trace for a WordCount Job needed by the unit-test introduced by the patch for this ticket. This
          needs to go into src/contrib/gridmix/src/test/data in the source-tree.

          Ranjit Mathew added a comment -

          An updated version of the patch that now includes a unit test. This depends on the compressed WordCount
          trace uploaded earlier (both need to be committed together).

          Ranjit Mathew added a comment -

          The updated patch includes a unit-test that produces the following (snipped) results with "ant test":

          test:
          [echo] contrib: gridmix
          [delete] Deleting directory /home/ranjit/src/Hadoop/Apache/mapred_trunk/build/contrib/gridmix/test/logs
          [mkdir] Created dir: /home/ranjit/src/Hadoop/Apache/mapred_trunk/build/contrib/gridmix/test/logs
          [junit] WARNING: multiple versions of ant detected in path for junit
          [junit] jar:file:/home/ranjit/apps/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
          [junit] and jar:file:/home/ranjit/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
          [junit] Running org.apache.hadoop.mapred.gridmix.TestFilePool
          [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.294 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestFileQueue
          [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.449 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestGridmixRecord
          [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.262 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestGridmixSubmission
          [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 414.348 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestRandomAlgorithm
          [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.243 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestRecordFactory
          [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.242 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestSleepJob
          [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 570.155 sec
          [junit] Running org.apache.hadoop.mapred.gridmix.TestUserResolve
          [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.457 sec

          The relevant extract from TEST-org.apache.hadoop.mapred.gridmix.TestGridmixSubmission.txt:

          Verifying JobStory from compressed trace...
          Verifying JobStory from uncompressed trace...
          Verifying JobStory from trace in standard input...

          Vinay Kumar Thota added a comment -

          Reviewed the patch and overall it looks good. However, please make sure to add Javadoc for the test method before commit.
          +1

          Ravi Gummadi added a comment -

          Code changes look fine to me.

          Some minor comments on the testcase:
          (1) io.close() can be moved to the finally block.
          (2) Also, please delete the contents of rootTempDir in the finally block.
          (3) Please add some Javadoc to the testcase.
          (4) Please include the data file wordcount.json.gz in the patch so that the committer need not remember the path where it is to be committed.
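
          Points (1) and (2) above can be sketched as follows. This is only an illustration of the try/finally pattern using java.nio.file from the JDK as a stand-in; the actual test uses Hadoop's own file APIs, and the class and method names here (TempDirCleanupSketch, readFirstByteThenCleanUp, deleteRecursively) are hypothetical. The sketch also removes rootTempDir itself rather than only its contents, which is an assumption made for brevity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of closing a stream and cleaning a temp dir in finally.
class TempDirCleanupSketch {

    /** Deletes a file or a directory tree. */
    static void deleteRecursively(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(p)) {
                for (Path child : children) {
                    deleteRecursively(child);
                }
            }
        }
        Files.deleteIfExists(p);
    }

    /** Reads one byte from the trace, then always closes the stream and removes rootTempDir. */
    static int readFirstByteThenCleanUp(Path rootTempDir, Path trace) throws IOException {
        InputStream in = null;
        try {
            in = Files.newInputStream(trace);
            return in.read();
        } finally {
            try {
                if (in != null) {
                    in.close();                     // point (1): close in finally
                }
            } finally {
                deleteRecursively(rootTempDir);     // point (2): clean up even if close() fails
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("gridmix-sketch");
        Path trace = root.resolve("trace.json");
        Files.write(trace, new byte[] {'{', '}'});

        int first = readFirstByteThenCleanUp(root, trace);
        System.out.println((char) first);
        System.out.println(Files.exists(root));     // the temp dir has been removed
    }
}
```

          Nesting the cleanup in an inner finally keeps the directory removal from being skipped if close() throws.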

          Ranjit Mathew added a comment -

          Updated version of the patch incorporating comments from Ravi and Vinay.

          Even with diff --text, the patch does not contain the binary data file. I will have to coordinate with
          the committer to ensure that both pieces are checked in.

          Ravi Gummadi added a comment -

          Changes look good to me.

          Attaching a patch merged with the gzipped data file. The patch was generated using "git diff --text --no-prefix HEAD".

          Amareshwari Sriramadasu added a comment -

          I just committed this. Thanks, Ranjit!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)


            People

            • Assignee:
              Ranjit Mathew
              Reporter:
              Vinay Kumar Thota
            • Votes:
              0
              Watchers:
              1
