Hadoop Map/Reduce
MAPREDUCE-2081

[GridMix3] Implement functionality to get the list of job traces that have different intervals.

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: contrib/gridmix
    • Labels:
      None
    • Tags:
      gridmix system-tests

      Description

      Gridmix system tests require job traces with different time intervals to generate and submit Gridmix jobs. Implement functionality that collects the job traces into a map keyed by time interval, and that also lists the traces in a resource location irrespective of time interval. The following methods need to be implemented.

      Method signatures:
      public static Map<String, String> getMRTraces(Configuration conf) throws IOException; - gets the traces, keyed by time interval, from the default resource location.

      public static Map<String, String> getMRTraces(Configuration conf, Path path) throws IOException; - gets the traces, keyed by time interval, from a user-specified resource location.

      public static List<String> listMRTraces(Configuration conf) throws IOException; - lists all the traces in the default resource location, irrespective of time interval.

      public static List<String> listMRTraces(Configuration conf, Path tracesPath) throws IOException; - lists all the traces in a user-specified resource location, irrespective of time interval.

      public static List<String> listMRTracesByTime(Configuration conf, String timeInterval) throws IOException; - lists all traces of a given time interval in the default resource location.

      public static List<String> listMRTracesByTime(Configuration conf, String timeInterval, Path path) throws IOException; - lists all traces of a given time interval in a given resource location.
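The three kinds of lookup described above can be illustrated with a small, self-contained sketch. It is not the code from the attached patches: the class name, the method names, and the sample file names are hypothetical, a plain list of file names stands in for the Hadoop directory listing, and the file naming pattern (an underscore, an interval such as "10m", then ".json.gz") is an assumption taken from the review discussion on this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not the utility class from the patch. */
final class TraceLookupSketch {
    // Assumed trace-file extension.
    private static final String SUFFIX = ".json.gz";

    /** Returns the interval part of a name such as "sort_1h.json.gz" ("1h"), or null. */
    static String intervalOf(String fileName) {
        if (!fileName.endsWith(SUFFIX)) {
            return null;
        }
        String stem = fileName.substring(0, fileName.length() - SUFFIX.length());
        int underscore = stem.lastIndexOf('_');
        if (underscore < 0 || underscore == stem.length() - 1) {
            return null;
        }
        return stem.substring(underscore + 1);
    }

    /** Maps interval to trace file; a later file with the same interval replaces an earlier one. */
    static Map<String, String> getTraces(List<String> fileNames) {
        Map<String, String> byInterval = new LinkedHashMap<>();
        for (String name : fileNames) {
            String interval = intervalOf(name);
            if (interval != null) {
                byInterval.put(interval, name);
            }
        }
        return byInterval;
    }

    /** Lists every trace file, irrespective of interval. */
    static List<String> listTraces(List<String> fileNames) {
        List<String> traces = new ArrayList<>();
        for (String name : fileNames) {
            if (name.endsWith(SUFFIX)) {
                traces.add(name);
            }
        }
        return traces;
    }

    /** Lists the trace files whose interval equals the requested one. */
    static List<String> listTracesByTime(List<String> fileNames, String interval) {
        List<String> traces = new ArrayList<>();
        for (String name : fileNames) {
            if (interval.equals(intervalOf(name))) {
                traces.add(name);
            }
        }
        return traces;
    }
}
```

Because the map holds one value per key, only one trace per interval survives in getTraces, while listTracesByTime still returns every file for that interval.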

      1. MAPREDUCE-2081.patch
        558 kB
        Vinay Kumar Thota
      2. MAPREDUCE-2081.patch
        558 kB
        Vinay Kumar Thota
      3. 2081-ydist.patch
        33 kB
        Vinay Kumar Thota
      4. 2081-ydist.patch
        8 kB
        Vinay Kumar Thota
      5. 2081-ydist.patch
        7 kB
        Vinay Kumar Thota

          Activity

          Gavin made changes -
          Link This issue depends upon MAPREDUCE-2095 [ MAPREDUCE-2095 ]
          Gavin made changes -
          Link This issue depends on MAPREDUCE-2095 [ MAPREDUCE-2095 ]
          Arun C Murthy made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          Amar Kamat made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Fix Version/s: 0.23.0 [ 12315570 ]
          Resolution: Fixed [ 1 ]
          Tags: gridmix system-tests
          Amar Kamat added a comment -

          Committed as part of MAPREDUCE-2517.

          Vinay Kumar Thota made changes -
          Link This issue is part of MAPREDUCE-2517 [ MAPREDUCE-2517 ]
          Vinay Kumar Thota made changes -
          Attachment MAPREDUCE-2081.patch [ 12460240 ]
          Vinay Kumar Thota added a comment -

          Changed the package name for UtilsforGridmix class.

          Vinay Kumar Thota made changes -
          Attachment MAPREDUCE-2081.patch [ 12458146 ]
          Vinay Kumar Thota made changes -
          Attachment MAPREDUCE-2081.patch [ 12458131 ]
          Vinay Kumar Thota made changes -
          Attachment MAPREDUCE-2081.patch [ 12458131 ]
          Vinay Kumar Thota made changes -
          Link This issue depends on MAPREDUCE-2095 [ MAPREDUCE-2095 ]
          Vinay Kumar Thota made changes -
          Attachment 2081-ydist.patch [ 12456825 ]
          Vinay Kumar Thota added a comment -

          Patch generated with compressed files.

          Ranjit Mathew added a comment -

          Fair enough. +1

          Vinay Kumar Thota added a comment -

          > If these methods are going to be called again and again, it might make sense to cache the mapping upon first invocation instead of re-creating it on each invocation. (Of course, this means that we don't expect or recognise new traces to be added to the folder between such invocations.)

          These methods are not called again and again. They are called only once, at class level.

          > I think that the methods can be named a bit better to reflect their usage. I suggest getAllJobTraces() and getJobTraceForDuration() or something like that.

          I agree with your point. However, I followed an existing Hadoop method-naming convention, such as listFileStatus.

          Ranjit Mathew added a comment -
          • If these methods are going to be called again and again, it might make sense to cache the mapping upon first invocation instead of re-creating it on each invocation. (Of course, this means that we don't expect or recognise new traces to be added to the folder between such invocations.)
          • In the current code the last file for a given duration (e.g. "10m") overwrites the previous such value, if any, and that's why the listMRTraces() methods can't re-use the Map used for the getMRTraces() methods. We can instead use a Map that maps a duration (e.g. "1d") to a list of trace-files found for that duration. The getMRTraces() method can return the first such file and the listMRTraces() method can return all such files.
          • I think that the methods can be named a bit better to reflect their usage. I suggest getAllJobTraces() and getJobTraceForDuration() or something like that.
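The first two suggestions in this comment can be sketched together: build a map from duration to the list of trace files once, on first use, and answer both the single-trace and the list queries from it. This is only an illustration under stated assumptions. The class name, method names, and build counter are hypothetical, a plain list of file names stands in for the trace folder, and the "_<duration>.json.gz" naming is assumed from the discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a cached duration-to-traces mapping; not the patch code. */
final class GroupedTraceCacheSketch {
    private static final String SUFFIX = ".json.gz"; // assumed extension

    private final List<String> fileNames;      // stands in for a directory listing
    private Map<String, List<String>> cache;   // built on first invocation only
    private int builds;                        // counts rebuilds, for demonstration

    GroupedTraceCacheSketch(List<String> fileNames) {
        this.fileNames = new ArrayList<>(fileNames);
    }

    /** Builds the mapping once; traces added later are not noticed. */
    private synchronized Map<String, List<String>> grouped() {
        if (cache == null) {
            Map<String, List<String>> byDuration = new LinkedHashMap<>();
            for (String name : fileNames) {
                String duration = durationOf(name);
                if (duration != null) {
                    byDuration.computeIfAbsent(duration, k -> new ArrayList<>()).add(name);
                }
            }
            cache = byDuration;
            builds++;
        }
        return cache;
    }

    /** Returns the first trace found for the duration, or null if there is none. */
    String getTraceForDuration(String duration) {
        List<String> traces = grouped().get(duration);
        return traces == null ? null : traces.get(0);
    }

    /** Returns every trace found for the duration. */
    List<String> listTracesForDuration(String duration) {
        List<String> traces = grouped().get(duration);
        return traces == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(traces);
    }

    /** Returns every trace that has a duration, across all durations. */
    List<String> listAllTraces() {
        List<String> all = new ArrayList<>();
        for (List<String> traces : grouped().values()) {
            all.addAll(traces);
        }
        return all;
    }

    int buildCount() {
        return builds;
    }

    private static String durationOf(String name) {
        if (!name.endsWith(SUFFIX)) {
            return null;
        }
        String stem = name.substring(0, name.length() - SUFFIX.length());
        int underscore = stem.lastIndexOf('_');
        return (underscore < 0 || underscore == stem.length() - 1)
            ? null
            : stem.substring(underscore + 1);
    }
}
```

The trade-off is the one the comment names: once the mapping is cached, trace files added to the folder afterwards are not seen.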
          Vinay Kumar Thota made changes -
          Attachment 2081-ydist.patch [ 12456278 ]
          Vinay Kumar Thota made changes -
          Attachment 2081-ydist.patch [ 12456367 ]
          Vinay Kumar Thota made changes -
          Attachment 2081-ydist.patch [ 12456278 ]
          Vinay Kumar Thota added a comment -

          Addressed the comments.

          Vinay Kumar Thota added a comment -

          > In getMRTraces(), do you think it makes sense to validate that the text between "_" and "{d,h,m}.json.gz" is actually a number?

          It makes sense; I need the number to create the key for each trace path.

          > Is the consumer of these methods supposed to know the desired time-period up front? (As in Give me all traces that run for 10 minutes.)

          Added this functionality in the patch.

          > I think the listMRTraces() methods are not needed as they can be subsumed by a call to Map.values() on the returned Map from the getMRTraces() methods.

          These methods are required, because they fetch all the traces in the path irrespective of time interval.

          Vinay Kumar Thota made changes -
          Description (original value):
          Gridmix system tests require job traces with different time intervals to generate and submit Gridmix jobs. Implement functionality that collects the job traces into a hash table keyed by time interval, and that also lists the traces in a resource location irrespective of time interval. The following methods need to be implemented.

          Method signatures:
          public static Hashtable<String, String> getMRTraces(Configuration conf) throws IOException; - gets the traces, keyed by time interval, from the default resource location.

          public static Hashtable<String, String> getMRTraces(Configuration conf, Path path) throws IOException; - gets the traces, keyed by time interval, from a user-specified resource location.

          public static ArrayList<String> listMRTraces(Configuration conf) throws IOException; - lists all the traces in the default resource location, irrespective of time interval.

          public static ArrayList<String> listMRTraces(Configuration conf, Path tracesPath) throws IOException; - lists all the traces in a user-specified resource location, irrespective of time interval.

          Description (new value):
          Gridmix system tests require job traces with different time intervals to generate and submit Gridmix jobs. Implement functionality that collects the job traces into a map keyed by time interval, and that also lists the traces in a resource location irrespective of time interval. The following methods need to be implemented.

          Method signatures:
          public static Map<String, String> getMRTraces(Configuration conf) throws IOException; - gets the traces, keyed by time interval, from the default resource location.

          public static Map<String, String> getMRTraces(Configuration conf, Path path) throws IOException; - gets the traces, keyed by time interval, from a user-specified resource location.

          public static List<String> listMRTraces(Configuration conf) throws IOException; - lists all the traces in the default resource location, irrespective of time interval.

          public static List<String> listMRTraces(Configuration conf, Path tracesPath) throws IOException; - lists all the traces in a user-specified resource location, irrespective of time interval.

          public static List<String> listMRTracesByTime(Configuration conf, String timeInterval) throws IOException; - lists all traces of a given time interval in the default resource location.

          public static List<String> listMRTracesByTime(Configuration conf, String timeInterval, Path path) throws IOException; - lists all traces of a given time interval in a given resource location.
          Ranjit Mathew added a comment -
          • Use HashMap instead of Hashtable in the return types - better yet, use just Map (ditto for ArrayList -> List).
          • "It gives" -> "Gives" in the first sentence of the JavaDoc comments for the new methods.
          • You will need to change, for example, "<numeric>" to "&lt;numeric&gt;" in the JavaDoc comments for it to be rendered correctly in the generated HTML documentation.
          • Do not say There is no restriction on file and user can use their own names, since you do have a restriction.
          • In getMRTraces(), do you think it makes sense to validate that the text between "_" and "{d,h,m}.json.gz" is actually a number?

          • I think the listMRTraces() methods are not needed as they can be subsumed by a call to Map.values() on the returned Map from the getMRTraces() methods.
          • Is the consumer of these methods supposed to know the desired time-period up front? (As in Give me all traces that run for 10 minutes.)
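Two of the review points above, validating the numeric part of the file name and escaping angle brackets in JavaDoc, can be shown in one short sketch. The class name, method name, and regular expression are hypothetical and only encode the naming convention assumed in this discussion; they are not taken from the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator for illustration; not the patch code. */
final class TraceNameValidatorSketch {
    // Assumed convention: any prefix, "_", one or more digits, a unit (d, h or m), ".json.gz".
    private static final Pattern TRACE_NAME =
        Pattern.compile(".*_(\\d+)([dhm])\\.json\\.gz");

    /**
     * Gives the interval key of a trace file named
     * <code>&lt;name&gt;_&lt;numeric&gt;&lt;unit&gt;.json.gz</code>,
     * where the unit is d, h or m.
     *
     * @param fileName the trace file name to inspect
     * @return the key, for example "10m", or null when the name does not
     *         follow the convention (including a non-numeric interval)
     */
    static String intervalKey(String fileName) {
        Matcher matcher = TRACE_NAME.matcher(fileName);
        return matcher.matches() ? matcher.group(1) + matcher.group(2) : null;
    }
}
```

The entities in the comment render as literal angle brackets in the generated HTML, and a name whose interval is not a number yields null instead of a malformed key.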
          Vinay Kumar Thota made changes -
          Attachment 2081-ydist.patch [ 12456268 ]
          Vinay Kumar Thota added a comment -

          Initial patch.

          Vinay Kumar Thota made changes -
          Description (original value):
          Gridmix system tests require job traces with different time intervals to generate and submit Gridmix jobs. Implement a method that collects the job traces into a hash table keyed by time interval.

          Method signature:
          public static Hashtable<String, String> getJobTraces(Configuration conf) throws IOException;

          Description (new value):
          Gridmix system tests require job traces with different time intervals to generate and submit Gridmix jobs. Implement functionality that collects the job traces into a hash table keyed by time interval, and that also lists the traces in a resource location irrespective of time interval. The following methods need to be implemented.

          Method signatures:
          public static Hashtable<String, String> getMRTraces(Configuration conf) throws IOException; - gets the traces, keyed by time interval, from the default resource location.

          public static Hashtable<String, String> getMRTraces(Configuration conf, Path path) throws IOException; - gets the traces, keyed by time interval, from a user-specified resource location.

          public static ArrayList<String> listMRTraces(Configuration conf) throws IOException; - lists all the traces in the default resource location, irrespective of time interval.

          public static ArrayList<String> listMRTraces(Configuration conf, Path tracesPath) throws IOException; - lists all the traces in a user-specified resource location, irrespective of time interval.
          Vinay Kumar Thota created issue -

            People

            • Assignee:
              Vinay Kumar Thota
              Reporter:
              Vinay Kumar Thota
            • Votes:
              0
              Watchers:
              0
