  1. Chukwa
  2. CHUKWA-449

Create utility to generate a sequence file from a log file

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Data Processors
    • Labels:
      None
    • Release Note:
      Added new utility for creating Chukwa sequence files in development.

      Description

      See this thread:
      http://www.mail-archive.com/chukwa-user%40hadoop.apache.org/msg00084.html

      We should have a utility class that can generate a Chukwa sequence file from a raw log file.

      1. CHUKWA-449-1.patch
        6 kB
        Bill Graham
      2. CHUKWA-449-2.patch
        12 kB
        Bill Graham

          Activity

          billgraham Bill Graham added a comment -

          Attaching CHUKWA-449-1.patch.

          I've added a new method to TempFileUtil:

          public static void makeTestSequenceFile(File inputFile,
                                                  Path outputFile,
                                                  String clusterName,
                                                  String dataType,
                                                  String streamName,
                                                  MapProcessor processor) throws IOException
          

          I've also included a main method, with the following usage message:

          Usage: java org.apache.hadoop.chukwa.util.TempFileUtil <inputFile> <outputFile> [clusterName] [dataType] [streamName] [processorClass]
          Description: Takes a plain text input file and generates a Hadoop sequence
                       file containing ChukwaRecordKey,ChukwaRecord entries
          Parameters: inputFile      - Text input file to read
                      outputFile     - Where to write the sequence file
                      clusterName    - Cluster name to use in the records
                      dataType       - Data type to use in the records
                      streamName     - Stream name to use in the records
                      processorClass - Processor class to use. Defaults to TsProcessor
          

          I wasn't sure where to put this code, so let me know if there's a better home for it. Also, since this is just a static helper utility there isn't a unit test.
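
The optional positional parameters in the usage message above could be handled roughly as follows. This is a minimal sketch, not the code from the patch: the class name RecordFileArgs and the default values for clusterName, dataType and streamName are hypothetical placeholders, since the thread states only that processorClass defaults to TsProcessor (its fully qualified name is not given here, so the simple name is used).

```java
// Hypothetical helper: resolves the two required and four optional
// positional arguments described in the usage message.
class RecordFileArgs {
    // Placeholder defaults (assumptions); only the TsProcessor default
    // is stated in the thread.
    static final String DEFAULT_CLUSTER = "testCluster";
    static final String DEFAULT_DATA_TYPE = "testDataType";
    static final String DEFAULT_STREAM = "testStream";
    static final String DEFAULT_PROCESSOR = "TsProcessor";

    final String inputFile;
    final String outputFile;
    final String clusterName;
    final String dataType;
    final String streamName;
    final String processorClass;

    private RecordFileArgs(String[] args) {
        inputFile = args[0];
        outputFile = args[1];
        clusterName = argOrDefault(args, 2, DEFAULT_CLUSTER);
        dataType = argOrDefault(args, 3, DEFAULT_DATA_TYPE);
        streamName = argOrDefault(args, 4, DEFAULT_STREAM);
        processorClass = argOrDefault(args, 5, DEFAULT_PROCESSOR);
    }

    // Rejects argument lists that do not match the usage message.
    static RecordFileArgs parse(String[] args) {
        if (args == null || args.length < 2 || args.length > 6) {
            throw new IllegalArgumentException(
                "Usage: <inputFile> <outputFile> [clusterName] [dataType] "
                + "[streamName] [processorClass]");
        }
        return new RecordFileArgs(args);
    }

    // Returns args[index] when supplied, otherwise the fallback value.
    private static String argOrDefault(String[] args, int index, String fallback) {
        return args.length > index ? args[index] : fallback;
    }
}
```

With only the two required arguments, every optional field falls back to its default; each extra argument overrides the next field in the order given by the usage message.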

          asrabkin Ari Rabkin added a comment -

          Code looks good. I think it should live in a new file in org.apache.hadoop.chukwa.util, rather than in TempFileUtil. Also, I would give it a name to indicate that it's a seq file of Records, not of raw Chunks. Something like CreateRecordFile.

          billgraham Bill Graham added a comment -

          Attaching CHUKWA-449-2.patch.

          I've moved the code to org.apache.hadoop.chukwa.util.CreateRecordFile and added a unit test. The test reads test/samples/ClientTrace.log and writes a SequenceFile to disk. I then read the sequence file and assert the entries against the original.
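
The write-then-read-back shape of that test can be illustrated without Hadoop on the classpath. The sketch below is not the test from the patch: it swaps the SequenceFile of ChukwaRecordKey/ChukwaRecord entries for a plain DataOutputStream file of (key, value) string pairs, and the class name, method names and key format are all hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real test: a simple binary key/value file replaces
// the Hadoop SequenceFile so the example runs with the JDK alone.
class RoundTripSketch {
    // Writes one (key, value) pair per input line, preceded by a count.
    // writeUTF caps each string at 65535 encoded bytes, enough for log lines.
    static void writeRecords(List<String> lines, Path out) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(out)))) {
            dos.writeInt(lines.size());
            for (int i = 0; i < lines.size(); i++) {
                dos.writeUTF("line-" + i);   // hypothetical key format
                dos.writeUTF(lines.get(i));  // record body is the raw log line
            }
        }
    }

    // Reads the pairs back and returns only the values, in file order.
    static List<String> readRecords(Path in) throws IOException {
        List<String> values = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(in)))) {
            int count = dis.readInt();
            for (int i = 0; i < count; i++) {
                dis.readUTF();               // skip the key
                values.add(dis.readUTF());
            }
        }
        return values;
    }

    // True when the values read back equal the original lines.
    static boolean roundTrips(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("records", ".bin");
            try {
                writeRecords(lines, tmp);
                return readRecords(tmp).equals(lines);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing the read-back values against the original input lines mirrors the assertion step described above; a real test would iterate the SequenceFile entries instead.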

          asrabkin Ari Rabkin added a comment -

          I just committed this. Thanks, Bill!

          hudson Hudson added a comment -

          Integrated in Chukwa-trunk #330 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/330/)

            People

            • Assignee:
              billgraham Bill Graham
              Reporter:
              billgraham Bill Graham