Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: Data Processors
    • Labels: None
    1. CHUKWA-20.patch (63 kB, Jerome Boulon)
    2. CHUKWA-20-1.patch (67 kB, Jerome Boulon)
    3. CHUKWA-20-2.patch (77 kB, Jerome Boulon)
    4. pig.jar (6.06 MB, Jerome Boulon)
    5. pig-test.jar (532 kB, Jerome Boulon)

      Activity

      jboulon Jerome Boulon added a comment -

      Pig support implies:

      • read data from the Chukwa repository (chukwaPigLoader)
      • UDF to remove duplicates
      • Test cases
      • example
      • write data back to the Chukwa repository using the same format as demux (chukwaPigWriter)

      Note: writing data back to the Chukwa repository is not mandatory for taking advantage of Pig
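
To make the "remove duplicates" item concrete, here is a minimal sketch of order-preserving de-duplication in plain Java. It is not the UDF from the patch: the class name DedupSketch, the method removeDuplicates, and the choice to model each record as a String are all assumptions made for illustration. A real Pig UDF would operate on Pig tuples and bags instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not code from CHUKWA-20.
final class DedupSketch {

    /** Returns the records with exact duplicates dropped, keeping first-seen order. */
    static List<String> removeDuplicates(List<String> records) {
        // LinkedHashSet discards repeated elements but remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(records));
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("rec-a", "rec-b", "rec-a", "rec-c", "rec-b");
        System.out.println(removeDuplicates(records));
    }
}
```

Keeping first-seen order is a design choice of this sketch; whether the actual UDF preserves order is not stated in this issue.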

      jboulon Jerome Boulon added a comment -

      Pig support for Chukwa

      • Loader
      • Writer
      • RecordMerger (UDF)
      • TimePartition (UDF)
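
The issue does not describe what TimePartition computes. One plausible reading, assumed here, is that it maps a record timestamp to the start of the fixed-size time bucket containing it. The sketch below shows that arithmetic in plain Java; the class name TimePartitionSketch and the method partitionStart are hypothetical, and the real UDF would be wired into Pig rather than called directly.

```java
// Hypothetical illustration only; not the TimePartition UDF from the patch.
final class TimePartitionSketch {

    /** Returns the start of the periodMs-sized bucket that contains timestampMs. */
    static long partitionStart(long timestampMs, long periodMs) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("periodMs must be positive");
        }
        // floorMod keeps the bucket boundary at or below the timestamp,
        // even for negative timestamps.
        return timestampMs - Math.floorMod(timestampMs, periodMs);
    }

    public static void main(String[] args) {
        long fiveMinutes = 5L * 60 * 1000;
        // Thirty days in milliseconds does not fit in an int, so long arithmetic is used.
        long thirtyDays = 30L * 24 * 60 * 60 * 1000;
        System.out.println(partitionStart(1_000_000L, fiveMinutes));
        System.out.println(partitionStart(1_000_000L, thirtyDays));
    }
}
```

Using long throughout means periods of days or weeks, expressed in milliseconds, do not overflow.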
      jboulon Jerome Boulon added a comment -

      pig.jar needs to be in contrib/pig/lib/

      jboulon Jerome Boulon added a comment -

      pig-test.jar needs to be in contrib/pig/lib/

      jboulon Jerome Boulon added a comment - - edited

      Add Pig support for Chukwa. All current code works with both the Pig.2.0 public release and the latest version from trunk.
      chukwa-core-*.jar needs to be available in the default Chukwa build directory.

      eyang Eric Yang added a comment -

      org.apache.hadoop.chukwa.ChukwaStorage does not seem to be the right name for the chukwa pig storage. It would be better to name this as org.apache.hadoop.chukwa.PigStorage in my opinion. org.apache.hadoop.chukwa.util.Util class should be named as org.apache.hadoop.chukwa.util.FileUtil class.

      jboulon Jerome Boulon added a comment -

      PigStorage is an existing class and that's the default storage for pig.

      jboulon Jerome Boulon added a comment -

      Cancel patch since I need to add one more utility class for CHUKWA-253

      zhangyongjiang Cheng added a comment -

      I saw many commented-out "System.out.println..." statements. It's not a big deal, but I personally prefer that we either remove them or replace them with logger.debug. The same goes for e.printStackTrace.
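
The suggestion above can be illustrated with a small sketch. To stay self-contained it uses java.util.logging from the JDK, where the debug-level call is fine() rather than the logger.debug named in the comment. That is a substitution made for this example, not a statement about which logging library the patch uses. The class LoggingSketch and the method parseTimestamp are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; not code from CHUKWA-20.
final class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    /** Parses a timestamp, returning fallback when the input is not a valid long. */
    static long parseTimestamp(String raw, long fallback) {
        // Replaces a commented-out System.out.println: emitted only at debug (FINE) level.
        LOG.fine("parsing timestamp: " + raw);
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // Replaces e.printStackTrace(): the logger records the message and the stack trace.
            LOG.log(Level.WARNING, "bad timestamp, using fallback: " + raw, e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTimestamp("1234567890", 0L));
        System.out.println(parseTimestamp("not-a-number", -1L));
    }
}
```

With the default JDK logging configuration the FINE message is suppressed and the WARNING goes to standard error, so debug output can be switched on through configuration instead of by uncommenting code.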

      jboulon Jerome Boulon added a comment -

      Storage:
      + Add ability to add cluster information to SeqFile
      + Use DataByteArray instead of string to avoid unnecessary pig conversion

      PigMover: helper class to move pig output in a format similar to demux output so we can reuse the default PostProcessorManager

      eyang Eric Yang added a comment -

      It looks like the current build system does not package pig as part of the package. Could you add packaging for chukwa-pig? Thanks

      eyang Eric Yang added a comment -

      Is there a duplicate detection UDF in the supplied patch? I can't find it.

      The build.xml in contrib/pig has problems. It does not honor -Dtestcase, -Dtest.include, -Dtest.exclude flags. During ant test -v, it shows:

      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${test.build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${test.build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${test.build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${build.classes} from path as it doesn't exist
      dropping /Users/eyang/sandbox/svn/chukwa/trunk/contrib/pig/${test.build.classes} from path as it doesn't exist

      It looks like the environment is not set up correctly either. Please fix those issues and the packaging issues. Thanks

      eyang Eric Yang added a comment -

      Cancel patch for stuff that requires refinement.

      jboulon Jerome Boulon added a comment -
      • automatically build chukwa-pig
      • add chukwa-pig to the package target
      • able to compute large timePartition
      • RecordMerger now able to work on DataByteArray
      • ParseDouble now able to work on DataByteArray
      jboulon Jerome Boulon added a comment -

      + able to run ant test -Dtestcase=TestTimePartition + include/exclude

      eyang Eric Yang added a comment -

      +1 Looks good.

      eyang Eric Yang added a comment -

      I just committed this, thanks Jerome.

      hudson Hudson added a comment -

      Integrated in Chukwa-trunk #49 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/49/)
      Added pig support for ChukwaRecords. (Jerome Boulon via Eric Yang)


        People

        • Assignee: jboulon Jerome Boulon
        • Reporter: jboulon Jerome Boulon
        • Votes: 0
        • Watchers: 3
