PIG-271

Add tutorial files and builds to Pig SVN

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Corrine built a tutorial for Pig: http://wiki.apache.org/pig/PigTutorial.

      We should store the files in SVN and also add build targets to construct the tutorial.

      1. PIG-271.patch
        277 kB
        Olga Natkovich
      2. PIG-271_v2.patch
        242 kB
        Olga Natkovich

        Activity

        Olga Natkovich added a comment -

        This patch is to store all tutorial-related code and data in SVN and also to build the tutorial from SVN via Ant.

        Tutorial structure:

        A tutorial
        A tutorial/src
        A tutorial/src/org
        A tutorial/src/org/apache
        A tutorial/src/org/apache/pig
        A tutorial/src/org/apache/pig/tutorial
        A tutorial/src/org/apache/pig/tutorial/TutorialUtil.java
        A tutorial/src/org/apache/pig/tutorial/ScoreGenerator.java
        A tutorial/src/org/apache/pig/tutorial/NonPornDetector.java
        A tutorial/src/org/apache/pig/tutorial/TutorialTest.java
        A tutorial/src/org/apache/pig/tutorial/NonURLDetector.java
        A tutorial/src/org/apache/pig/tutorial/ExtractHour.java
        A tutorial/src/org/apache/pig/tutorial/NGramGenerator.java
        A tutorial/src/org/apache/pig/tutorial/ToLower.java
        A tutorial/scripts
        A tutorial/scripts/script1-hadoop.pig
        A tutorial/scripts/script1-local.pig
        A tutorial/scripts/script2-hadoop.pig
        A tutorial/scripts/script2-local.pig
        A tutorial/data
        A tutorial/data/excite-small.log
        A tutorial/data/pornwords
        A tutorial/data/excite.log.bz2
        A tutorial/build.xml

        Benjamin Reed added a comment -

        Patch looks good technically. It seems pretty racy to be used as a tutorial example. My only critique is that 2 spaces are used to indent rather than the Pig standard of 4 spaces.

        Benjamin Reed added a comment -

        +1 on technical merits. Somebody else needs to approve the example content. It seems too racy to be used as a tutorial example to me.

        Christopher Olston added a comment -

        +1

        About the content: whether we like it or not, the Excite query log contains porn-related queries. Hey, it's real data! I'd say that by filtering out porn words we are making the real data less racy.

        It's important to note that the user does not need to look at the content of the porn dictionary to use/understand the tutorial.

        Alan Gates added a comment -

        I'm with Ben on this one. I think we should filter the porn words out of the data before we make it part of the patch and drop the porn dictionary. We can make an example of filtering on something else. Yeah, it's real data, and yeah we all get spam with worse every day. But in a tutorial we want to put our best foot (hoof?) forward. And including a bunch of porn words (in the data and in the filter dictionary) doesn't do that.

        Olga Natkovich added a comment -

        I looked at the scripts, and they still have filters and UDFs even if the porn processing is removed. The only thing that would be missing is loading data from DFS into a UDF, which might not be that important for a first encounter with Pig.

        I am planning to do the following unless I hear any complaints:

        • prefilter the data using the porn filter and replace the original data in the tutorial with the cleaned data
        • remove pornwords and porn filter from the tutorial.
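
The prefiltering step in the plan above could look roughly like the following. This is only a sketch under assumptions, not code from the patch: the class name QueryLogPrefilter and its methods are hypothetical, the word list is assumed to be lowercase, and a log line is dropped whenever any whitespace-separated token matches a listed word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the PIG-271 patch. */
final class QueryLogPrefilter {

    /** Returns true if any whitespace-separated token of the line is in the blocked set. */
    static boolean containsBlockedWord(String line, Set<String> blockedWords) {
        // Lowercase the line so matching is case-insensitive (blocked words are assumed lowercase).
        for (String token : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (blockedWords.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps only the lines that contain no blocked word, preserving input order. */
    static List<String> prefilter(List<String> lines, Set<String> blockedWords) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!containsBlockedWord(line, blockedWords)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Placeholder data; the real input would be the query log and the word list.
        Set<String> blocked = Set.of("blockedword");
        List<String> lines = List.of(
                "user1 pig latin tutorial",
                "user2 some BlockedWord query");
        for (String line : prefilter(lines, blocked)) {
            System.out.println(line);
        }
    }
}
```

Running the filter once and checking in only its output would let both the word list and the filter itself be dropped from the tutorial, which is what the second bullet proposes.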

        Comments?

        Olga Natkovich added a comment -

        New patch with clean data and modified scripts.

        Olga Natkovich added a comment -

        I have uploaded a new patch. Please review. If I don't hear any objections, I will commit it tomorrow morning PST.

        Olga Natkovich added a comment -

        I committed the tutorial.


          People

          • Assignee:
            Olga Natkovich
            Reporter:
            Olga Natkovich
          • Votes:
            0
            Watchers:
            1
