Pig / PIG-3215

[piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: piggybank
    • Labels:
    • Patch Info:
      Patch Available

      Description

      LTSV (Labeled Tab-separated Values) is a format that is gaining popularity in Japan for log files, especially web server logs. The goal of this issue is to add LTSVLoader to PiggyBank so that Pig can load LTSV files.

      LTSV is based on TSV, so columns are separated by tab characters. In addition, each column consists of a label and a value, separated by a ":" character.

      See http://ltsv.org/ for more about LTSV.
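      The column layout described above can be sketched in a few lines of Python. This is only an illustration of the format, not the code in the attached patch: the function name parse_ltsv_line is hypothetical, and skipping columns that lack a ":" is an assumption of this sketch rather than documented LTSVLoader behavior.

```python
def parse_ltsv_line(line):
    """Parse one LTSV line into a dict of label -> value."""
    record = {}
    for column in line.rstrip("\r\n").split("\t"):
        if not column:
            continue  # skip empty columns, e.g. from a blank line
        # Split on the first ":" only, so values may contain ":" themselves.
        label, sep, value = column.partition(":")
        if not sep:
            continue  # assumption: silently skip columns without a ":"
        record[label] = value
    return record


line = "host:host1.example.org\treq:GET /index.html\tua:Opera/9.80\n"
record = parse_ltsv_line(line)
print(record["host"])
print(record["req"])
```

      Splitting on the first ":" only matters for values such as timestamps or URLs, which often contain colons of their own.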

      Example LTSV file (access.log)

      Columns are separated by tab characters.

      host:host1.example.org	req:GET /index.html	ua:Opera/9.80
      host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
      host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
      

      Usage 1: Extract fields from each line

      Users can specify an input schema and get columns as Pig fields.

      This example loads the LTSV file shown in the previous section.

      -- Parses the access log and counts the number of lines
      -- for each pair of the host column and the ua column.
      access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
      grouped_access = GROUP access BY (host, ua);
      count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, COUNT(access);
      DUMP count_for_host_ua;
      

      The script prints the following:

      (host1.example.org,Opera/9.80,2)
      (pc.example.com,Mozilla/5.0,1)
      
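      To make the grouping concrete, here is a rough plain-Python equivalent of the script above, run over the three sample lines. It is a sketch, not the loader itself: the helper names project and count_by_host_ua are hypothetical, and returning None for a label that is missing from a line is an assumption of this sketch.

```python
from collections import Counter

# The three lines of the sample access.log, with literal tab separators.
ACCESS_LOG = (
    "host:host1.example.org\treq:GET /index.html\tua:Opera/9.80\n"
    "host:host1.example.org\treq:GET /favicon.ico\tua:Opera/9.80\n"
    "host:pc.example.com\treq:GET /news.html\tua:Mozilla/5.0\n"
)


def project(line, labels):
    """Return the values for the given labels, or None where a label is absent."""
    fields = dict(
        column.split(":", 1) for column in line.split("\t") if ":" in column
    )
    return tuple(fields.get(label) for label in labels)


def count_by_host_ua(text):
    """Count lines per (host, ua) pair, like the GROUP ... COUNT script."""
    return Counter(project(line, ("host", "ua")) for line in text.splitlines())


for (host, ua), n in count_by_host_ua(ACCESS_LOG).items():
    print((host, ua, n))
```

      Only the labels named in the schema are projected; the req column is parsed but dropped, just as it is absent from the Pig schema 'host:chararray, ua:chararray'.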

      Usage 2: Extract a map from each line

      Users can get a map for each LTSV line. Each key of the map is the label of an LTSV column, and the corresponding value is the text after the first ":" in that column.

      -- Parses the access log and projects the user agent field.
      access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
      user_agent = FOREACH access GENERATE m#'ua' AS ua;
      DUMP user_agent;
      

      The script prints the following:

      (Opera/9.80)
      (Opera/9.80)
      (Mozilla/5.0)
      
      1. LTSVLoader-6.html
        30 kB
        MIYAKAWA Taku
      2. PIG-3215-6.patch
        53 kB
        MIYAKAWA Taku
      3. LTSVLoader.html
        30 kB
        MIYAKAWA Taku
      4. PIG-3215.patch
        46 kB
        MIYAKAWA Taku

          Activity

          Transition | Time In Source Status | Execution Times | Last Executor | Last Execution Date
          Open -> Patch Available | 7m 8s | 1 | MIYAKAWA Taku | 24/Feb/13 09:05
          Patch Available -> Open | 51d 8h 9m | 1 | Alan Gates | 16/Apr/13 18:14
          Jonathan Coveney added a comment -

          Do it!

          MIYAKAWA Taku added a comment -

          Jonathan, sorry for the delay. As far as I can see, you have not posted about the issue to dev@pig.apache.org. May I post it in your place?
          Alan Gates made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Jonathan Coveney added a comment -

          Updated the review. Basically, we should figure out how we can get a handle on the Schema info, because it irks me to have to subvert the syntax because of that issue. I'd ask on dev@.

          Jonathan Coveney added a comment -

          Sorry for the delay, was out of town. Will try to review in the next couple of days.

          MIYAKAWA Taku added a comment -

          Updated the patch.

          Fixed all issues on the RB except the one about passing the schema to the constructor, which is still under discussion. Could you check my comment on the RB and mark it as fixed if there is no problem?
          MIYAKAWA Taku made changes -
          Attachment PIG-3215-6.patch [ 12572901 ]
          Attachment LTSVLoader-6.html [ 12572902 ]
          Jonathan Coveney added a comment -

          Made some comments on the RB

          MIYAKAWA Taku added a comment -

          Posted to the review board: https://reviews.apache.org/r/9685/
          MIYAKAWA Taku made changes -
          Remote Link This issue links to "Review board for the patch (Web Link)" [ 12050 ]
          Rohini Palaniswamy made changes -
          Assignee MIYAKAWA Taku [ miyakawataku ]
          Rohini Palaniswamy added a comment -

          Can you upload this patch to review board since this is a slightly bigger patch?

          MIYAKAWA Taku made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          MIYAKAWA Taku made changes -
          Field Original Value New Value
          Attachment PIG-3215.patch [ 12570645 ]
          Attachment LTSVLoader.html [ 12570646 ]
          MIYAKAWA Taku created issue -

            People

            • Assignee:
              MIYAKAWA Taku
              Reporter:
              MIYAKAWA Taku
            • Votes:
              0
              Watchers:
              3
