Pig / PIG-3749

PigPerformance - data in the map gets lost during parsing

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Bug in PigPerformanceLoader when reading bytes: the loop that looks for a termination character in a map does not check for the null value (ASCII 0).
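The release note describes a byte-scanning loop that should also treat 0x00 as a terminator when reading a map value. Below is a minimal Java sketch of such a loop. It is not the code in PigPerformanceLoader or in PIG-3749.patch. The class name `MapValueScan`, the method `valueEnd`, and the 0x03 entry delimiter are hypothetical, because this issue does not state the actual PigMix map encoding.

```java
import java.nio.charset.StandardCharsets;

class MapValueScan {
    // Hypothetical byte separating map entries; the real PigMix encoding may differ.
    static final byte ENTRY_DELIM = 0x03;

    // Returns the index just past the value that starts at 'start'. The loop stops at
    // the end of the array, at the entry delimiter, or at a 0x00 byte (the check the
    // release note says is missing).
    static int valueEnd(byte[] b, int start) {
        int i = start;
        while (i < b.length && b[i] != ENTRY_DELIM && b[i] != 0x00) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // Two values, "ccc" and "ddd"; the second is followed by a 0x00 byte and a stray byte.
        byte[] b = {'c', 'c', 'c', 0x03, 'd', 'd', 'd', 0x00, 'x'};
        int end1 = valueEnd(b, 0);         // stops at the delimiter
        int end2 = valueEnd(b, end1 + 1);  // stops at the 0x00 byte
        System.out.println(new String(b, 0, end1, StandardCharsets.US_ASCII));
        System.out.println(new String(b, end1 + 1, end2 - end1 - 1, StandardCharsets.US_ASCII));
    }
}
```

Without the `b[i] != 0x00` condition, the second scan would run past the null byte to the end of the array (index 9), so the value would absorb the stray trailing byte.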

      Description

      Create a PigMix sample dataset that looks as follows:
      keren 1 2 qt 3 4 5.0 aaaabbbb mccccddddeeeedmffffgggghhhh

      Launch the following query:
      A = load 'page_views_sample.txt' using org.apache.pig.test.pigmix.udf.PigPerformanceLoader()
      as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
      store A into 'L1out_A';

      B = foreach A generate user, (int)action as action, (map[])page_info as page_info, flatten((bag{tuple(map[])})page_links) as page_links;
      store B into 'L1out_B';

      The result looks like this:
      keren 1 b#bbb,a#aaa d#,e#eee,c#ccc
      keren 1 b#bbb,a#aaa [f#fff,g#ggg,h#hhh

      It is missing the 'ddd' value and a closing bracket.

      Thanks,
      Keren

      1. PIG-3749.patch (0.5 kB, Keren Ouaknine)

        Activity

        Keren Ouaknine created issue -
        Keren Ouaknine made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Release Note Bug in PigPerformanceLoader when reading bytes, the loop which looks for a termination character in a map is missing the null value (Ascii=0)
        Fix Version/s 0.12.1 [ 12324970 ]
        Keren Ouaknine made changes -
        Attachment patch.txt [ 12636827 ]
        Keren Ouaknine made changes -
        Attachment patch.txt [ 12636827 ]
        Keren Ouaknine made changes -
        Attachment PIG-3749.patch [ 12636828 ]
        Daniel Dai made changes -
        Assignee Keren Ouaknine [ kereno ]
        Cheolsoo Park added a comment -

        I don't seem to be able to reproduce it. I used "keren 1 2 qt 3 4 5.0 aaaabbbb mccccddddeeeedmffffgggghhhh" as input, and it gives me the following:

        (keren	1	2	qt	3	4	5.0	aaaabbbb	mccccddddeeeemffffgggghhhh,,,,,,,,)
        (keren	1	2	qt	3	4	5.0	aaaabbbb	mccccddddeeeemffffgggghhhh,,,)
        

        I think I am not loading the data properly. Do you mind attaching a sample dataset to the JIRA?

        Also, can you post a patch that can be easily applied with patch < filename in the root directory? It's not a big deal for small patches, but it's helpful to reviewers.

        Thanks!

        Prashant Kommireddi made changes -
        Fix Version/s 0.13.0 [ 12324971 ]
        Fix Version/s 0.12.1 [ 12324970 ]
        Prashant Kommireddi added a comment -

        Keren Ouaknine, I am moving this to 0.13; let me know if you have concerns with that. Also, can you please answer Cheolsoo's question above?

        Cheolsoo Park added a comment -

        Canceling patch while waiting for response.

        Cheolsoo Park made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Aniket Mokashi made changes -
        Fix Version/s 0.14.0 [ 12326954 ]
        Fix Version/s 0.13.0 [ 12324971 ]
        Daniel Dai added a comment -

        Keren Ouaknine, is this still an issue?

        Daniel Dai added a comment -

        I tried something similar but was not able to reproduce it.

        Your patch seems to deal with a 0x00 in the bytearray. Is it in the middle of the bytearray or at the end? I checked DataGenerator, and it does not seem that we generate 0x00 in the middle. If it is at the end, shouldn't the loop also be bounded by b.length?

        Can you upload your page_views_sample with the offending record?
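Daniel's question distinguishes a 0x00 at the end of the bytearray from one in the middle. The Java sketch below compares both cases using a loop that is always bounded by b.length, with and without a null check. The class name `NullPositionDemo`, the `stopAtNull` flag, and the 0x03 delimiter are hypothetical and are not taken from PigPerformanceLoader or the attached patch.

```java
class NullPositionDemo {
    // Hypothetical map-entry delimiter; the actual PigMix encoding is not stated in this issue.
    static final byte DELIM = 0x03;

    // Returns the end index of the value starting at 'start'. The loop is always
    // bounded by b.length; when stopAtNull is true it also stops at a 0x00 byte.
    static int valueEnd(byte[] b, int start, boolean stopAtNull) {
        int i = start;
        while (i < b.length && b[i] != DELIM && !(stopAtNull && b[i] == 0x00)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] atEnd = {'d', 'd', 'd', 0x00};         // 0x00 is the last byte
        byte[] inMiddle = {'d', 'd', 'd', 0x00, 'e'}; // 0x00 is followed by more data
        System.out.println(valueEnd(atEnd, 0, false) + " vs " + valueEnd(atEnd, 0, true));
        System.out.println(valueEnd(inMiddle, 0, false) + " vs " + valueEnd(inMiddle, 0, true));
    }
}
```

When the 0x00 is the last byte, the b.length bound already ends the scan, and the null check only excludes that one byte (4 vs 3). When the 0x00 sits in the middle, the null check changes which bytes belong to the value (5 vs 3).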

        Daniel Dai made changes -
        Fix Version/s 0.15.0 [ 12328760 ]
        Fix Version/s 0.14.0 [ 12326954 ]

          People

          • Assignee:
            Keren Ouaknine
            Reporter:
            Keren Ouaknine
          • Votes:
            0
            Watchers:
            4
