PIG-4175: PIG CROSS operation followed by STORE produces non-deterministic results on each run

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11, 0.12.0
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None
    • Environment: RHEL 6/64-bit
    • Hadoop Flags: Reviewed

    Description

      Three files will be attached to help visualize this issue.

      1. mktestdata.py - to generate test data to feed the pig script
      2. test_cross.pig - the PIG script using CROSS and STORE
      3. test_cross.out - the PIG console output showing the difference between input and output record counts

      To reproduce this PIG CROSS operation problem, you need to use the supplied Python script,
      mktestdata.py, to generate an input file that is at least 13,948,228,930 bytes (> 13GB).

      The CROSS between raw_data (m records) and cross_count (1 record) should yield exactly m records.
      However, the STORE results from the CROSS operations contained only about 1/3 of the input records in raw_data.
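      CROSS computes a Cartesian product, so its output cardinality is the product of the input
      cardinalities; with a one-record right-hand side, every raw_data record should appear exactly once.
      The minimal Python sketch below models only these semantics on in-memory lists with
      itertools.product. It is not Pig's distributed implementation, and the helpers `cross` and
      `output_ratio` as well as the sample records are hypothetical.

```python
from itertools import product

def cross(left, right):
    """Model of CROSS semantics: concatenate every left tuple with every right tuple."""
    return [l + r for l, r in product(left, right)]

def output_ratio(expected, observed):
    """Fraction of the expected CROSS output that was actually stored."""
    return observed / expected

# Hypothetical data: m records in raw_data, one record in cross_count.
m = 6
raw_data = [(i, "x") for i in range(m)]
cross_count = [(m,)]

data = cross(raw_data, cross_count)
print(len(data))   # 6, i.e. m * 1
print(data[0])     # (0, 'x', 6)

# A run that stored only 2 of the 6 expected records reproduces the ~1/3 symptom.
print(output_ratio(len(data), 2))
```

      Any stored count other than m (here, anything but 6) indicates lost or duplicated records.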

      If I combined both CROSS operations into a single statement, the STORE results contained about 2/3
      of the input records in raw_data:

        data = CROSS raw_data, field04s_count, subsection1_field04s_count, subsection2_field04s_count;
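      The same reasoning extends to the combined statement: an N-way CROSS yields the product of all
      input cardinalities, so crossing raw_data with three one-record relations should still return
      exactly m records, not 2/3 of them. The sketch below reuses the relation names from the report
      only as labels; their contents and the helpers `cross_n` and `expected_cross_size` are hypothetical.

```python
import math
from itertools import product

def cross_n(*relations):
    """N-way Cartesian product; each output tuple concatenates one tuple from every input."""
    return [sum(combo, ()) for combo in product(*relations)]

def expected_cross_size(*relations):
    """Expected CROSS cardinality: the product of the input cardinalities."""
    return math.prod(len(r) for r in relations)

# Hypothetical stand-ins: m = 12 raw records and three single-record count relations.
raw_data = [(i,) for i in range(12)]
field04s_count = [(12,)]
subsection1_field04s_count = [(5,)]
subsection2_field04s_count = [(7,)]

data = cross_n(raw_data, field04s_count,
               subsection1_field04s_count, subsection2_field04s_count)
print(len(data))  # 12 = 12 * 1 * 1 * 1
print(data[0])    # (0, 12, 5, 7)
```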

      We have reproduced this using both Pig 0.11 (Hadoop 1.x) and Pig 0.12 (Hadoop 2.x) clusters.
      The default HDFS block size is 128MB.
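      As a rough sense of scale: assuming the 128MB block size means 134,217,728 bytes and that one
      input split (and hence one map task) is created per full HDFS block, an input file of
      13,948,228,930 bytes spans 104 blocks, so the job reads it with many map tasks rather than one.
      The hypothetical helper `num_blocks` below only performs this arithmetic; it makes no claim about
      how Pig schedules the CROSS itself.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128MB HDFS block = 134,217,728 bytes
INPUT_BYTES = 13_948_228_930    # minimum input size quoted in the report

def num_blocks(file_bytes, block_size=BLOCK_SIZE):
    """Number of HDFS blocks (by assumption, input splits) for one file, using integer ceiling division."""
    return -(-file_bytes // block_size)

print(BLOCK_SIZE)               # 134217728
print(num_blocks(INPUT_BYTES))  # 104
```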

      Attachments

        1. mktestdata.py (0.6 kB, Jim Huang)
        2. test_cross.pig (3 kB, Jim Huang)
        3. test_cross.out (2 kB, Jim Huang)
        4. pig_testcross_plan.png (194 kB, Jim Huang)
        5. PIG-4175-1.patch (8 kB, Daniel Dai)
        6. PIG-4175-Debug.patch (3 kB, Rohini Palaniswamy)
        7. PIG-4175-additional-1.patch (10 kB, Rohini Palaniswamy)

          People

            Assignee: Daniel Dai (daijy)
            Reporter: Jim Huang (jimhuang)
            Votes: 0
            Watchers: 3
