Hive
  1. Hive
  2. HIVE-1088

RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Labels:
      None
    1. hive-rcfile-reader-branch-0.5.patch
      4 kB
      He Yongqiang
    2. hive-rcfile-reader-trunk.patch
      3 kB
      He Yongqiang
    3. hive-rcfile-reader-trunk.2.patch
      6 kB
      He Yongqiang
    4. hive-1088-branch0.5-2010-1-25.patch
      32 kB
      He Yongqiang
    5. hive-1088-trunk-2010-1-25.patch
      8 kB
      He Yongqiang
    6. hive-1088-branch0.5-2010-1-25.2.patch
      7 kB
      He Yongqiang
    7. hive-1088-trunk-2010-1-25.2.patch
      7 kB
      He Yongqiang

      Activity

      Hide
      Namit Jain added a comment -

      Can you add a test for this ?

      Also, do you think it is a good idea to convert a subset of tests to use rcfile ?

      Show
      Namit Jain added a comment - Can you add a test for this ? Also, do you think it is a good idea to convert a subset of tests to use rcfile ?
      Hide
      He Yongqiang added a comment -

      Will add a testcase.

      >>Also, do you think it is a good idea to convert a subset of tests to use rcfile ?
      Yes. We may need to do this soon or later, and right now is a good time. It maybe better if we do this in a separate jira.
      What others think?

      Show
      He Yongqiang added a comment - Will add a testcase. >>Also, do you think it is a good idea to convert a subset of tests to use rcfile ? Yes. We may need to do this soon or later, and right now is a good time. It maybe better if we do this in a separate jira. What others think?
      Hide
      He Yongqiang added a comment -

      Added a testcase in the patch for trunk. (hive-rcfile-reader-trunk.2.patch)

      Show
      He Yongqiang added a comment - Added a testcase in the patch for trunk. (hive-rcfile-reader-trunk.2.patch)
      Hide
      Ning Zhang added a comment -

      Yongqiang, I noticed you have added a unit test in TestRCFile.java. Is it possible to add a unit test as .q file? The benefit of doing this is that TestRCFile just test one code path for particular functions. If we change the code path later in the real execution engine, we may not cover the error case, but a .q file will be more likely to catch the caes.

      Show
      Ning Zhang added a comment - Yongqiang, I noticed you have added a unit test in TestRCFile.java. Is it possible to add a unit test as .q file? The benefit of doing this is that TestRCFile just test one code path for particular functions. If we change the code path later in the real execution engine, we may not cover the error case, but a .q file will be more likely to catch the caes.
      Hide
      Ning Zhang added a comment -

      Another suggestion: in RCFileRecordReader.java, next(LongWritable, BytesRefArrayWritable) shares many common code with next(LongWritable). Can you refactor this function to something like:

      public boolean next(LongWritable key, ByteRefArrayWritable value)
      throws IOException {
      more = next(key);
      if ( more )

      { in.getCurrentRow(value); }

      return more;
      }

      This will always keep the logic consistent for the two next() functions.

      Show
      Ning Zhang added a comment - Another suggestion: in RCFileRecordReader.java, next(LongWritable, BytesRefArrayWritable) shares many common code with next(LongWritable). Can you refactor this function to something like: public boolean next(LongWritable key, ByteRefArrayWritable value) throws IOException { more = next(key); if ( more ) { in.getCurrentRow(value); } return more; } This will always keep the logic consistent for the two next() functions.
      Hide
      He Yongqiang added a comment -

      Attached 2 patches. hive-1088-branch0.5-2010-1-25.patch is for branch 0.5, and hive-1088-trunk-2010-1-25.patch is for trunk code.

      Show
      He Yongqiang added a comment - Attached 2 patches. hive-1088-branch0.5-2010-1-25.patch is for branch 0.5, and hive-1088-trunk-2010-1-25.patch is for trunk code.
      Hide
      Ning Zhang added a comment -

      +1. Looks good. Will commit if tests pass.

      Show
      Ning Zhang added a comment - +1. Looks good. Will commit if tests pass.
      Hide
      Ning Zhang added a comment -

      Committed to 0.5.0 and trunk (0.6.0). Thanks Yongqiang!

      Show
      Ning Zhang added a comment - Committed to 0.5.0 and trunk (0.6.0). Thanks Yongqiang!

        People

        • Assignee:
          He Yongqiang
          Reporter:
          He Yongqiang
        • Votes:
          0 Vote for this issue
          Watchers:
          0 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development