Hive
  1. Hive
  2. HIVE-1088

RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Labels:
      None
    1. hive-1088-branch0.5-2010-1-25.2.patch
      7 kB
      He Yongqiang
    2. hive-1088-branch0.5-2010-1-25.patch
      32 kB
      He Yongqiang
    3. hive-1088-trunk-2010-1-25.2.patch
      7 kB
      He Yongqiang
    4. hive-1088-trunk-2010-1-25.patch
      8 kB
      He Yongqiang
    5. hive-rcfile-reader-branch-0.5.patch
      4 kB
      He Yongqiang
    6. hive-rcfile-reader-trunk.2.patch
      6 kB
      He Yongqiang
    7. hive-rcfile-reader-trunk.patch
      3 kB
      He Yongqiang

      Activity

      He Yongqiang created issue -
      He Yongqiang made changes -
      Field Original Value New Value
      Attachment hive-rcfile-reader-branch-0.5.patch [ 12431220 ]
      Attachment hive-rcfile-reader-trunk.patch [ 12431221 ]
      Namit Jain made changes -
      Assignee He Yongqiang [ he yongqiang ]
      Hide
      Namit Jain added a comment -

      Can you add a test for this ?

      Also, do you think it is a good idea to convert a subset of tests to use rcfile ?

      Show
      Namit Jain added a comment - Can you add a test for this ? Also, do you think it is a good idea to convert a subset of tests to use rcfile ?
      Hide
      He Yongqiang added a comment -

      Will add a testcase.

      >>Also, do you think it is a good idea to convert a subset of tests to use rcfile ?
      Yes. We may need to do this soon or later, and right now is a good time. It maybe better if we do this in a separate jira.
      What others think?

      Show
      He Yongqiang added a comment - Will add a testcase. >>Also, do you think it is a good idea to convert a subset of tests to use rcfile ? Yes. We may need to do this soon or later, and right now is a good time. It maybe better if we do this in a separate jira. What others think?
      Hide
      He Yongqiang added a comment -

      Added a testcase in the patch for trunk. (hive-rcfile-reader-trunk.2.patch)

      Show
      He Yongqiang added a comment - Added a testcase in the patch for trunk. (hive-rcfile-reader-trunk.2.patch)
      He Yongqiang made changes -
      Attachment hive-rcfile-reader-trunk.2.patch [ 12431298 ]
      Hide
      Ning Zhang added a comment -

      Yongqiang, I noticed you have added a unit test in TestRCFile.java. Is it possible to add a unit test as .q file? The benefit of doing this is that TestRCFile just test one code path for particular functions. If we change the code path later in the real execution engine, we may not cover the error case, but a .q file will be more likely to catch the caes.

      Show
      Ning Zhang added a comment - Yongqiang, I noticed you have added a unit test in TestRCFile.java. Is it possible to add a unit test as .q file? The benefit of doing this is that TestRCFile just test one code path for particular functions. If we change the code path later in the real execution engine, we may not cover the error case, but a .q file will be more likely to catch the caes.
      Hide
      Ning Zhang added a comment -

      Another suggestion: in RCFileRecordReader.java, next(LongWritable, BytesRefArrayWritable) shares many common code with next(LongWritable). Can you refactor this function to something like:

      public boolean next(LongWritable key, ByteRefArrayWritable value)
      throws IOException {
      more = next(key);
      if ( more )

      { in.getCurrentRow(value); }

      return more;
      }

      This will always keep the logic consistent for the two next() functions.

      Show
      Ning Zhang added a comment - Another suggestion: in RCFileRecordReader.java, next(LongWritable, BytesRefArrayWritable) shares many common code with next(LongWritable). Can you refactor this function to something like: public boolean next(LongWritable key, ByteRefArrayWritable value) throws IOException { more = next(key); if ( more ) { in.getCurrentRow(value); } return more; } This will always keep the logic consistent for the two next() functions.
      Zheng Shao made changes -
      Affects Version/s 0.5.0 [ 12314156 ]
      Affects Version/s 0.6.0 [ 12314524 ]
      Priority Major [ 3 ] Blocker [ 1 ]
      Hide
      He Yongqiang added a comment -

      Attached 2 patches. hive-1088-branch0.5-2010-1-25.patch is for branch 0.5, and hive-1088-trunk-2010-1-25.patch is for trunk code.

      Show
      He Yongqiang added a comment - Attached 2 patches. hive-1088-branch0.5-2010-1-25.patch is for branch 0.5, and hive-1088-trunk-2010-1-25.patch is for trunk code.
      He Yongqiang made changes -
      Attachment hive-1088-branch0.5-2010-1-25.patch [ 12431384 ]
      Attachment hive-1088-trunk-2010-1-25.patch [ 12431385 ]
      He Yongqiang made changes -
      Attachment hive-1088-branch0.5-2010-1-25.2.patch [ 12431392 ]
      Attachment hive-1088-trunk-2010-1-25.2.patch [ 12431393 ]
      He Yongqiang made changes -
      Attachment hive-1088-branch0.5-2010-1-25.2.patch [ 12431392 ]
      He Yongqiang made changes -
      Attachment hive-1088-trunk-2010-1-25.2.patch [ 12431393 ]
      He Yongqiang made changes -
      Attachment hive-1088-branch0.5-2010-1-25.2.patch [ 12431395 ]
      Attachment hive-1088-trunk-2010-1-25.2.patch [ 12431396 ]
      Hide
      Ning Zhang added a comment -

      +1. Looks good. Will commit if tests pass.

      Show
      Ning Zhang added a comment - +1. Looks good. Will commit if tests pass.
      Hide
      Ning Zhang added a comment -

      Committed to 0.5.0 and trunk (0.6.0). Thanks Yongqiang!

      Show
      Ning Zhang added a comment - Committed to 0.5.0 and trunk (0.6.0). Thanks Yongqiang!
      Ning Zhang made changes -
      Status Open [ 1 ] Resolved [ 5 ]
      Resolution Fixed [ 1 ]
      Carl Steinbach made changes -
      Fix Version/s 0.6.0 [ 12314524 ]
      Affects Version/s 0.5.0 [ 12314156 ]
      Affects Version/s 0.6.0 [ 12314524 ]
      Component/s Serializers/Deserializers [ 12312585 ]
      Carl Steinbach made changes -
      Status Resolved [ 5 ] Closed [ 6 ]

        People

        • Assignee:
          He Yongqiang
          Reporter:
          He Yongqiang
        • Votes:
          0 Vote for this issue
          Watchers:
          0 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development