Hive
  1. Hive
  2. HIVE-2260

ExecDriver::addInputPaths should pass the table properties to the record writer

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently when ExecDriver encounters a non-existent partition, it creates an empty file so that the query will be valid (and return 0 results). However, when it does this and calls getHiveRecordWriter(), it creates a new instance of Properties, rather than providing the Properties associated with the table.

      This causes RecordWriters that pull information from the table through the Properties to fail (such as Haivvreo). The RecordWriter should be provided the table's Properties, as it is in all other cases where it's called.

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #830 (See https://builds.apache.org/job/Hive-trunk-h0.21/830/)
        HIVE-2260. ExecDriver::addInputPaths should pass the table properties to the record writer (Jakob Homan via Ning Zhang)

        nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1147436
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        Show
        Hudson added a comment - Integrated in Hive-trunk-h0.21 #830 (See https://builds.apache.org/job/Hive-trunk-h0.21/830/ ) HIVE-2260 . ExecDriver::addInputPaths should pass the table properties to the record writer (Jakob Homan via Ning Zhang) nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1147436 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
        Hide
        Ning Zhang added a comment -

        Committed. Thanks Jakob!

        Show
        Ning Zhang added a comment - Committed. Thanks Jakob!
        Hide
        Ning Zhang added a comment -

        +1. Looks good to me. Will commit if tests pass.

        Show
        Ning Zhang added a comment - +1. Looks good to me. Will commit if tests pass.
        Hide
        Jakob Homan added a comment -

        Is there anything else that needs to be done to get a code review or commit on this patch?

        Show
        Jakob Homan added a comment - Is there anything else that needs to be done to get a code review or commit on this patch?
        Hide
        Jakob Homan added a comment -

        resonding to RB comment.

        Is it possible to add a testcase? If not, can you update the jira how it is tested?

        Adding a testcase isn't possible without something like Mockito, since this is buried deep within the guts of a private method that calls many methods and creates new instances of classes from calls to getClass(). Even then it would take about a dozen mocks and be amazingly fragile. This code is certainly ripe for refactoring to ease testing.

        This was manually tested (and deployed to our cluster). The manual test is to prune on a non-existent partition while using an input format like Haivvreo's, which makes use of the Properties instance. Without the patch, the call fails. With the patch, it succeeds with the behavior as described in ExecDriver.java:

        ExecDriver
        // If the query references non-existent partitions
        // We need to add a empty file, it is not acceptable to change the
        // operator tree
        // Consider the query:
        // select * from (select count(1) from T union all select count(1) from
        // T2) x;
        // If T is empty and T2 contains 100 rows, the user expects: 0, 100 (2
        // rows)
        Show
        Jakob Homan added a comment - resonding to RB comment. Is it possible to add a testcase? If not, can you update the jira how it is tested? Adding a testcase isn't possible without something like Mockito, since this is buried deep within the guts of a private method that calls many methods and creates new instances of classes from calls to getClass(). Even then it would take about a dozen mocks and be amazingly fragile. This code is certainly ripe for refactoring to ease testing. This was manually tested (and deployed to our cluster). The manual test is to prune on a non-existent partition while using an input format like Haivvreo's, which makes use of the Properties instance. Without the patch, the call fails. With the patch, it succeeds with the behavior as described in ExecDriver.java : ExecDriver // If the query references non-existent partitions // We need to add a empty file, it is not acceptable to change the // operator tree // Consider the query: // select * from (select count(1) from T union all select count(1) from // T2) x; // If T is empty and T2 contains 100 rows, the user expects: 0, 100 (2 // rows)
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1014/#review1036
        -----------------------------------------------------------

        Changes look good.
        Is it possible to add a testcase? If not, can you update the jira how it is tested?

        • Amareshwari

        On 2011-07-06 00:45:12, Jakob Homan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/1014/

        -----------------------------------------------------------

        (Updated 2011-07-06 00:45:12)

        Review request for hive.

        Summary

        -------

        Currently when ExecDriver encounters a non-existent partition, it creates an empty file so that the query will be valid (and return 0 results). However, when it does this and calls getHiveRecordWriter(), it creates a new instance of Properties, rather than providing the Properties associated with the table.

        This causes RecordWriters that pull information from the table through the Properties to fail (such as Haivvreo). The RecordWriter should be provided the table's Properties, as it is in all other cases where it's called.

        This addresses bug HIVE-2260.

        https://issues.apache.org/jira/browse/HIVE-2260

        Diffs

        -----

        ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845

        Diff: https://reviews.apache.org/r/1014/diff

        Testing

        -------

        Thanks,

        Jakob

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1014/#review1036 ----------------------------------------------------------- Changes look good. Is it possible to add a testcase? If not, can you update the jira how it is tested? Amareshwari On 2011-07-06 00:45:12, Jakob Homan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1014/ ----------------------------------------------------------- (Updated 2011-07-06 00:45:12) Review request for hive. Summary ------- Currently when ExecDriver encounters a non-existent partition, it creates an empty file so that the query will be valid (and return 0 results). However, when it does this and calls getHiveRecordWriter(), it creates a new instance of Properties, rather than providing the Properties associated with the table. This causes RecordWriters that pull information from the table through the Properties to fail (such as Haivvreo). The RecordWriter should be provided the table's Properties, as it is in all other cases where it's called. This addresses bug HIVE-2260 . https://issues.apache.org/jira/browse/HIVE-2260 Diffs ----- ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845 Diff: https://reviews.apache.org/r/1014/diff Testing ------- Thanks, Jakob
        Hide
        Jakob Homan added a comment -

        Tests pass except filter_join_breaktask, which failed once during the full test run, but I've not been able to re-produce it again. This is ready for review.

        Show
        Jakob Homan added a comment - Tests pass except filter_join_breaktask, which failed once during the full test run, but I've not been able to re-produce it again. This is ready for review.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1014/
        -----------------------------------------------------------

        Review request for hive.

        Summary
        -------

        Currently when ExecDriver encounters a non-existent partition, it creates an empty file so that the query will be valid (and return 0 results). However, when it does this and calls getHiveRecordWriter(), it creates a new instance of Properties, rather than providing the Properties associated with the table.

        This causes RecordWriters that pull information from the table through the Properties to fail (such as Haivvreo). The RecordWriter should be provided the table's Properties, as it is in all other cases where it's called.

        This addresses bug HIVE-2260.
        https://issues.apache.org/jira/browse/HIVE-2260

        Diffs


        ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845

        Diff: https://reviews.apache.org/r/1014/diff

        Testing
        -------

        Thanks,

        Jakob

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1014/ ----------------------------------------------------------- Review request for hive. Summary ------- Currently when ExecDriver encounters a non-existent partition, it creates an empty file so that the query will be valid (and return 0 results). However, when it does this and calls getHiveRecordWriter(), it creates a new instance of Properties, rather than providing the Properties associated with the table. This causes RecordWriters that pull information from the table through the Properties to fail (such as Haivvreo). The RecordWriter should be provided the table's Properties, as it is in all other cases where it's called. This addresses bug HIVE-2260 . https://issues.apache.org/jira/browse/HIVE-2260 Diffs ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 4fba845 Diff: https://reviews.apache.org/r/1014/diff Testing ------- Thanks, Jakob
        Hide
        Jakob Homan added a comment -

        Patch takes the Properties from the table/partition and passes it to the RecordWriter. This fixes the issue Haivvreo runs into.

        The patch doesn't include a unit test, since this is pretty far into the guts of ExecDriver. I'd be happy to write one up and introduce Mockito, which I think is the only reasonable way of getting to this code path, if others think it's worthwhile.

        Show
        Jakob Homan added a comment - Patch takes the Properties from the table/partition and passes it to the RecordWriter. This fixes the issue Haivvreo runs into. The patch doesn't include a unit test, since this is pretty far into the guts of ExecDriver. I'd be happy to write one up and introduce Mockito, which I think is the only reasonable way of getting to this code path, if others think it's worthwhile.

          People

          • Assignee:
            Jakob Homan
            Reporter:
            Jakob Homan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development