Hive / HIVE-2891

TextConverter for UDF's is inefficient if the input object is already Text or Lazy

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0, 0.7.1, 0.8.1
    • Fix Version/s: 0.9.0
    • Labels:
      None

      Description

      The TextConverter in PrimitiveObjectInspectorConverter.java is very inefficient if the input object is already Text or Lazy. Because it calls getPrimitiveJavaObject, each Text is decoded into a String and then re-encoded into Text. The fix is to check whether preferWritable() is true and, if so, call getPrimitiveWritable(input) instead.
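The difference between the two paths can be sketched as follows. This is only an illustration under stated assumptions: `Utf8Text` and `StringInspector` are hypothetical stand-ins written for this example, not the Hadoop `Text` class or the Hive object inspector interfaces, and the method names simply mirror the ones used in the description above. The actual patch may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for illustration only; these are NOT the Hive or Hadoop classes.
class TextConverterSketch {

    /** Minimal stand-in for a reusable UTF-8 writable (in the spirit of Hadoop's Text). */
    static final class Utf8Text {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Encodes a String into UTF-8 bytes (the expensive step). */
        void set(String s) {
            byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
            set(encoded, 0, encoded.length);
        }

        /** Copies raw bytes without any character decoding. */
        void set(byte[] src, int start, int len) {
            if (bytes.length < len) {
                bytes = new byte[len];
            }
            System.arraycopy(src, start, bytes, 0, len);
            length = len;
        }

        /** Copies only the valid bytes of another Utf8Text. */
        void set(Utf8Text other) {
            set(other.bytes, 0, other.length);
        }

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Stand-in inspector, reduced to the three calls named in the description. */
    interface StringInspector {
        boolean preferWritable();
        String getPrimitiveJavaObject(Object input);
        Utf8Text getPrimitiveWritable(Object input);
    }

    /** Slow path: bytes are decoded to a String and then re-encoded to bytes. */
    static Utf8Text convertViaString(StringInspector oi, Object input, Utf8Text reuse) {
        if (input == null) {
            return null;
        }
        reuse.set(oi.getPrimitiveJavaObject(input));
        return reuse;
    }

    /** Proposed path: when the inspector prefers writables, copy the bytes directly. */
    static Utf8Text convertPreferWritable(StringInspector oi, Object input, Utf8Text reuse) {
        if (input == null) {
            return null;
        }
        if (oi.preferWritable()) {
            reuse.set(oi.getPrimitiveWritable(input));   // plain byte copy
        } else {
            reuse.set(oi.getPrimitiveJavaObject(input)); // fall back to String
        }
        return reuse;
    }

    /** Inspector for inputs that are already Utf8Text instances. */
    static final StringInspector WRITABLE_INSPECTOR = new StringInspector() {
        @Override
        public boolean preferWritable() {
            return true;
        }

        @Override
        public String getPrimitiveJavaObject(Object input) {
            return input.toString();
        }

        @Override
        public Utf8Text getPrimitiveWritable(Object input) {
            return (Utf8Text) input;
        }
    };
}
```

Both methods produce the same output for the same input; the second simply avoids the decode and re-encode round trip whenever the inspector reports that it prefers writables.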

      To test performance, I ran the Grep query from https://issues.apache.org/jira/browse/HIVE-396 on a cluster of 3 EC2 large nodes (2 slaves, 1 master) over 6 GB of data. The job used 21 map tasks. With the current 0.8.1 version it took 81 seconds; after patching, it took 66 seconds (about 19% less time).

      I will attach a patch and testcases.

      1. HIVE-2891.2.patch.txt
        2 kB
        Cliff Engle
      2. HIVE-2891.1.patch.txt
        2 kB
        Cliff Engle

        Activity

        Cliff Engle added a comment -

        Improve TextConverter performance

        Ashutosh Chauhan added a comment -

        Many tests fail with this patch. The simplest way to reproduce: ant test -Dtestcase=TestContribCliDriver

        Cliff Engle added a comment -

        Fixed bug in previous patch. I didn't realize Text.getBytes() could return more than the Text's length.

        I ran the TestContribCliDriver tests and they all pass now.
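The pitfall mentioned in this comment can be reproduced in isolation. Hadoop's `Text.getBytes()` returns the backing array, which is only valid up to `getLength()`. The sketch below uses a hypothetical `ReusableText` class written for this example (not the Hadoop class) to show how stale bytes leak when the whole array is read. It does not claim to show where the first patch went wrong, only the general shape of this kind of bug.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; this is NOT Hadoop's Text class.
class BackingBufferSketch {

    /** A reusable buffer whose backing array may be longer than its valid content. */
    static final class ReusableText {
        private byte[] buffer = new byte[0];
        private int length = 0;

        void set(String s) {
            byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
            // Grow only when needed, so a shorter value leaves old bytes behind.
            if (buffer.length < encoded.length) {
                buffer = new byte[encoded.length];
            }
            System.arraycopy(encoded, 0, buffer, 0, encoded.length);
            length = encoded.length;
        }

        /** Returns the backing array, which may contain stale bytes past getLength(). */
        byte[] getBytes() {
            return buffer;
        }

        /** Number of valid bytes at the start of the backing array. */
        int getLength() {
            return length;
        }
    }

    /** Buggy: treats the whole backing array as content. */
    static String decodeWholeArray(ReusableText t) {
        byte[] b = t.getBytes();
        return new String(b, 0, b.length, StandardCharsets.UTF_8);
    }

    /** Correct: reads only the first getLength() bytes. */
    static String decodeValidRange(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }
}
```

For example, after `set("abcdefgh")` followed by `set("xyz")` on the same instance, `decodeWholeArray` yields `xyzdefgh` while `decodeValidRange` yields `xyz`.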

        Ashutosh Chauhan added a comment -

        Cool. I also learnt that the hard way. Looks good. Running tests now.

        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Cliff!

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1336 (See https://builds.apache.org/job/Hive-trunk-h0.21/1336/)
        HIVE-2891: TextConverter for UDF's is inefficient if the input object is already Text or Lazy (Cliff Engle via Ashutosh Chauhan) (Revision 1306096)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1306096
        Files :

        • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
        • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java
        Ashutosh Chauhan added a comment -

        This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.

        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-2891: TextConverter for UDF's is inefficient if the input object is already Text or Lazy (Cliff Engle via Ashutosh Chauhan) (Revision 1306096)

        Result = ABORTED
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1306096
        Files :

        • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
        • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java

          People

          • Assignee: Cliff Engle
          • Reporter: Cliff Engle
          • Votes: 0
          • Watchers: 1
