Hive / HIVE-7160

Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A simple UDF is missing vectorization. A simple example would be

      hive> explain select concat( l_orderkey, ' msecs') from lineitem;

      is not vectorized while

      hive> explain select concat(cast(l_orderkey as string), ' msecs') from lineitem;

      can be vectorized.

      14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf found for GenericUDFConcat, descriptor: Argument Count = 2, mode = PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = {COLUMN,COLUMN}
      14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize
      org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is not supported
              at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918)
      

        Activity

        Ashutosh Chauhan added a comment -

        I think the design issue here is the one raised in HIVE-7632. The Vectorizer currently inserts casts and then evaluates them so that the types of all operands match for the UDF. It does so because Hive currently doesn't upcast operands during semantic checking and leaves this to runtime, where it is achieved mainly via the logic in GenericUDFBaseNumeric. Instead of delegating type casting to runtime, this should happen at compile time: when we do type checking, we should upcast operands as necessary. Once we do this in TypeCheckProcFactory, there will be no need to insert and evaluate casts later in compilation (as in the vectorizer) or at runtime (GenericUDFOpNumeric).
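
        The compile-time upcasting proposed above can be illustrated with a minimal sketch. This is not Hive's code: the Type enum, the Expr node, and the coerceArgs helper are all hypothetical stand-ins for what TypeCheckProcFactory might do, assuming the function declares a single expected operand type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CompileTimeCastSketch {
    // Hypothetical, simplified set of types; not Hive's TypeInfo classes.
    enum Type { LONG, DOUBLE, STRING }

    // A tiny expression node carrying its text form and its result type.
    static final class Expr {
        final String text;
        final Type type;

        Expr(String text, Type type) {
            this.text = text;
            this.type = type;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // Wrap an expression in an explicit cast node of the target type.
    static Expr cast(Expr child, Type target) {
        String typeName = target.name().toLowerCase(Locale.ROOT);
        return new Expr("cast(" + child.text + " as " + typeName + ")", target);
    }

    // During type checking, wrap every operand whose type differs from
    // the type the function expects, so later phases see matching types.
    static List<Expr> coerceArgs(List<Expr> args, Type expected) {
        List<Expr> out = new ArrayList<>();
        for (Expr arg : args) {
            out.add(arg.type == expected ? arg : cast(arg, expected));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Expr> original = Arrays.asList(
            new Expr("l_orderkey", Type.LONG),
            new Expr("' msecs'", Type.STRING));
        // Prints the operand list after the cast has been made explicit.
        System.out.println(coerceArgs(original, Type.STRING));
    }
}
```

        With the cast made explicit in the expression tree at type-check time, the query takes the same shape as the hand-written cast(l_orderkey as string) form from the description, which is the form that can already be vectorized.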

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12648447/HIVE-7160.1.patch.txt

        ERROR: -1 due to 10 failed/errored test(s), 5511 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
        org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
        org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testDropTable
        org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitionNames
        org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitions
        org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
        org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
        org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
        org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-394/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 10 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12648447

        Navis added a comment -

        There might be some design issue. Added preferredType to the VectorizedExpressions annotation. It will try argument conversion if there is no proper VectorExpression.

        In this case, concat(column<int>/column<string>) is not supported, but adding preferredType="String..." makes a final try with the first argument cast to string type.
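
        The fallback described above can be sketched roughly as follows. This is an illustration only and does not reproduce the patch: the registry, the signature strings, the class name ConcatStringString, and the resolve helper are hypothetical, and the sketch assumes the preferred type applies to every argument (as the "String..." form suggests).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PreferredTypeFallbackSketch {
    // Hypothetical registry: argument-type signature -> vector expression name.
    private static final Map<String, String> REGISTRY = new HashMap<>();

    static {
        // Hypothetical entry: only the all-string form of concat is registered.
        REGISTRY.put("STRING,STRING", "ConcatStringString");
    }

    // Exact lookup by argument types, mirroring the descriptor match in the log.
    static Optional<String> lookup(List<String> argTypes) {
        return Optional.ofNullable(REGISTRY.get(String.join(",", argTypes)));
    }

    // Try an exact match first; if none exists and a preferred type is
    // declared, retry once with every argument converted to that type.
    static Optional<String> resolve(List<String> argTypes, String preferredType) {
        Optional<String> exact = lookup(argTypes);
        if (exact.isPresent() || preferredType == null) {
            return exact;
        }
        List<String> converted = new ArrayList<>();
        for (int i = 0; i < argTypes.size(); i++) {
            converted.add(preferredType);
        }
        return lookup(converted);
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("LONG", "STRING");
        System.out.println(resolve(types, null).isPresent());
        System.out.println(resolve(types, "STRING").isPresent());
    }
}
```

        Without a preferred type, the {LONG, STRING} signature finds no match, which corresponds to the "No vector udf found" message in the description; with the preferred type set, the retry succeeds after conversion.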


          People

          • Assignee:
            Navis
            Reporter:
            Gopal V
          • Votes:
            0
            Watchers:
            4
