Uploaded image for project: 'CarbonData'
  1. CarbonData
  2. CARBONDATA-3450

Select query with average function for substring of binary column throws incorrect exception/error

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 1.6.0
    • None
    • data-query
    • None
    • Spark 2.1

    Description

      Steps :

      From Spark beeline user creates a table with binary type and loads data to table.

       CREATE TABLE uniqdata (CUST_ID int,CUST_NAME binary,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='2000');
      LOAD DATA inpath 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' ,'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

      Select query with average function for substring of binary column is executed.

      select max(substr(CUST_NAME,1,2)),min(substr(CUST_NAME,1,2)),avg(substr(CUST_NAME,1,2)),count(substr(CUST_NAME,1,2)),sum(substr(CUST_NAME,1,2)),variance(substr(CUST_NAME,1,2)) from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 =1233720368578 or DECIMAL_COLUMN1 = 12345678901.1234000058 or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

      select max(substring(CUST_NAME,1,2)),min(substring(CUST_NAME,1,2)),avg(substring(CUST_NAME,1,2)),count(substring(CUST_NAME,1,2)),sum(substring(CUST_NAME,1,2)),variance(substring(CUST_NAME,1,2)) from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 =1233720368578 or DECIMAL_COLUMN1 = 12345678901.1234000058 or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

       

      【Actual Output】:Select query with average function for substring of binary column throws incorrect exception/error

       0: jdbc:hive2://10.18.98.120:22550/default> select max(substr(CUST_NAME,1,2)),min(substr(CUST_NAME,1,2)),avg(substr(CUST_NAME,1,2)),count(substr(CUST_NAME,1,2)),sum(substr(CUST_NAME,1,2)),variance(substr(CUST_NAME,1,2)) from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 =1233720368578 or DECIMAL_COLUMN1 = 12345678901.1234000058 or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;
      Error: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to name on unresolved object, tree: unresolvedalias(avg(substring(CUST_NAME#45, 1, 2)), None) (state=,code=0)

       

      【Expected Output】:Select query with average function for substring of binary column should throw correct error message indicating the type binary cant be supported. 

      Attachments

        Activity

          People

            Unassigned Unassigned
            chetdb Chetan Bhat
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: