Uploaded image for project: 'Apache Trafodion (Retired)'
  1. Apache Trafodion (Retired)
  2. TRAFODION-2465

UPDATE STATS failure on Hive table with long varchars

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.1-incubating
    • 2.1-incubating
    • sql-cmp
    • None
    • Likely all

    Description

      The scenario is the following:

      1. Create a Hive text format table that has string columns with values 1048576 bytes long.
      2. Create a Trafodion external table with varchar(1048576) columns for these string columns.
      3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds.
      4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds.
      5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the following errors:

          • ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query.
          • ERROR[8609] Waited rollback performed without starting a transaction.
          • ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query.
          • ERROR[8609] Waited rollback performed without starting a transaction.
          • ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query.
          • ERROR[8609] Waited rollback performed without starting a transaction.
          • ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query.
          • ERROR[8609] Waited rollback performed without starting a transaction.
          • ERROR[9214] Object TRAFODION."HIVESTATS".TRAF_SAMPLE_93_1485453261_202793 could not be created.
          • ERROR[8609] Waited rollback performed without starting a transaction.

      To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql with the following contents and run them in this order:

      chmod 755 datagen.py
      ./datagen.py data.10rows_1MB.txt 10 2 1048576
      hive -f hive.sql
      sqlci -i mytest.sql

      $ cat datagen.py
      #! /usr/bin/env python
      import sys

      if len(sys.argv) != 5 or \
      sys.argv[1].lower() == '-h' or \
      sys.argv[1].lower() == '-help':
      print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar columns> <varchar column length>'
      sys.exit()

      f = open(sys.argv[1], "w+")

      marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
      for num_rows in range(0, int(sys.argv[2])):
      f.write(str(num_rows) + '|')
      for num_cols in range(0, int(sys.argv[3])):
      f.write(marker[num_rows%len(marker)])
      for i in range (1, int(sys.argv[4])):
      f.write(str(i % 10))
      f.write('|')
      f.write(str(num_rows))
      f.write('\n')

      f.close()

      $ cat hive.sql
      drop database if exists mysch cascade;
      create database mysch;
      use mysch;

      create table base_lgvarchar_table(c_int int, c_string1 string, c_string2 string , p_int int) row format delimited fields terminated by '|';
      load data local inpath './data.10rows_10MB.txt' overwrite into table base_lgvarchar_table;
      create table mytable (c_int int, c_string1 string, c_string2 string ,p_int int) row format delimited fields terminated by '|';
      from base_lgvarchar_table insert overwrite table mytable select *;

      $ cat mytest.sql
      log mytest.log clear;
      control query default HIVE_MAX_STRING_LENGTH '1048576';
      set schema hive.mysch;

      drop external table if exists mytable cascade;
      create external table mytable (c_int int, c_string1 varchar(1048576), c_string2 varchar(1048576), p_int int) for hive.mysch.mytable;
      showddl mytable;

      update statistics for table mytable on every column sample random 90 percent persistent;
      update statistics for table mytable remove sample;
      update statistics for table mytable on existing column sample random 90 percent persistent;
      update statistics for table mytable remove sample;

      Attachments

        Issue Links

          Activity

            People

              dbirdsall Dave Birdsall
              dbirdsall Dave Birdsall
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: