Hive / HIVE-27163

Column stats are not getting published after an insert query into an external table with custom location


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: Hive

    Description

      The following test case reproduces the problem.

      test.q

      set hive.stats.column.autogather=true;
      set hive.stats.autogather=true;
      dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
      create external table test_custom(age int, name string) stored as orc location '/tmp/test';
      insert into test_custom select 1, 'test';
      desc formatted test_custom age;

      test.q.out

      #### A masked pattern was here ####
      PREHOOK: type: CREATETABLE
      #### A masked pattern was here ####
      PREHOOK: Output: database:default
      PREHOOK: Output: default@test_custom
      #### A masked pattern was here ####
      POSTHOOK: type: CREATETABLE
      #### A masked pattern was here ####
      POSTHOOK: Output: database:default
      POSTHOOK: Output: default@test_custom
      PREHOOK: query: insert into test_custom select 1, 'test'
      PREHOOK: type: QUERY
      PREHOOK: Input: _dummy_database@_dummy_table
      PREHOOK: Output: default@test_custom
      POSTHOOK: query: insert into test_custom select 1, 'test'
      POSTHOOK: type: QUERY
      POSTHOOK: Input: _dummy_database@_dummy_table
      POSTHOOK: Output: default@test_custom
      POSTHOOK: Lineage: test_custom.age SIMPLE []
      POSTHOOK: Lineage: test_custom.name SIMPLE []
      PREHOOK: query: desc formatted test_custom age
      PREHOOK: type: DESCTABLE
      PREHOOK: Input: default@test_custom
      POSTHOOK: query: desc formatted test_custom age
      POSTHOOK: type: DESCTABLE
      POSTHOOK: Input: default@test_custom
      col_name                age
      data_type               int
      min
      max
      num_nulls
      distinct_count
      avg_col_len
      max_col_len
      num_trues
      num_falses
      bit_vector
      comment                 from deserializer

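      As the desc formatted output shows, the column stats fields (min, max, num_nulls, distinct_count, and the rest) are empty: no column stats were published for the table after the insert, even though hive.stats.autogather and hive.stats.column.autogather are both enabled.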
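
      One way to narrow this down, as a sketch that is not part of the original report, is to compute the column stats explicitly and describe the column again. If min, max and the other fields are filled in afterwards, the gap is in the automatic gathering on insert rather than in how stats are stored for this table. The report does not state the outcome of this check.

      ```sql
      -- Hypothetical follow-up, not part of the original test.q.
      -- Explicitly compute column stats for the same table.
      analyze table test_custom compute statistics for columns;
      -- Re-check the column that showed empty stats above.
      desc formatted test_custom age;
      ```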

       


          People

            Assignee: Zhihua Deng (dengzh)
            Reporter: Taraka Rama Rao Lethavadla (tarak271)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 6h 10m