SPARK-28611

Histogram's height is different

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING PARQUET;
      -- Test output for histogram statistics
      SET spark.sql.statistics.histogram.enabled=true;
      SET spark.sql.statistics.histogram.numBins=2;
      
      INSERT INTO desc_col_table values 1, 2, 3, 4;
      
      ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
      
      DESC EXTENDED desc_col_table key;
      
      spark-sql> DESC EXTENDED desc_col_table key;
      col_name	key
      data_type	int
      comment	column_comment
      min	1
      max	4
      num_nulls	0
      distinct_count	4
      avg_col_len	4
      max_col_len	4
      histogram	height: 4.0, num_of_bins: 2
      bin_0	lower_bound: 1.0, upper_bound: 2.0, distinct_count: 2
      bin_1	lower_bound: 2.0, upper_bound: 4.0, distinct_count: 2
      

      But the result recorded in our test output is:
      https://github.com/apache/spark/blob/v2.4.3/sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out#L231-L242
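
For readers comparing the two outputs, the arithmetic behind the `height` field can be sketched as follows. This is a minimal illustration, assuming that the height of an equi-height histogram is the number of rows per bin, that is, the analyzed row count divided by `numBins`. The helper name `equi_height` is hypothetical and is not part of Spark.

```python
def equi_height(row_count: int, num_bins: int) -> float:
    """Rows per bin, assuming every bin holds the same number of rows.

    Hypothetical helper for illustration only; not a Spark API.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    return row_count / num_bins


# Four rows split across two bins.
print(equi_height(4, 2))  # 2.0
# Eight rows split across two bins.
print(equi_height(8, 2))  # 4.0
```

Under that assumption, the reported height depends on how many rows the table holds when ANALYZE TABLE runs, not only on the distinct values in the column.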

            People

              Assignee: Unassigned
              Reporter: Yuming Wang (yumwang)
              Votes: 0
              Watchers: 4
