  1. Hive
  2. HIVE-7292 Hive on Spark
  3. HIVE-8756

numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      Run the following Hive queries:

      set datanucleus.cache.collections=false;
      set hive.stats.autogather=true;
      set hive.merge.mapfiles=false;
      set hive.merge.mapredfiles=false;
      set hive.map.aggr=true;
      
      create table tmptable(key string, value string);
      INSERT OVERWRITE TABLE tmptable
      SELECT unionsrc.key, unionsrc.value 
      FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
            UNION  ALL  
            SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
      DESCRIBE FORMATTED tmptable;
      

      Hive on Spark prints the following table parameters:

      COLUMN_STATS_ACCURATE	true                
      	numFiles            	2                   
      	numRows             	0                   
      	rawDataSize         	0                   
      	totalSize           	225
      

      Hive on MR (MapReduce) prints the following table parameters:

      Table Parameters:	 	 
      	COLUMN_STATS_ACCURATE	true                
      	numFiles            	2                   
      	numRows             	26                  
      	rawDataSize         	199                 
      	totalSize           	225 
      

      As shown above, numRows and rawDataSize are not collected by Hive on Spark stats: both are reported as 0, whereas Hive on MR reports 26 and 199 for the same query.
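
      One possible way to cross-check the numbers above is sketched below. It counts the rows actually written to tmptable and then asks Hive to recompute the basic table statistics explicitly. This is only a sketch: the issue does not say whether ANALYZE TABLE behaves correctly on the Spark branch, so that is an assumption here, and this is not the fix delivered by the attached patches.

      ```sql
      -- Count the rows actually written by the INSERT OVERWRITE above.
      -- If the Hive on MR stats are accurate, this should match its numRows.
      SELECT count(*) FROM tmptable;

      -- Explicitly recompute basic table statistics (numRows, rawDataSize, ...).
      -- Assumption: this statement works on the execution engine under test.
      ANALYZE TABLE tmptable COMPUTE STATISTICS;

      -- Re-inspect the table parameters after the explicit stats run.
      DESCRIBE FORMATTED tmptable;
      ```

      If numRows and rawDataSize become non-zero after the explicit ANALYZE but stay at 0 right after the INSERT OVERWRITE, that would point at the automatic stats gathering (hive.stats.autogather) rather than at the data itself.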

      Attachments

        1. HIVE-8756.1-spark.patch
          14 kB
          Na Yang
        2. HIVE-8756.2-spark.patch
          66 kB
          Na Yang

            People

              Assignee: Na Yang (nyang)
              Reporter: Na Yang (nyang)
              Votes: 0
              Watchers: 3
