Hive / HIVE-7292 Hive on Spark / HIVE-8756

numRows and rawDataSize are not collected by the Spark stats [Spark Branch]


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      Run the following Hive queries:

      set datanucleus.cache.collections=false;
      set hive.stats.autogather=true;
      set hive.merge.mapfiles=false;
      set hive.merge.mapredfiles=false;
      set hive.map.aggr=true;
      
      create table tmptable(key string, value string);
      INSERT OVERWRITE TABLE tmptable
      SELECT unionsrc.key, unionsrc.value 
      FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
            UNION  ALL  
            SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
      DESCRIBE FORMATTED tmptable;
      

      Hive on Spark prints the following table parameters:

      COLUMN_STATS_ACCURATE	true                
      	numFiles            	2                   
      	numRows             	0                   
      	rawDataSize         	0                   
      	totalSize           	225
      

      Hive on MapReduce prints the following table parameters:

      Table Parameters:
      	COLUMN_STATS_ACCURATE	true                
      	numFiles            	2                   
      	numRows             	26                  
      	rawDataSize         	199                 
      	totalSize           	225 
      

      As shown above, numRows and rawDataSize are not collected by Hive on Spark: both are reported as 0, whereas Hive on MapReduce reports 26 and 199 for the same queries.
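
      The discrepancy can be checked directly, as a minimal sketch that is not part of the original report. It counts the rows actually written to tmptable and then recomputes the basic statistics explicitly with the standard Hive statement ANALYZE TABLE ... COMPUTE STATISTICS. Whether the explicit analyze produces correct values on the Spark branch is an assumption; the report does not verify it.

      ```sql
      -- Count the rows actually written; the MapReduce run above reports numRows = 26.
      SELECT count(*) FROM tmptable;

      -- Explicitly recompute basic table statistics (numRows, rawDataSize, ...).
      ANALYZE TABLE tmptable COMPUTE STATISTICS;

      -- Re-check the table parameters after the explicit analyze.
      DESCRIBE FORMATTED tmptable;
      ```

      If the count is non-zero while numRows remains 0 immediately after the INSERT OVERWRITE, the problem lies in automatic stats gathering (hive.stats.autogather) rather than in the data itself.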

      Attachments

        1. HIVE-8756.2-spark.patch
          66 kB
          Na Yang
        2. HIVE-8756.1-spark.patch
          14 kB
          Na Yang


          People

            Assignee: Na Yang
            Reporter: Na Yang
            Votes: 0
            Watchers: 3
