Hive / HIVE-2185

Extend table statistics to store the size of uncompressed data (and extend interfaces for collecting other types of statistics)


Details

    • Reviewed
    • This patch added getSerDeStats() methods to the Serializer and Deserializer interfaces. Consequently, any SerDe compiled against the old interfaces must be recompiled against the new ones to work with Hive 0.8.0.
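To illustrate why the release note requires recompilation, here is a minimal, self-contained sketch of a serializer interface that gains a stats accessor. Only the method name getSerDeStats() comes from the release note; SerDeStatsSketch, StatsReportingSerializer and DelimitedSerializer are hypothetical stand-ins and do not reproduce Hive's actual org.apache.hadoop.hive.serde2 types. The toy serializer joins fields with Ctrl-A (\u0001, Hive's default field delimiter) and counts the uncompressed bytes it produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for a per-SerDe statistics holder (not Hive's real class).
final class SerDeStatsSketch {
    private long rawDataSize; // uncompressed serialized bytes seen so far

    long getRawDataSize() { return rawDataSize; }
    void setRawDataSize(long size) { this.rawDataSize = size; }
}

// Hypothetical serializer interface extended with a stats accessor. Any class
// compiled against the version without getSerDeStats() no longer satisfies it.
interface StatsReportingSerializer {
    byte[] serialize(List<String> row);
    SerDeStatsSketch getSerDeStats();
}

// Toy serializer: joins fields with Ctrl-A and tracks the raw byte count.
final class DelimitedSerializer implements StatsReportingSerializer {
    private final SerDeStatsSketch stats = new SerDeStatsSketch();

    @Override
    public byte[] serialize(List<String> row) {
        byte[] bytes = String.join("\u0001", row).getBytes(StandardCharsets.UTF_8);
        // Accumulate the size before any output compression is applied.
        stats.setRawDataSize(stats.getRawDataSize() + bytes.length);
        return bytes;
    }

    @Override
    public SerDeStatsSketch getSerDeStats() { return stats; }
}
```

Serializing the rows ["a", "bc"] (4 bytes including the delimiter) and ["xyz"] (3 bytes) leaves a raw data size of 7 in the returned stats object.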

    Description

      Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands, we collect statistics about the number of rows per partition/table. Other statistics (e.g., total table/partition size) are derived from the file system.
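As a rough sketch of per-partition row counting, assume each task that writes data reports a partial row count for the partition it wrote, and the final statistic is the sum of those partial counts. RowCountAggregator and its methods are hypothetical names for illustration only, not Hive's actual stats publisher/aggregator classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-partition row counter: each task reports partial counts,
// and the final count for a partition is the sum of everything reported for it.
final class RowCountAggregator {
    private final Map<String, Long> rowsByPartition = new HashMap<>();

    // Record that a task wrote `rows` rows into `partition` (e.g. "ds=2011-06-01").
    void publish(String partition, long rows) {
        rowsByPartition.merge(partition, rows, Long::sum);
    }

    // Rows for one partition; 0 if no task reported anything for it.
    long rowCount(String partition) {
        return rowsByPartition.getOrDefault(partition, 0L);
    }

    // The count across all partitions is the sum of the per-partition counts.
    long totalRowCount() {
        long total = 0;
        for (long n : rowsByPartition.values()) {
            total += n;
        }
        return total;
    }
}
```

With reports of 10 and 5 rows for ds=1 and 3 rows for ds=2, rowCount("ds=1") is 15 and totalRowCount() is 18.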

      Here, we want to collect the size of the uncompressed data, so that we can determine how efficient the compression is.
      Currently, a large part of the statistics collection mechanism is hardcoded and not easily extensible to other statistics.
      In addition to collecting this new statistic, it would be desirable to generalize the collection mechanism so that new statistics can be added easily.
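One way to make collection extensible, sketched here under our own assumptions, is to key statistics by name instead of hard-coding a field per statistic; compression efficiency then falls out as the ratio of uncompressed size to stored size (e.g. 1,200 bytes of raw data stored in 300 bytes gives a ratio of 4.0). PartitionStats is a hypothetical class, and the stat names numRows, rawDataSize and totalSize are used for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical generic stats record: statistics are keyed by name rather than
// hard-coded as fields, so a new statistic needs no new plumbing.
final class PartitionStats {
    static final String NUM_ROWS = "numRows";          // counted while writing rows
    static final String RAW_DATA_SIZE = "rawDataSize"; // uncompressed bytes, reported by the SerDe
    static final String TOTAL_SIZE = "totalSize";      // on-disk bytes, from the file system

    private final Map<String, Long> values = new TreeMap<>();

    // Adds `delta` to the named statistic, creating it on first use.
    void add(String name, long delta) {
        values.merge(name, delta, Long::sum);
    }

    // Current value of the named statistic; 0 if it was never recorded.
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    // Compression ratio = uncompressed size / stored size; 4.0 means the stored
    // data is a quarter of its raw size. Returns NaN if nothing is stored.
    double compressionRatio() {
        long stored = get(TOTAL_SIZE);
        return stored == 0 ? Double.NaN : (double) get(RAW_DATA_SIZE) / stored;
    }
}
```

Adding a further statistic is just another add(name, delta) call with a new name; nothing else in the class has to change.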

      Attachments

        1. HIVE-2185.2.patch
          870 kB
          Tomasz Nykiel
        2. HIVE-2185.1.patch
          854 kB
          Tomasz Nykiel
        3. HIVE-2185.patch
          825 kB
          Tomasz Nykiel


          People

            tnykiel Tomasz Nykiel
            tnykiel Tomasz Nykiel
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: