Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This is about the following basic stats:
- num files
- datasize (sum of filesizes)
- num erasure coded files
Right now these are scanned during every BasicStatsTask execution, which means some filesystem reads etc. For small inserts this cost is visible when the filesystem is a bit slower (S3 and friends).
I don't think they are really in use. We rely more on column stats, which are more accurate; and the datasize in this case is the "offline" (on-disk) size, while we should instead be calculating with "online" sizes.
Proposal:
- remove collection and storage of this data
- collect it on the fly during "desc formatted" statements to provide them for informational purposes
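A minimal sketch of what "collect it on the fly" could look like, for illustration only. It assumes the table location is a plain directory reachable through `java.nio.file` rather than the Hadoop `FileSystem` API that Hive actually uses, and the names `OnTheFlyStats`, `Stats` and `collect` are hypothetical. It covers only num files and datasize; counting erasure coded files needs filesystem-specific metadata and is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: computes file count and on-disk size at describe time
// instead of storing them after every insert.
final class OnTheFlyStats {

    // Hypothetical holder for the two values computed below.
    static final class Stats {
        final long numFiles;
        final long totalSize;

        Stats(long numFiles, long totalSize) {
            this.numFiles = numFiles;
            this.totalSize = totalSize;
        }
    }

    // Walks the table directory recursively and sums up the regular files.
    // Assumption: a local directory stands in for the table location.
    static Stats collect(Path tableDir) throws IOException {
        long numFiles = 0;
        long totalSize = 0;
        try (Stream<Path> paths = Files.walk(tableDir)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    numFiles++;
                    totalSize += Files.size(p);
                }
            }
        }
        return new Stats(numFiles, totalSize);
    }
}
```

With this shape the filesystem scan is paid only by whoever runs "desc formatted", not by every small insert.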
Issue Links
- supercedes HIVE-17478 Move filesystem stats collection from metastore to ql (Patch Available)