Description
Continue IOStatistics development, with these goals:
- easy adoption in applications
- better instrumentation in the Hadoop codebase (distcp?)
- more statistics in the ABFS and S3A connectors
A key requirement is a thread-level context for statistics, so that application code doesn't have to explicitly ask for the stats of each worker thread. Instead, filesystem components update the context stats as well as the thread stats (when?), and applications can then pick them up.
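A minimal Java sketch of the thread-level context idea, under stated assumptions: `StatsContext`, `currentContext()`, `fakeRead()` and counter names such as `stream_read_bytes` are hypothetical and are not the Hadoop IOStatistics API. A `ThreadLocal` holds one context per thread. A stand-in read operation updates that context directly, so the application reads the totals afterwards without asking the stream for its statistics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-thread statistics context; a sketch, not the Hadoop IOStatistics API. */
public class ThreadStatsContextDemo {

    /** Hypothetical context: named counters backed by LongAdder. */
    public static final class StatsContext {
        private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

        /** Memory-only update: non-blocking get() on the hot path, computeIfAbsent() only on first use. */
        public void increment(String key, long value) {
            LongAdder adder = counters.get(key);
            if (adder == null) {
                adder = counters.computeIfAbsent(key, k -> new LongAdder());
            }
            adder.add(value);
        }

        public long get(String key) {
            LongAdder adder = counters.get(key);
            return adder == null ? 0L : adder.sum();
        }
    }

    /** One context per thread, created lazily on first access. */
    private static final ThreadLocal<StatsContext> CURRENT =
            ThreadLocal.withInitial(StatsContext::new);

    public static StatsContext currentContext() {
        return CURRENT.get();
    }

    /** Hypothetical stand-in for a filesystem stream read that updates the thread's context itself. */
    public static int fakeRead(byte[] buffer) {
        StatsContext ctx = currentContext();
        ctx.increment("stream_read_bytes", buffer.length);
        ctx.increment("stream_read_operations", 1);
        return buffer.length;
    }

    public static void main(String[] args) {
        fakeRead(new byte[1024]);
        fakeRead(new byte[512]);
        // The application reads the totals from the thread context; the "stream" never handed them over.
        StatsContext ctx = currentContext();
        System.out.println(ctx.get("stream_read_bytes"));      // 1536
        System.out.println(ctx.get("stream_read_operations")); // 2
    }
}
```

The update path does a non-blocking `ConcurrentHashMap.get()` and only falls back to `computeIfAbsent()` the first time a key is seen, so the common case is a memory-only lookup plus a `LongAdder.add()`.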
- need to manage performance by minimising inefficient lookups, lock acquisition etc. on what should be memory-only operations (read(), write())
- for duration tracking, cut down on calls to System.currentTimeMillis() so that only one is made per operation
- need to propagate the context into worker threads
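A second sketch, again with hypothetical names (`StatsContext`, `trackDuration()`, `withContext()`) that are not part of any Hadoop API, covers the last two points. It assumes "one clock call per operation" means a single start/end pair shared by every duration statistic: `trackDuration()` calls `System.nanoTime()` once before and once after the operation and derives count, total and maximum from that one measured value. `withContext()` captures the submitting thread's context and installs it in the worker thread, so work done in an executor is charged to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical sketch: duration tracking plus context propagation; not the Hadoop API. */
public class ContextPropagationDemo {

    /** Hypothetical context holding a single duration statistic. */
    public static final class StatsContext {
        private final LongAdder count = new LongAdder();
        private final LongAdder totalNanos = new LongAdder();
        private final AtomicLong maxNanos = new AtomicLong();

        /** Count, total and max all derive from one measured value; none of them reads the clock. */
        void recordDuration(long nanos) {
            count.increment();
            totalNanos.add(nanos);
            maxNanos.accumulateAndGet(nanos, Math::max);
        }

        public long operationCount() {
            return count.sum();
        }
    }

    private static final ThreadLocal<StatsContext> CURRENT =
            ThreadLocal.withInitial(StatsContext::new);

    public static StatsContext current() {
        return CURRENT.get();
    }

    /** Times an operation with one clock read before it and one after it. */
    public static <T> T trackDuration(Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            current().recordDuration(System.nanoTime() - start);
        }
    }

    /** Captures the submitting thread's context and installs it in the worker while the task runs. */
    public static Runnable withContext(Runnable task) {
        StatsContext captured = current();
        return () -> {
            StatsContext previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                task.run();
            } finally {
                CURRENT.set(previous); // a pooled thread must not keep the submitter's context
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            futures.add(pool.submit(withContext(() -> trackDuration(() -> 42))));
        }
        for (Future<?> f : futures) {
            f.get(); // waits for completion, which also makes the workers' updates visible here
        }
        pool.shutdown();
        // All eight worker-side operations were recorded in the main thread's context.
        System.out.println(current().operationCount()); // 8
    }
}
```

Restoring the worker's previous context in the `finally` block matters for pooled threads: without it, a reused thread would keep updating the statistics of whichever caller last submitted work to it.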
Target uses:
- Impala
- Spark, via SPARK-29397 (S3A committers)
- Iceberg
I have a WIP Parquet branch too, to see what can be done there. It shows why the thread context is needed: it's unworkable to build up your own stats snapshot. Even if you collect it for listX and stream reads, it doesn't include FS operations (e.g. rename()), and you need to rework all your methods to pass the stats collector around.
Issue Links
- is blocked by: HADOOP-13551 Collect AwsSdkMetrics in S3A FileSystem IOStatistics (Resolved)
- supersedes: HADOOP-16830 Add Public IOStatistics API (Resolved)