Hadoop Common / HADOOP-17469

IOStatistics Phase II

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.1
    • Fix Version/s: None
    • Component/s: fs, fs/azure, fs/s3
    • Labels: None

      Description

      Continue IOStatistics development with goals of

      • Easy adoption in applications
      • Better instrumentation in the Hadoop codebase (distcp?)
      • More stats in the abfs and s3a connectors

      A key feature has to be a thread-level context for statistics, so that app code doesn't have to explicitly ask for the stats for each worker thread. Instead, filesystem components update the context stats as well as thread stats (when?), and apps can then pick them up.
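
      A minimal sketch of what such a thread-level context could look like, assuming a plain ThreadLocal holding a map of counters. ThreadStatsContext, simulatedRead() and the key "stream_read_bytes" are hypothetical names used only for illustration; they are not the API proposed in this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-thread statistics context; not a Hadoop class. */
final class ThreadStatsContext {

  /** Each thread lazily gets its own context on first access. */
  private static final ThreadLocal<ThreadStatsContext> CURRENT =
      ThreadLocal.withInitial(ThreadStatsContext::new);

  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  /** Return the context bound to the calling thread. */
  public static ThreadStatsContext current() {
    return CURRENT.get();
  }

  /** Add a value to a named counter, creating the counter if needed. */
  public void increment(String key, long value) {
    counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(value);
  }

  /** Read a counter; unknown keys report zero. */
  public long get(String key) {
    AtomicLong counter = counters.get(key);
    return counter == null ? 0L : counter.get();
  }

  /** Stand-in for a filesystem stream updating the caller's context. */
  public static int simulatedRead(byte[] buffer) {
    current().increment("stream_read_bytes", buffer.length);
    return buffer.length;
  }

  public static void main(String[] args) throws InterruptedException {
    simulatedRead(new byte[128]);
    simulatedRead(new byte[64]);

    // A separate thread updates its own context, not the main thread's.
    Thread worker = new Thread(() -> simulatedRead(new byte[1000]));
    worker.start();
    worker.join();

    // The application never passed a collector around; it just asks.
    System.out.println(current().get("stream_read_bytes")); // prints 192
  }
}
```

      The application code never passes a collector into the read path. The sketch also shows the limitation: work done on another thread is invisible to the caller unless the context is propagated.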

      • need to manage performance by minimising inefficient lookups, lock acquisition etc. on what should be memory-only ops (read(), write()),
      • for duration tracking, cut down on calls to System.currentTimeMillis() so that only one is made per operation,
      • need to propagate the context into worker threads
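
      One way the propagation requirement might be met is to capture the submitting thread's context when a task is created and install it in the worker for the duration of the task. The sketch below assumes that approach; PropagatedStatsContext, wrap() and runDemo() are hypothetical names, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical context with explicit propagation into worker threads. */
final class PropagatedStatsContext {

  private static final ThreadLocal<PropagatedStatsContext> CURRENT =
      ThreadLocal.withInitial(PropagatedStatsContext::new);

  /** Atomic, because several workers may share one context. */
  private final AtomicLong bytesRead = new AtomicLong();

  static PropagatedStatsContext current() {
    return CURRENT.get();
  }

  void addBytesRead(long n) {
    bytesRead.addAndGet(n);
  }

  long bytesRead() {
    return bytesRead.get();
  }

  /** Capture the submitter's context now; install it in the worker later. */
  static Runnable wrap(Runnable task) {
    final PropagatedStatsContext captured = current();
    return () -> {
      PropagatedStatsContext previous = CURRENT.get();
      CURRENT.set(captured);
      try {
        task.run();
      } finally {
        // Pooled threads are reused, so restore whatever was there before.
        CURRENT.set(previous);
      }
    };
  }

  /** Run four wrapped tasks of 100 "bytes" each; return the caller's total. */
  static long runDemo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      long before = current().bytesRead();
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < 4; i++) {
        futures.add(pool.submit(wrap(() -> current().addBytesRead(100))));
      }
      for (Future<?> future : futures) {
        future.get(); // wait for completion before reading the total
      }
      return current().bytesRead() - before;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runDemo()); // prints 400
  }
}
```

      Because the shared context is updated through an AtomicLong, there is some contention cost on hot paths; a real implementation would need to weigh that against the performance goals listed above.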

      Target uses

      I have a WiP Parquet branch too, to see what can be done there. This shows why the thread context is needed: it's unworkable to build up your own stats snapshot. Even if you collect it for listX and stream reads, it doesn't include FS operations (e.g. rename()), and you need to rework all your methods to pass the stats collector around.

              People

              • Assignee: Unassigned
              • Reporter: Steve Loughran (stevel@apache.org)

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 4h 10m