  1. Hadoop Common
  2. HADOOP-14764 Über-jira adl:// Azure Data Lake Phase II: Performance, Resilience and Testing
  3. HADOOP-12876

[Azure Data Lake] Support for process level FileStatus cache to optimize GetFileStatus frequent operations

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs, fs/adl, tools
    • Labels:
      None
    • Target Version/s:

      Description

      Add support for caching GetFileStatus and ListStatus responses locally, at process level, for a limited period of time. A short-lived local cache would reduce the number of remote GetFileStatus calls.
      One example where such a cache would be useful is terasort: a ListStatus call on the input directory is followed by a GetFileStatus call on each file in that directory. For a directory with 2048 input files, using the ListStatus response to populate the cache with FileStatus instances would save 2048 GetFileStatus calls during startup.
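
      The idea can be illustrated with a minimal sketch of a time-limited, process-level cache. This is not the adl:// implementation: the class name TtlStatusCache, its methods, the injected clock, and the use of a generic value type in place of Hadoop's FileStatus are all assumptions made for illustration. One listing response populates the cache, and later per-path lookups are served locally until the entries expire.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a process-level cache whose entries expire after a
 * fixed time-to-live. V stands in for a status object such as FileStatus.
 */
final class TtlStatusCache<V> {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested

    TtlStatusCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Caches every child returned by one listing call, keyed by path. */
    void putAll(Map<String, V> listing) {
        long expiresAt = clock.getAsLong() + ttlMillis;
        listing.forEach((path, status) ->
                entries.put(path, new Entry<>(status, expiresAt)));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    V get(String path) {
        Entry<V> entry = entries.get(path);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            // Remove only this stale entry; a newer one may have replaced it.
            entries.remove(path, entry);
            return null;
        }
        return entry.value;
    }

    /** Drops an entry, for example after a write, rename or delete. */
    void invalidate(String path) {
        entries.remove(path);
    }
}
```

      In this sketch a caller would pass System::currentTimeMillis as the clock, call putAll with the result of a directory listing, and fall back to the remote call only when get returns null. A short time-to-live bounds how stale a cached status can become when another process modifies the file.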

              People

              • Assignee:
                vishwajeet.dusane Vishwajeet Dusane
                Reporter:
                vishwajeet.dusane Vishwajeet Dusane
              • Votes:
                0
                Watchers:
                12

                Dates

                • Created:
                  Updated: