Hadoop Common / HADOOP-15547

WASB: improve listStatus performance


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.1, 3.0.2
    • Fix Version/s: 2.10.0, 3.1.1
    • Component/s: fs/azure
    • Labels: None
    • Release Note: WASB: listStatus 10x performance improvement for listing 700,000 files

    Description

      The WASB implementation of FileSystem.listStatus is very slow because it uses a quadratic (O(n²)) algorithm to remove duplicates, and it uses too much memory because of the extra conversion from BlobListItem to FileMetadata to FileStatus. It takes over 30 minutes to list 700,000 files.
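      The description attributes much of the slowdown to duplicate removal. Below is a minimal Java sketch (not the code from the attached patches) showing how duplicate listing entries can be dropped in a single linear pass by keying them on their path in a hash map, rather than checking a list for each entry. `ListingDedup` and `Entry` are hypothetical names standing in for the real listing types such as FileStatus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingDedup {

    // Hypothetical stand-in for a listing entry; not a Hadoop or Azure SDK type.
    static final class Entry {
        final String path;
        final boolean isDirectory;

        Entry(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }

        @Override
        public String toString() {
            return (isDirectory ? "d " : "f ") + path;
        }
    }

    // Removes entries with duplicate paths in one pass. LinkedHashMap gives
    // O(1) expected lookups and preserves first-seen order, so the pass is
    // O(n), versus O(n^2) when List.contains is called for every entry.
    static List<Entry> dedupByPath(List<Entry> entries) {
        Map<String, Entry> byPath = new LinkedHashMap<>();
        for (Entry e : entries) {
            // Keep the first entry seen for each path; later duplicates are ignored.
            byPath.putIfAbsent(e.path, e);
        }
        return new ArrayList<>(byPath.values());
    }

    public static void main(String[] args) {
        List<Entry> raw = Arrays.asList(
            new Entry("/data/a", false),
            new Entry("/data/dir", true),
            new Entry("/data/a", false),
            new Entry("/data/dir", true));
        System.out.println(dedupByPath(raw)); // prints [f /data/a, d /data/dir]
    }
}
```

      Because the sketch keys on path alone, the first entry seen for a path wins. If a real listing could return differing entries for the same path, an implementation would need an explicit rule for which one to keep; this sketch does not attempt that.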

      Attachments

        1. HADOOP-15547.001.patch
          57 kB
          Thomas Marqardt
        2. HADOOP-15547.002.patch
          63 kB
          Thomas Marqardt
        3. HADOOP-15547.003.patch
          63 kB
          Thomas Marqardt
        4. HADOOP-15547-004.patch
          72 kB
          Steve Loughran
        5. HADOOP-15547-004.patch
          72 kB
          Steve Loughran
        6. HADOOP-15547-branch-2-001.patch
          74 kB
          Thomas Marqardt


          People

            Assignee: Thomas Marqardt (tmarquardt)
            Reporter: Thomas Marqardt (tmarquardt)
            Votes: 0
            Watchers: 10

