Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.1, 3.0.2
    • Fix Version/s: 2.10.0, 3.1.1
    • Component/s: fs/azure
    • Labels:
      None
    • Release Note:
      WASB: listStatus 10x performance improvement for listing 700,000 files

Description

    The WASB implementation of FileSystem.listStatus is very slow because it uses an O(n!) algorithm to remove duplicates. It also uses too much memory because of the extra conversion from BlobListItem to FileMetadata to FileStatus. Listing 700,000 files takes over 30 minutes.
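    For illustration only, here is a minimal sketch of the general technique of removing duplicate listing entries in a single pass. This is not the code from the attached patches. A pairwise scan over already-collected entries needs a quadratic number of comparisons, while keying entries by path in a hash map finds each duplicate with one expected constant-time lookup. The sketch also converts each raw item straight into its final status object, with no intermediate type. The class ListingDedup, the records RawItem and Status, and the method dedupe are all hypothetical stand-ins, not WASB or Hadoop APIs. The rule that a directory entry wins over a file entry for the same path is an assumed policy. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingDedup {
    // Hypothetical stand-in for one raw listing result from blob storage:
    // either a real blob or a virtual directory prefix.
    record RawItem(String path, boolean isDirectory, long length) {}

    // Hypothetical stand-in for the final status object returned to callers.
    record Status(String path, boolean isDirectory, long length) {}

    // Single pass over the raw listing. Each item is converted directly to its
    // final Status and keyed by path, so a duplicate path costs one hash lookup
    // instead of a scan over everything collected so far.
    static List<Status> dedupe(List<RawItem> raw) {
        // LinkedHashMap keeps first-seen order; replacing the value for an
        // existing key does not move that key.
        Map<String, Status> byPath = new LinkedHashMap<>();
        for (RawItem item : raw) {
            Status existing = byPath.get(item.path());
            // Assumed policy: if a path appears both as a plain blob and as a
            // directory, keep the directory view.
            if (existing == null || (item.isDirectory() && !existing.isDirectory())) {
                byPath.put(item.path(),
                        new Status(item.path(), item.isDirectory(), item.length()));
            }
        }
        return new ArrayList<>(byPath.values());
    }

    public static void main(String[] args) {
        List<RawItem> raw = List.of(
                new RawItem("/data/a.txt", false, 10),
                new RawItem("/data/dir", false, 0),   // e.g. a zero-length marker blob
                new RawItem("/data/dir", true, 0),    // the same path seen as a prefix
                new RawItem("/data/b.txt", false, 20));
        for (Status s : dedupe(raw)) {
            System.out.println(s);
        }
    }
}
```

    Running main prints three entries in first-seen order: /data/a.txt, /data/dir (as a directory), and /data/b.txt. An insertion-ordered map keeps the output order stable, which matches what callers of a directory listing usually expect.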

Attachments

    1. HADOOP-15547-branch-2-001.patch (74 kB, Thomas Marquardt)
    2. HADOOP-15547-004.patch (72 kB, Steve Loughran)
    3. HADOOP-15547-004.patch (72 kB, Steve Loughran)
    4. HADOOP-15547.003.patch (63 kB, Thomas Marquardt)
    5. HADOOP-15547.002.patch (63 kB, Thomas Marquardt)
    6. HADOOP-15547.001.patch (57 kB, Thomas Marquardt)

People

    • Assignee: Thomas Marquardt (tmarquardt)
    • Reporter: Thomas Marquardt (tmarquardt)
    • Votes: 0
    • Watchers: 10
