  1. Hadoop Common
  2. HADOOP-2158

hdfsListDirectory in libhdfs does not scale

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.2
    • Component/s: None
    • Labels: None

      Description

      hdfsListDirectory makes one RPC call using the deprecated fs.FileSystem.listPaths, and then two more RPC calls for every entry in the returned array. In a job with more than 3000 mappers, each running a pipes application that uses libhdfs to scan a DFS directory with about 100-200 entries, this adds up to about 1M RPC calls to the namenode, which overwhelms it.
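      The estimate above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope model: the function names are hypothetical, and it assumes each mapper lists the directory exactly once, which the report does not state explicitly.

```python
# Back-of-the-envelope model of the namenode load described in the report.
# Assumption: every mapper lists the directory exactly once.

def rpc_calls_per_listing(entries: int, per_entry_calls: int = 2) -> int:
    """One listing RPC plus `per_entry_calls` RPCs for each returned entry."""
    return 1 + per_entry_calls * entries


def total_rpc_calls(mappers: int, entries: int, per_entry_calls: int = 2) -> int:
    """Total RPCs sent to the namenode by all mappers combined."""
    return mappers * rpc_calls_per_listing(entries, per_entry_calls)


low = total_rpc_calls(mappers=3000, entries=100)   # 603000
high = total_rpc_calls(mappers=3000, entries=200)  # 1203000
print(low, high)
```

      With 3000 mappers the total falls between roughly 0.6M and 1.2M calls, which is consistent with the "about 1M" figure in the description.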

      hdfsListDirectory should call fs.FileSystem.listStatus instead.
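      The difference between the two call patterns can be illustrated with a toy model. This is not Hadoop or libhdfs code: `ToyNameNode` and its methods are hypothetical stand-ins that only count simulated RPCs, the two per-entry lookups (`is_dir`, `size`) are placeholders for whatever the two per-entry calls actually are, and the model assumes that a listStatus-style call returns full per-entry metadata in a single round trip.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    is_dir: bool
    size: int


class ToyNameNode:
    """Hypothetical stand-in for a namenode; every method call counts as one RPC."""

    def __init__(self, entries):
        self._entries = {e.name: e for e in entries}
        self.rpc_count = 0

    def list_paths(self):
        # Old pattern, step 1: returns names only.
        self.rpc_count += 1
        return list(self._entries)

    def is_dir(self, name):
        # Old pattern: first per-entry lookup (placeholder).
        self.rpc_count += 1
        return self._entries[name].is_dir

    def size(self, name):
        # Old pattern: second per-entry lookup (placeholder).
        self.rpc_count += 1
        return self._entries[name].size

    def list_status(self):
        # New pattern: names and metadata in one call (assumed).
        self.rpc_count += 1
        return list(self._entries.values())


def list_directory_old(nn):
    """One listing call, then two lookups per entry."""
    return [Entry(n, nn.is_dir(n), nn.size(n)) for n in nn.list_paths()]


def list_directory_new(nn):
    """A single call that already carries the metadata."""
    return nn.list_status()


entries = [Entry(f"part-{i:05d}", False, 1024 * i) for i in range(150)]
old_nn, new_nn = ToyNameNode(entries), ToyNameNode(entries)

# Both strategies return the same information.
assert list_directory_old(old_nn) == list_directory_new(new_nn)
print(old_nn.rpc_count, new_nn.rpc_count)  # 301 1
```

      In this model the per-listing cost drops from 1 + 2n calls to a single call for a directory of n entries, so the load on the namenode no longer grows with directory size.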

      I will submit a patch.

        Attachments

        1. 2158.patch
          5 kB
          Christian Kunz

            People

            • Assignee: Christian Kunz (ckunz)
            • Reporter: Christian Kunz (ckunz)
