Hadoop Common: HADOOP-2158

hdfsListDirectory in libhdfs does not scale


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.2
    • Component/s: None
    • Labels: None

    Description

      hdfsListDirectory makes one RPC call using the deprecated fs.FileSystem.listPaths, and then two RPC calls for every entry in the returned array. When running a job with more than 3000 mappers, each running a pipes application that uses libhdfs to scan a DFS directory with about 100-200 entries, this results in about 1M RPC calls to the namenode, overwhelming it.

      hdfsListDirectory should call fs.FileSystem.listStatus instead, which returns the status of every entry in a single call.

      I will submit a patch.


People

    • Assignee: Christian Kunz (ckunz)
    • Reporter: Christian Kunz (ckunz)
    • Votes: 0
    • Watchers: 0
