Hadoop Common / HADOOP-9981

globStatus should minimize its listStatus and getFileStatus calls


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None

    Description

      After HADOOP-9652, listStatus() and globStatus() calls against a local file system directory are very slow. A user loading data from the local file system into Hive saw the operation take about 30 seconds; the same operation took less than a second before HADOOP-9652.

      The input path contained many files besides the input files, and strace showed a fork and exec of stat for each and every one of them. jstack confirmed that these calls were made from getNativeFileLinkStatus().
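      The title asks globStatus to minimize its listStatus and getFileStatus calls. The Java below is a minimal sketch of that idea, not the HADOOP-9981 patch. Everything in it is hypothetical: `GlobSketch`, `CountingFs`, `Status`, `Cand` and `expand` stand in for a Hadoop FileSystem and its glob code. It handles only the `*` and `?` wildcards, and it treats a missing directory as empty instead of throwing. The strategy has three parts. Literal path components only extend the path and make no FS call. Each wildcard component lists its parent directory once. The statuses returned by that listing are reused, so getFileStatus runs only for paths that never appeared in any listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class GlobSketch {
    /** Hypothetical file status: a path plus a directory flag. */
    record Status(String path, boolean isDir) {}

    /** A glob candidate; status stays null until an FS call has confirmed the path. */
    record Cand(String path, Status status) {}

    /** Hypothetical in-memory file system that counts every lookup it serves. */
    static final class CountingFs {
        final Map<String, Boolean> entries = new TreeMap<>(); // path -> isDir
        int listCalls = 0;
        int statCalls = 0;

        void add(String path, boolean isDir) { entries.put(path, isDir); }

        /** Stand-in for getFileStatus(): returns null when the path does not exist. */
        Status getFileStatus(String path) {
            statCalls++;
            Boolean isDir = entries.get(path);
            return isDir == null ? null : new Status(path, isDir);
        }

        /** Stand-in for listStatus(): direct children only; a missing dir lists as empty. */
        List<Status> listStatus(String dir) {
            listCalls++;
            String prefix = dir.equals("/") ? "/" : dir + "/";
            List<Status> out = new ArrayList<>();
            for (Map.Entry<String, Boolean> e : entries.entrySet()) {
                String p = e.getKey();
                if (p.startsWith(prefix) && p.length() > prefix.length()
                        && p.indexOf('/', prefix.length()) < 0) {
                    out.add(new Status(p, e.getValue()));
                }
            }
            return out;
        }
    }

    /** Translate one path component's glob ('*' and '?' only) into a regex. */
    static Pattern toRegex(String comp) {
        StringBuilder sb = new StringBuilder();
        for (char ch : comp.toCharArray()) {
            if (ch == '*') sb.append("[^/]*");
            else if (ch == '?') sb.append("[^/]");
            else sb.append(Pattern.quote(String.valueOf(ch)));
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> expand(CountingFs fs, String glob) {
        List<Cand> cands = List.of(new Cand("/", null));
        for (String comp : glob.split("/")) {
            if (comp.isEmpty()) continue; // leading '/' and doubled slashes
            Pattern regex = (comp.contains("*") || comp.contains("?")) ? toRegex(comp) : null;
            List<Cand> next = new ArrayList<>();
            for (Cand c : cands) {
                // A status already known from a listing tells us a plain file has no children.
                if (c.status() != null && !c.status().isDir()) continue;
                if (regex == null) {
                    // Literal component: extend the path and defer any FS call.
                    String path = c.path().equals("/") ? "/" + comp : c.path() + "/" + comp;
                    next.add(new Cand(path, null));
                    continue;
                }
                // Wildcard component: one listStatus() per parent; keep the returned statuses.
                for (Status child : fs.listStatus(c.path())) {
                    String name = child.path().substring(child.path().lastIndexOf('/') + 1);
                    if (regex.matcher(name).matches()) next.add(new Cand(child.path(), child));
                }
            }
            cands = next;
        }
        List<String> result = new ArrayList<>();
        for (Cand c : cands) {
            // Only paths never seen in a listing need a getFileStatus() to confirm they exist.
            Status s = c.status() != null ? c.status() : fs.getFileStatus(c.path());
            if (s != null) result.add(s.path());
        }
        return result;
    }

    static CountingFs demoFs() {
        CountingFs fs = new CountingFs();
        fs.add("/data", true);
        fs.add("/data/in", true);
        for (String f : List.of("part-0", "part-1", "README", "_SUCCESS")) fs.add("/data/in/" + f, false);
        return fs;
    }

    public static void main(String[] args) {
        CountingFs fs = demoFs();
        System.out.println(expand(fs, "/data/in/part-*"));
        System.out.println("listStatus calls: " + fs.listCalls + ", getFileStatus calls: " + fs.statCalls);
    }
}
```

      Running `main` prints `[/data/in/part-0, /data/in/part-1]`, followed by `listStatus calls: 1, getFileStatus calls: 0`. A naive expansion that confirms every listed entry with its own getFileStatus call would do one extra lookup per entry. The description reports that on the local file system each such lookup forked a stat process, which is why reusing listing results matters there.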

      Attachments

        1. HADOOP-9981.001.patch (16 kB, Colin McCabe)
        2. HADOOP-9981.002.patch (16 kB, Colin McCabe)
        3. HADOOP-9981.003.patch (10 kB, Colin McCabe)


          People

            Assignee: Colin McCabe
            Reporter: Kihwal Lee
            Votes: 0
            Watchers: 11
