
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Component: fs/s3

    Description

      Some performance improvements that spring to mind after looking at the s3guard import command.

      Key point: the import of a tree with existing data could be handled better.

      1. If the bucket is already under S3Guard, the listing will return all listed files, which will then be put() again.
      2. import calls putParentsIfNotPresent(), but DDBMetaStore.put() creates the parent entries anyway.
      3. For each entry in the store (i.e. a file), the full parent listing is built, then a batch write is issued to put all the parents and the actual file.

      As a result, it risks issuing many more put calls than needed, especially for wide/deep directory trees.

      It would be much more efficient to put all files in a single directory as part of one or more batch requests, with one parent tree written once. Better yet: a get() of that parent could skip the put of the parent entries entirely.
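      The proposed grouping can be sketched as below. This is an illustrative standalone sketch, not the actual S3Guard/DynamoDBMetadataStore code: the class ImportBatcher, its method names, and the plain-string path handling are all hypothetical, standing in for the real PathMetadata/batch-write machinery. It shows the two ideas from the description: batch all files sharing a parent directory into one write, and put each ancestor entry at most once per import rather than once per file.

      ```java
      import java.util.*;

      public class ImportBatcher {

          /** Group file paths by parent directory: each directory's files
           *  become one candidate batch write. */
          public static Map<String, List<String>> groupByParent(List<String> paths) {
              Map<String, List<String>> batches = new LinkedHashMap<>();
              for (String p : paths) {
                  int slash = p.lastIndexOf('/');
                  String parent = slash <= 0 ? "/" : p.substring(0, slash);
                  batches.computeIfAbsent(parent, k -> new ArrayList<>()).add(p);
              }
              return batches;
          }

          /** Walk up from dir, collecting only ancestors not yet written in
           *  this import; a real implementation could also stop on a get()
           *  showing the entry already exists in the store. */
          public static List<String> ancestorsToPut(String dir, Set<String> written) {
              List<String> todo = new ArrayList<>();
              String cur = dir;
              while (cur != null && !cur.equals("/") && written.add(cur)) {
                  todo.add(cur);
                  int slash = cur.lastIndexOf('/');
                  cur = slash <= 0 ? "/" : cur.substring(0, slash);
              }
              return todo;
          }

          public static void main(String[] args) {
              List<String> files = Arrays.asList(
                  "/a/b/f1", "/a/b/f2", "/a/b/f3", "/a/c/f4");
              Set<String> written = new HashSet<>();
              int writes = 0;
              for (Map.Entry<String, List<String>> e : groupByParent(files).entrySet()) {
                  writes += ancestorsToPut(e.getKey(), written).size(); // each ancestor once
                  writes += 1; // one batched write covering every file in this directory
              }
              // Per-file import would write each file plus its full parent chain:
              // 4 files x (1 file + 2 ancestors) = 12 writes.
              // Grouped: /a/b, /a/c, /a once each + 2 file batches = 5 writes.
              System.out.println(writes);
          }
      }
      ```

      The win grows with tree width: every extra file in an already-seen directory costs nothing beyond its slot in the existing batch.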

              People

              Assignee: Unassigned
              Reporter: Steve Loughran (stevel@apache.org)
              Votes: 0
              Watchers: 4
