
    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

      Description

      Some performance improvements that spring to mind after looking at the s3guard import command.

      Key point: the import of a tree with existing data could be handled better.

      1. If the bucket is already under s3guard, the listing will return all listed files, which will then be put() again.
      2. The import calls putParentsIfNotPresent(), but DDBMetaStore.put() will do the parent creation anyway.
      3. For each entry in the store (i.e. a file), the full parent listing is created, then a batch write is issued to put all the parents and the actual file.

      As a result, it risks making many more put calls than needed, especially for wide/deep directory trees.
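A rough cost model illustrates the blow-up. This is a hypothetical sketch, not the real S3Guard code path: it assumes every file put() also re-writes the file's full ancestor chain, giving O(files × depth) writes instead of O(files + directories).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical cost model of the current import scheme: each file's
// entire ancestor chain is re-put alongside the file itself.
public class NaiveImportCost {

    // Parent path, or null at the root.
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    // Counts simulated put operations for a naive import.
    static int naivePuts(List<String> files) {
        int puts = 0;
        for (String f : files) {
            puts++;                                        // the file entry
            for (String d = parent(f); d != null; d = parent(d)) {
                puts++;                                    // ancestor re-put per file
            }
        }
        return puts;
    }

    public static void main(String[] args) {
        // 3 files under the same 3-deep directory: 3 + 3*3 = 12 puts,
        // where 3 (files) + 3 (unique dirs) = 6 would suffice.
        System.out.println(naivePuts(Arrays.asList(
            "/a/b/c/f1", "/a/b/c/f2", "/a/b/c/f3")));
    }
}
```

For a directory of n files at depth d this does n × (d + 1) puts, so the waste grows with both tree width and depth.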

      It would be much more efficient to put all the files in a single directory as one or more batch requests, with a single put of the parent tree. Better yet, a get() of that parent could skip the put of the parent entries entirely.
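The proposal above can be sketched as follows. The types and method names here are hypothetical, not the real S3Guard API: files are grouped by parent directory, each directory's children go out in one batch write, and each ancestor is put at most once per import run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the proposed batched import scheme.
public class BatchedImportSketch {

    // Parent path, or null at the root.
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    // Counts simulated put operations when children are batched per
    // directory and ancestors are written only once.
    static int batchedPuts(List<String> files) {
        // Group files by their parent directory.
        Map<String, List<String>> byDir = new TreeMap<>();
        for (String f : files) {
            byDir.computeIfAbsent(parent(f), k -> new ArrayList<>()).add(f);
        }
        int puts = 0;
        Set<String> knownDirs = new HashSet<>();   // ancestors already written
        for (Map.Entry<String, List<String>> e : byDir.entrySet()) {
            // Walk up, writing only previously unseen ancestors; once a
            // directory is known, its own ancestors are known too.
            for (String d = e.getKey(); d != null; d = parent(d)) {
                if (!knownDirs.add(d)) break;
                puts++;
            }
            puts += e.getValue().size();           // one batch write of children
        }
        return puts;
    }

    public static void main(String[] args) {
        System.out.println(batchedPuts(Arrays.asList(
            "/a/b/c/f1", "/a/b/c/f2", "/a/b/d/f3", "/a/e/f4")));
        // 4 files + 5 unique directories = 9 puts
    }
}
```

In a real implementation the knownDirs check could instead be a get() against the store, so directories already present under s3guard are never re-put at all.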


      People

      • Assignee: Unassigned
      • Reporter: stevel@apache.org Steve Loughran