  1. Hadoop Common
  2. HADOOP-14159

Add some Java-8 friendly way to work with RemoteIterator, especially listings


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha2
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None

      Description

      There's a fair amount of Hadoop code which uses FileSystem.listStatus(path) just to get a FileStatus[] array which it can then iterate over in a for loop.

      This is inefficient and scales badly: the entire listing is fetched before any processing starts, so it cannot handle directories with millions of entries.

      The listLocatedStatus() calls return a RemoteIterator, which can't be used in a for-each loop because any hasNext() or next() call may throw an IOException. That doesn't matter, as we now have closures and simple stream operations.

       fs.listLocatedStatus(path).filter(st -> st.getLen() > 0).apply(st -> fs.delete(st.getPath(), false))
      

      See? We could do shiny new closure things. It wouldn't necessarily need changes to FileSystem either, just something which took a RemoteIterator and let you chain some closures off it, similar to the Java 8 Streams operations.
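
      A minimal sketch of what such a wrapper might look like, not a proposed patch. Everything here is an assumption made for illustration: the RemoteIterators class, its of(), filter(), apply() and fromIterator() methods, and the IOPredicate and IOConsumer interfaces are all hypothetical names. The RemoteIterator interface is a local stand-in that mirrors the shape of org.apache.hadoop.fs.RemoteIterator so the example compiles without Hadoop on the classpath, and plain strings stand in for FileStatus entries.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
  boolean hasNext() throws IOException;
  E next() throws IOException;
}

/** Hypothetical predicate which is allowed to throw an IOException. */
@FunctionalInterface
interface IOPredicate<T> {
  boolean test(T t) throws IOException;
}

/** Hypothetical consumer which is allowed to throw an IOException. */
@FunctionalInterface
interface IOConsumer<T> {
  void accept(T t) throws IOException;
}

/** Hypothetical chainable wrapper around a RemoteIterator. */
final class RemoteIterators<T> {
  private final RemoteIterator<T> source;

  private RemoteIterators(RemoteIterator<T> source) {
    this.source = source;
  }

  static <T> RemoteIterators<T> of(RemoteIterator<T> source) {
    return new RemoteIterators<>(source);
  }

  /** Lazy filter: nothing is pulled from the source until a terminal call. */
  RemoteIterators<T> filter(IOPredicate<? super T> predicate) {
    final RemoteIterator<T> in = source;
    return new RemoteIterators<>(new RemoteIterator<T>() {
      private T pending;
      private boolean ready;

      @Override
      public boolean hasNext() throws IOException {
        // Advance until an entry passes the predicate or the source runs out.
        while (!ready && in.hasNext()) {
          T candidate = in.next();
          if (predicate.test(candidate)) {
            pending = candidate;
            ready = true;
          }
        }
        return ready;
      }

      @Override
      public T next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        T result = pending;
        pending = null;
        return result;
      }
    });
  }

  /** Terminal operation: handle entries one at a time; returns how many were handled. */
  long apply(IOConsumer<? super T> action) throws IOException {
    long count = 0;
    while (source.hasNext()) {
      action.accept(source.next());
      count++;
    }
    return count;
  }

  /** Adapts an in-memory iterator so the sketch can run without a filesystem. */
  static <T> RemoteIterator<T> fromIterator(final Iterator<T> it) {
    return new RemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }
}

final class ListingDemo {
  public static void main(String[] args) throws IOException {
    // Strings stand in for FileStatus entries; the empty one plays a zero-length file.
    RemoteIterator<String> listing =
        RemoteIterators.fromIterator(Arrays.asList("a.txt", "", "b.txt").iterator());
    List<String> kept = new ArrayList<>();
    long handled = RemoteIterators.of(listing)
        .filter(name -> !name.isEmpty())
        .apply(kept::add);
    System.out.println(handled + " entries: " + kept);
  }
}
```

      The IOException surfaces from the terminal apply() call rather than being wrapped in an unchecked exception, which is one possible design choice; entries are pulled one at a time, so the whole listing never has to be held in memory.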

      Once implemented, we can move to using it in the Hadoop code wherever we use listFiles() today.

              People

              • Assignee: Unassigned
              • Reporter: Steve Loughran (stevel@apache.org)