Beam / BEAM-1592

Unify HdfsIO and HadoopInputFormatIO

Details

    Description

      HadoopInputFormatIO (HIFIO) is currently under review in a PR (https://github.com/apache/beam/pull/1994). Per the discussion in https://lists.apache.org/thread.html/803857877804165e798cf31edf079e6603eb9682b7690d52124c31e7@%3Cdev.beam.apache.org%3E, we'd like to check HIFIO in as-is and then unify it with HdfsIO, since the two share a lot of code.

      dhalperi@google.com noted: "the FileInputFormat reader gets to call some special APIs that the generic InputFormat reader cannot – so they are not completely redundant. Specifically, FileInputFormat reader can do size-based splitting."
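      To make "size-based splitting" concrete, here is a minimal sketch that divides a file of known length into byte ranges of at most a desired bundle size. It is not HdfsIO's or Hadoop's actual code: `SizeBasedSplitSketch`, `splitBySize`, and `ByteRange` are hypothetical names, and the real FileInputFormat logic also takes block size and min/max split settings into account, while its readers must handle records that straddle a split boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of size-based splitting; not HdfsIO or Hadoop code.
final class SizeBasedSplitSketch {
  /** Hypothetical half-open byte range [start, end) within a single file. */
  static final class ByteRange {
    final long start;
    final long end;

    ByteRange(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /** Splits a file of fileLength bytes into ranges of at most desiredBundleSizeBytes each. */
  static List<ByteRange> splitBySize(long fileLength, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    List<ByteRange> ranges = new ArrayList<>();
    for (long start = 0; start < fileLength; start += desiredBundleSizeBytes) {
      // The last range is truncated at the end of the file.
      ranges.add(new ByteRange(start, Math.min(start + desiredBundleSizeBytes, fileLength)));
    }
    return ranges;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // A 250 MB file with a 64 MB desired bundle size yields four ranges;
    // the last one covers the remaining 58 MB.
    for (ByteRange r : splitBySize(250 * mb, 64 * mb)) {
      System.out.println(r);
    }
  }
}
```

      A generic InputFormat reader has no file length to work with, so it can only use whatever splits the format itself reports; that is the asymmetry the quote points at.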

      Dan recommended: "See if we can 'inline' the FileInputFormat-specific parts of HdfsIO inside of HadoopInputFormatIO via reflection. If so, we can get the best of both worlds with shared code."

      This seems reasonable to me.
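      One possible reading of the reflection approach is sketched below: the reader inspects the configured InputFormat class at runtime and takes a file-specific path only if that class extends the file-based base class. To keep the sketch self-contained, `InputFormatStandIn`, `FileInputFormatStandIn`, and the two format classes are hypothetical stand-ins for Hadoop types, and `describeSplitting` is an illustrative helper, not an existing HIFIO or HdfsIO API. In real code the base-class name would be `org.apache.hadoop.mapreduce.lib.input.FileInputFormat`.

```java
import java.util.List;

// Hypothetical stand-ins for Hadoop's InputFormat and FileInputFormat, so this
// sketch compiles without Hadoop on the classpath.
abstract class InputFormatStandIn {
  abstract List<String> getSplits();
}

abstract class FileInputFormatStandIn extends InputFormatStandIn {}

// Hypothetical user-supplied formats: one file-based, one not.
class TextFormatStandIn extends FileInputFormatStandIn {
  @Override
  List<String> getSplits() {
    return List.of("text-split");
  }
}

class DbFormatStandIn extends InputFormatStandIn {
  @Override
  List<String> getSplits() {
    return List.of("db-split");
  }
}

final class ReflectiveDispatchSketch {
  // Walks the superclass chain comparing names, so the caller needs only the
  // name of the file-based base class, not a compile-time reference to it.
  static boolean extendsClassNamed(Class<?> cls, String superclassName) {
    for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
      if (c.getName().equals(superclassName)) {
        return true;
      }
    }
    return false;
  }

  // Picks a split strategy for the configured format class.
  static String describeSplitting(Class<? extends InputFormatStandIn> formatClass,
                                  String fileBaseClassName) throws ReflectiveOperationException {
    if (extendsClassNamed(formatClass, fileBaseClassName)) {
      // Here a unified reader would invoke the FileInputFormat-specific logic.
      return "size-based splitting";
    }
    // Otherwise, instantiate the format reflectively and use its own splits.
    InputFormatStandIn format = formatClass.getDeclaredConstructor().newInstance();
    return "format splits: " + format.getSplits();
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    String base = FileInputFormatStandIn.class.getName();
    System.out.println(describeSplitting(TextFormatStandIn.class, base)); // size-based splitting
    System.out.println(describeSplitting(DbFormatStandIn.class, base));   // format splits: [db-split]
  }
}
```

      Matching on the class name rather than a class literal is one way to keep the file-specific path optional; a plain `isAssignableFrom` check would work equally well if the base class is always on the classpath.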


            People

              Assignee: Unassigned
              Reporter: Stephen Sisk (sisk)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: