  1. Apache Apex Malhar
  2. APEXMALHAR-2009

Concrete operator for writing to an HDFS file

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: None
    • Labels: None

    Description

      Currently, for writing to an HDFS file, the Malhar library provides AbstractFileOutputOperator.

      It has the following abstract methods:
      1. protected abstract String getFileName(INPUT tuple)
      2. protected abstract byte[] getBytesForTuple(INPUT tuple)
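
To illustrate the flexibility these two methods give, here is a minimal sketch of a subclass that routes each tuple to a file chosen per tuple. FileOutputSketch is a hypothetical stand-in, not Malhar's AbstractFileOutputOperator: it keeps only the two method signatures quoted above and buffers "files" in memory so the example runs without HDFS. The names process, contents, and LogLevelFileOutput are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for AbstractFileOutputOperator (NOT the Malhar class):
// it declares the two abstract methods from the issue and keeps "files" in memory.
abstract class FileOutputSketch<INPUT> {
  private final Map<String, ByteArrayOutputStream> files = new LinkedHashMap<>();

  protected abstract String getFileName(INPUT tuple);

  protected abstract byte[] getBytesForTuple(INPUT tuple);

  // Appends the tuple's bytes to the buffer of the file the tuple maps to.
  void process(INPUT tuple) {
    byte[] bytes = getBytesForTuple(tuple);
    files.computeIfAbsent(getFileName(tuple), name -> new ByteArrayOutputStream())
        .write(bytes, 0, bytes.length);
  }

  // Returns everything written so far to the given file, decoded as UTF-8.
  String contents(String fileName) {
    ByteArrayOutputStream out = files.get(fileName);
    return out == null ? "" : new String(out.toByteArray(), StandardCharsets.UTF_8);
  }
}

// Example subclass: each log line goes to a file named after its leading level token.
class LogLevelFileOutput extends FileOutputSketch<String> {
  @Override
  protected String getFileName(String tuple) {
    int space = tuple.indexOf(' ');
    String level = space < 0 ? "unknown" : tuple.substring(0, space);
    return level.toLowerCase(Locale.ROOT) + ".log";
  }

  @Override
  protected byte[] getBytesForTuple(String tuple) {
    // One tuple per line, encoded as UTF-8.
    return (tuple + "\n").getBytes(StandardCharsets.UTF_8);
  }
}
```

Even this small case requires the developer to write a subclass and decide on naming and encoding, which is the overhead a ready-made operator would remove for the common cases.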

      These methods are kept generic to give flexibility to app developers. However, someone who is new to Apex would look for a ready-made implementation instead of extending the abstract one.

      Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar. The aim of this operator is to serve as a ready-to-use operator for the most frequent use cases.

      Here are my key observations on the most frequent use cases:
      ------------------------------------------------------------------------------

      1. Writing tuples of type byte[] or String.
      2. All tuples on a particular stream end up in the same output file.
      3. The app developer may want to add a custom tuple separator (e.g. a newline character) between tuples.
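
The three observations above could be covered by a concrete class along the following lines. This is only a sketch under stated assumptions, not the proposed HDFSOutputOperator itself: OutputContractSketch is a hypothetical stand-in that declares just the two abstract methods from this issue, and SingleFileOutputSketch, forStrings, and forBytes are illustrative names. Encoding String tuples and the separator as UTF-8 is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical stand-in (NOT the Malhar class): only the two abstract methods from the issue.
abstract class OutputContractSketch<INPUT> {
  protected abstract String getFileName(INPUT tuple);

  protected abstract byte[] getBytesForTuple(INPUT tuple);
}

// Sketch of a ready-to-use operator: a fixed output file and an optional tuple separator.
class SingleFileOutputSketch<INPUT> extends OutputContractSketch<INPUT> {
  private final String fileName;
  private final byte[] separator;
  private final Function<INPUT, byte[]> toBytes;

  private SingleFileOutputSketch(String fileName, String separator, Function<INPUT, byte[]> toBytes) {
    this.fileName = fileName;
    // A null separator is treated as "no separator".
    this.separator = separator == null ? new byte[0] : separator.getBytes(StandardCharsets.UTF_8);
    this.toBytes = toBytes;
  }

  // Observation 1: String tuples (UTF-8 encoding is an assumption of this sketch).
  static SingleFileOutputSketch<String> forStrings(String fileName, String separator) {
    return new SingleFileOutputSketch<String>(
        fileName, separator, s -> s.getBytes(StandardCharsets.UTF_8));
  }

  // Observation 1: byte[] tuples are written as-is.
  static SingleFileOutputSketch<byte[]> forBytes(String fileName, String separator) {
    return new SingleFileOutputSketch<byte[]>(fileName, separator, b -> b);
  }

  // Observation 2: every tuple maps to the same file.
  @Override
  protected String getFileName(INPUT tuple) {
    return fileName;
  }

  // Observation 3: the separator is appended after each tuple's bytes.
  @Override
  protected byte[] getBytesForTuple(INPUT tuple) {
    byte[] body = toBytes.apply(tuple);
    byte[] out = Arrays.copyOf(body, body.length + separator.length);
    System.arraycopy(separator, 0, out, body.length, separator.length);
    return out;
  }
}
```

With this shape, a new user would only pick a file name and a separator, for example SingleFileOutputSketch.forStrings("out.txt", "\n"), while the abstract operator remains available for cases that need per-tuple file names or custom serialization.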

      Discussion thread on the mailing list:
      http://mail-archives.apache.org/mod_mbox/apex-dev/201603.mbox/%3CCAHekGF_6KovS4cjYXzCLdU9En0iPaKO%2BBv%3DEJXbrCuhe9%2BtdrA%40mail.gmail.com%3E


          People

            Assignee: Yogi Devendra (devendra)
            Reporter: Yogi Devendra (devendra)