Apache Gobblin / GOBBLIN-153

No workflow to identify content type


Details

    Description

      The extractor assumes that a Source always delivers data in the same format, which is not always the case.
      One can write a composite extractor that detects which format a message is in (JSON, Avro, Protobuf, etc.) and invokes the real extractor on it. However, it feels like there should be a native concept that allows us to do just that.
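
The composite-extractor workaround described above could look roughly like the following. This is a minimal sketch, not Gobblin's `Extractor` API: the class `CompositeExtractorSketch`, the `Format` enum, and the `register`/`extract` methods are hypothetical names invented for illustration. It sniffs the leading bytes of a payload and hands it to whichever delegate was registered for that format. Only JSON and Avro object container files are detected here; Protobuf has no magic number, so it would need some out-of-band hint and ends up as `UNKNOWN` in this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are part of Gobblin's API.
class CompositeExtractorSketch {

    enum Format { JSON, AVRO_CONTAINER, UNKNOWN }

    // Guess the format from the leading bytes of the payload.
    static Format detect(byte[] payload) {
        // Avro object container files begin with the bytes 'O' 'b' 'j' 0x01.
        if (payload.length >= 4 && payload[0] == 'O' && payload[1] == 'b'
                && payload[2] == 'j' && payload[3] == 1) {
            return Format.AVRO_CONTAINER;
        }
        // A JSON object or array starts with '{' or '[' after optional whitespace.
        for (byte b : payload) {
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                continue;
            }
            return (b == '{' || b == '[') ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }

    // One delegate ("real extractor") per detected format.
    private final Map<Format, Function<byte[], String>> delegates =
            new EnumMap<>(Format.class);

    void register(Format format, Function<byte[], String> delegate) {
        delegates.put(format, delegate);
    }

    // Detect the format, then dispatch to the matching delegate.
    String extract(byte[] payload) {
        Format format = detect(payload);
        Function<byte[], String> delegate = delegates.get(format);
        if (delegate == null) {
            throw new IllegalArgumentException("No extractor registered for " + format);
        }
        return delegate.apply(payload);
    }

    public static void main(String[] args) {
        CompositeExtractorSketch composite = new CompositeExtractorSketch();
        // Stand-in delegates; real ones would parse the payload.
        composite.register(Format.JSON, p -> "json:" + new String(p, StandardCharsets.UTF_8));
        composite.register(Format.AVRO_CONTAINER, p -> "avro:" + p.length + " bytes");

        System.out.println(composite.extract("{\"a\":1}".getBytes(StandardCharsets.UTF_8)));
        System.out.println(composite.extract(new byte[] {'O', 'b', 'j', 1, 0}));
    }
}
```

Every job that needs this has to carry its own detection and dispatch table, which is the boilerplate a native concept would remove.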

      Github Url : https://github.com/linkedin/gobblin/issues/521
      Github Reporter : thedrow
      Github Created At : 2015-12-14T06:08:13Z
      Github Updated At : 2017-01-12T04:31:09Z

      Comments


      stakiar wrote on 2016-02-12T23:56:02Z : Hey @thedrow, that's an interesting idea. Do you have a specific use case in mind? In our experience, the data from a `Source` is usually all in one format.

      Github Url : https://github.com/linkedin/gobblin/issues/521#issuecomment-183532968


      thedrow wrote on 2016-02-13T06:42:10Z : Our ETL process takes configuration files from a lot of sources and parses them with Augeas.
      In order to do so, we need to identify the type of each configuration file.
      We could either invoke a Ruby script that uses https://github.com/github/linguist, or provide the means to do so in a Java library.
      If the best solution is to create a source per configuration file type, that's also fine, but I still think this feature would be useful for identifying how to parse the message.

      Github Url : https://github.com/linkedin/gobblin/issues/521#issuecomment-183611666
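
To make the use case in the comment above concrete, here is a rough sketch of identifying a configuration file's type in plain Java. It is an assumption-laden illustration, not linguist's algorithm and not an existing Gobblin or Augeas API: the class `ConfigTypeSketch`, the method `identify`, the extension table, and the type labels are all hypothetical. It tries the file extension first and falls back to a crude look at the first non-blank character.

```java
import java.util.Locale;

// Hypothetical sketch: the labels and the extension table are illustrative only.
class ConfigTypeSketch {

    // Map a lower-cased file extension to a type label, or null if unknown.
    static String fromExtension(String ext) {
        switch (ext) {
            case "json":
                return "json";
            case "yml":
            case "yaml":
                return "yaml";
            case "xml":
                return "xml";
            case "ini":
                return "ini";
            case "properties":
                return "properties";
            default:
                return null;
        }
    }

    // Identify the type from the file name, falling back to the content.
    static String identify(String fileName, String content) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String type = fromExtension(ext);
            if (type != null) {
                return type;
            }
        }
        // Crude content sniffing; many formats cannot be told apart this way.
        String trimmed = content.trim();
        if (trimmed.startsWith("<")) {
            return "xml";
        }
        if (trimmed.startsWith("{")) {
            return "json";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(identify("app.YML", "key: value"));
        System.out.println(identify("settings", "  <config/>"));
        System.out.println(identify("hosts", "127.0.0.1 localhost"));
    }
}
```

The returned label could then select a parser, or a per-type source if that turns out to be the preferred design.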

          People

            Unassigned
            abti Abhishek Tiwari