Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The extractor assumes that every message received from a Source is in the same format, which is not always the case.
One can write a composite extractor that detects which format a message is in (JSON, Avro, Protobuf, etc.) and invokes the real extractor on it. However, it feels like there should be a native concept that allows us to do just that.
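To make the composite-extractor idea concrete, here is a minimal sketch, not Gobblin's actual `Extractor` API. The class `CompositeExtractorSketch`, its `Route` type, and the `register`, `detect`, and `extract` methods are all hypothetical names chosen for illustration. The sketch assumes JSON can be recognized by a leading `{` or `[` and that Avro data arrives as an object container file, which starts with the bytes `O`, `b`, `j`, `1`. Protobuf has no magic number, so it is left out; a real implementation would need some out-of-band hint for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: routes each message to a per-format extractor.
class CompositeExtractorSketch {

    // Pairs a format check with the "real" extractor to run when the check passes.
    static final class Route {
        final String format;
        final Predicate<byte[]> matches;
        final Function<byte[], String> extractor;

        Route(String format, Predicate<byte[]> matches, Function<byte[], String> extractor) {
            this.format = format;
            this.matches = matches;
            this.extractor = extractor;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    CompositeExtractorSketch register(String format, Predicate<byte[]> matches,
                                      Function<byte[], String> extractor) {
        routes.add(new Route(format, matches, extractor));
        return this;
    }

    // Returns the name of the first registered format whose check accepts the message.
    String detect(byte[] message) {
        for (Route route : routes) {
            if (route.matches.test(message)) {
                return route.format;
            }
        }
        throw new IllegalArgumentException("Unrecognized message format");
    }

    // Delegates to the extractor registered for the detected format.
    String extract(byte[] message) {
        for (Route route : routes) {
            if (route.matches.test(message)) {
                return route.extractor.apply(message);
            }
        }
        throw new IllegalArgumentException("Unrecognized message format");
    }

    // Assumption: a JSON document starts with '{' or '[' after optional whitespace.
    static boolean looksLikeJson(byte[] message) {
        for (byte b : message) {
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                continue;
            }
            return b == '{' || b == '[';
        }
        return false;
    }

    // Avro object container files begin with the magic bytes 'O', 'b', 'j', 1.
    static boolean looksLikeAvroContainer(byte[] message) {
        return message.length >= 4
                && message[0] == 'O' && message[1] == 'b'
                && message[2] == 'j' && message[3] == 1;
    }

    // Placeholder extractors that only tag the payload; real ones would parse it.
    static CompositeExtractorSketch defaultExtractor() {
        return new CompositeExtractorSketch()
                .register("avro", CompositeExtractorSketch::looksLikeAvroContainer,
                        m -> "avro:" + m.length + " bytes")
                .register("json", CompositeExtractorSketch::looksLikeJson,
                        m -> "json:" + new String(m, StandardCharsets.UTF_8).trim());
    }
}
```

Calling `CompositeExtractorSketch.defaultExtractor().extract(bytes)` would then dispatch on content alone. Routes are checked in registration order, so the more specific check (the Avro magic bytes) is registered before the looser JSON heuristic.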
Github Url : https://github.com/linkedin/gobblin/issues/521
Github Reporter : thedrow
Github Created At : 2015-12-14T06:08:13Z
Github Updated At : 2017-01-12T04:31:09Z
Comments
stakiar wrote on 2016-02-12T23:56:02Z : Hey @thedrow, that's an interesting idea. Do you have a specific use case in mind? For us, the data from a `Source` is usually all in one format.
Github Url : https://github.com/linkedin/gobblin/issues/521#issuecomment-183532968
thedrow wrote on 2016-02-13T06:42:10Z : Our ETL process takes configuration files from a lot of sources and parses them with Augeas.
To do so, we need to identify the type of each configuration file.
We can either invoke a Ruby script that uses https://github.com/github/linguist or provide the means to do the identification in a Java library.
If the best solution is to create a source per configuration file type, that's also fine, but I still think this feature would be useful for identifying how to parse the message.
Github Url : https://github.com/linkedin/gobblin/issues/521#issuecomment-183611666