[GOBBLIN-153] No workflow to identify content type - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:

External issue URL:
https://github.com/linkedin/gobblin/issues/521

Description

The extractor assumes that a Source always delivers data in the same format, which is not always the case.
One can write a composite extractor that detects which format a message is in (JSON, Avro, Protobuf, etc.) and invokes the real extractor on it. However, it feels like there should be a native concept that allows us to do just that.
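
To make the composite-extractor idea concrete, here is a minimal sketch in plain Java. It does not use Gobblin's actual `Extractor` API: the class `CompositeExtractorSketch`, the `Format` enum, and the `register`/`extract` methods are hypothetical names chosen for illustration. The detection rules are also assumptions: Avro is recognized only by the object container file magic bytes (`Obj` followed by `0x01`), JSON by a leading `{` or `[`, and Protobuf is not detected at all because it has no magic header.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

public class CompositeExtractorSketch {

    /** Hypothetical format tags; not part of Gobblin. */
    enum Format { JSON, AVRO_CONTAINER, UNKNOWN }

    /** Delegates keyed by detected format; each one stands in for a "real" extractor. */
    private final Map<Format, Function<byte[], String>> delegates = new EnumMap<>(Format.class);

    /** Guess the format from the leading bytes of a payload. */
    static Format detect(byte[] payload) {
        // Avro object container files start with the bytes 'O', 'b', 'j', 0x01.
        if (payload.length >= 4 && payload[0] == 'O' && payload[1] == 'b'
                && payload[2] == 'j' && payload[3] == 1) {
            return Format.AVRO_CONTAINER;
        }
        // Assumption: the first non-whitespace byte of a JSON document is '{' or '['.
        for (byte b : payload) {
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                continue;
            }
            return (b == '{' || b == '[') ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }

    /** Register the delegate that handles one format. */
    CompositeExtractorSketch register(Format format, Function<byte[], String> delegate) {
        delegates.put(format, delegate);
        return this;
    }

    /** Detect the format, then hand the payload to the matching delegate. */
    String extract(byte[] payload) {
        Format format = detect(payload);
        Function<byte[], String> delegate = delegates.get(format);
        if (delegate == null) {
            throw new IllegalArgumentException("No extractor registered for " + format);
        }
        return delegate.apply(payload);
    }
}
```

A native concept in the framework would presumably replace `detect` with a pluggable step, so that each deployment can supply its own detection rules instead of hard-coding them in a composite class.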

Github Url : https://github.com/linkedin/gobblin/issues/521
Github Reporter : thedrow
Github Created At : 2015-12-14T06:08:13Z
Github Updated At : 2017-01-12T04:31:09Z

Comments

stakiar wrote on 2016-02-12T23:56:02Z : Hey @thedrow, that's an interesting idea. Do you have a specific use case in mind? For us, the data from a `Source` is usually all in one format.

Github Url : https://github.com/linkedin/gobblin/issues/521#issuecomment-183532968

thedrow wrote on 2016-02-13T06:42:10Z : Our ETL process takes configuration files from a lot of sources and parses them with Augeas.
To do that, we need to identify the type of each configuration file.
We can either invoke a Ruby script that uses https://github.com/github/linguist, or provide the means to identify the type in a Java library.
If the best solution is to create a source per configuration file type, that's also fine, but I still think this feature would be useful for identifying how to parse a message.
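
As a rough illustration of what "the means to do so in a Java library" could look like at its simplest, the sketch below maps a file name to a type label by extension. It is far cruder than linguist, which also inspects file contents. The class `ConfigTypeSketch`, the `identify` method, and the extension table are all hypothetical and only meant to show the shape of such a lookup.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class ConfigTypeSketch {

    // Hypothetical extension-to-type table; the labels are illustrative only.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "json", "json",
            "yml", "yaml",
            "yaml", "yaml",
            "xml", "xml",
            "ini", "ini",
            "properties", "properties");

    /** Return a type label for the file name, or empty if the extension is unknown. */
    static Optional<String> identify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to look up
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(extension));
    }
}
```

Files without a recognizable extension would need a content-based fallback, which is the part a tool like linguist actually solves.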

Github Url : https://github.com/linkedin/gobblin/issues/521#issuecomment-183611666

People

Assignee: Unassigned
Reporter: Abhishek Tiwari
Votes: 0
Watchers: 1

Dates

Created: 19/Jul/17 07:40
Updated: 19/Jul/17 07:40