Apache Gobblin / GOBBLIN-957

Support automatic recursion removal from schemas

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.15.0

    Description

      Analytics engines such as Hive cannot handle recursive schemas, that is, schemas in which an inner field can refer to the type that wraps it.

      This Jira proposes supporting automatic removal of recursion from schemas during data ingestion.

      The simplest proposal is to drop the schema fields that introduce the recursion.

      e.g. (pseudo-schema)

      User { string name; User friend; }

      gets converted to:

      User { string name; }
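The drop-the-field proposal above can be sketched roughly as follows. This is a minimal Python illustration, not Gobblin's implementation: the dict-based schema model (record name mapped to field name and field type) and the function name `drop_recursive_fields` are assumptions made for this example. The result is returned as a nested (inlined) structure.

```python
def drop_recursive_fields(types, root):
    """Return an inlined copy of `root` with recursion-introducing fields dropped.

    `types` (hypothetical model) maps a record name to {field name: field type}.
    A field type is either a primitive name such as "string" or the name of
    another record in `types`.
    """
    def expand(name, ancestors):
        fields = {}
        for field, ftype in types[name].items():
            if ftype not in types:
                fields[field] = ftype  # primitive: keep as-is
            elif ftype in ancestors:
                continue  # refers to a wrapping type: drop the field
            else:
                fields[field] = expand(ftype, ancestors | {ftype})
        return fields

    return expand(root, frozenset({root}))


# The User example from the description.
types = {"User": {"name": "string", "friend": "User"}}
print(drop_recursive_fields(types, "User"))  # {'name': 'string'}
```

Tracking the set of ancestors on the current path means indirect recursion (A refers to B, which refers back to A) is handled the same way as direct self-reference.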

       

      A more sophisticated solution would be to do one or two levels of "schema-unrolling" before dropping data. 

      e.g. the output schema with one-level unrolling would look like:

      User { string name; User1 friend; }

      User1 { string name; }
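Schema unrolling can be sketched in the same spirit. The Python below is an illustration under assumptions, not the method adopted in Gobblin: the dict-based schema model, the function name `unroll_recursion`, and the `levels` parameter are all hypothetical. Each recursive reference is followed up to `levels` times before the field is dropped, and the result is inlined, so the nested inner record plays the role of `User1` above.

```python
def unroll_recursion(types, root, levels):
    """Inline `root`, following each recursive reference at most `levels` times.

    `types` (hypothetical model) maps a record name to {field name: field type}.
    With levels=0 this degenerates to simply dropping the recursive fields.
    """
    def expand(name, counts):
        fields = {}
        for field, ftype in types[name].items():
            if ftype not in types:
                fields[field] = ftype  # primitive: keep as-is
                continue
            seen = counts.get(ftype, 0)  # times ftype already wraps this field
            if seen > levels:
                continue  # unrolling budget exhausted: drop the field
            fields[field] = expand(ftype, {**counts, ftype: seen + 1})
        return fields

    return expand(root, {root: 1})


types = {"User": {"name": "string", "friend": "User"}}
print(unroll_recursion(types, "User", 1))
# {'name': 'string', 'friend': {'name': 'string'}}
```

Returning an inlined structure avoids having to invent unique names for the unrolled copies; a real implementation would need a naming scheme (such as the `User1` suffix in the description) that stays unambiguous when several types are mutually recursive.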

       

       

       

       

       

       

            People

              Assignee: Shirshanka Das (sdasapache)
              Reporter: Shirshanka Das (sdasapache)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m