Apache Gobblin / GOBBLIN-100

WorkUnit and WorkUnitState

Details

    Description

      1. According to your Javadoc for the WorkUnit class:

         @deprecated Properties in {@link SourceState} should not be added to a {@link WorkUnit}. Having each {@link WorkUnit} contain a copy of {@link SourceState} is a waste of memory. Use {@link #create(Extract, WatermarkInterval)}.

      2. So the QueryBasedSource class creates its WorkUnits this way:
         WorkUnit.create(extract)
         The resulting WorkUnit carries no properties from the SourceState.
      3. But many extractors (including JdbcExtractor, MysqlExtractor and QueryBasedExtractor) read properties this way:
         this.workUnit.getProp("source.conn.driver")
      4. This can return null values.

      So, in my opinion, we should get properties from the WorkUnitState rather than the WorkUnit.

      Github Url : https://github.com/linkedin/gobblin/issues/1065
      Github Reporter : ggthename
      Github Created At : 2016-06-23T08:55:05Z
      Github Updated At : 2017-01-12T05:05:10Z
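
      The failure mode in the description can be reproduced in miniature. This is a hedged sketch, not Gobblin code: plain maps stand in for SourceState and WorkUnit, and the class `WorkUnitCreationSketch`, its methods, the `extract.table.name` key and the driver value are all hypothetical. Because the driver key lives only in the job-level map, a work unit created with just its per-extract properties returns null for it.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical illustration only: plain maps stand in for Gobblin's SourceState and WorkUnit.
      class WorkUnitCreationSketch {
        static final String DRIVER_KEY = "source.conn.driver";

        // Job-level configuration, analogous to SourceState. The driver value is a placeholder.
        static Map<String, String> sourceState() {
          Map<String, String> source = new HashMap<>();
          source.put(DRIVER_KEY, "com.mysql.jdbc.Driver");
          return source;
        }

        // Analogous to WorkUnit.create(extract): only per-extract properties, no copy of the source state.
        static Map<String, String> createWorkUnit(String table) {
          Map<String, String> workUnit = new HashMap<>();
          workUnit.put("extract.table.name", table); // hypothetical per-extract key
          return workUnit;
        }

        // Analogous to an extractor calling this.workUnit.getProp(...): the key is simply absent.
        static String driverFromWorkUnit(Map<String, String> workUnit) {
          return workUnit.get(DRIVER_KEY);
        }

        public static void main(String[] args) {
          Map<String, String> workUnit = createWorkUnit("orders");
          System.out.println(driverFromWorkUnit(workUnit)); // prints "null"
          System.out.println(sourceState().get(DRIVER_KEY)); // the value exists only at the job level
        }
      }
      ```

      Running `main` prints `null` first, because the work-unit map never received the job-level key.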

      Comments


      stakiar wrote on 2016-07-02T17:15:10Z : Hello @ggthename,

      I'm not sure if I understand your question properly. Is this an actual bug you are seeing?

      Some background:

      • A `WorkUnitState` is just a wrapper around a `WorkUnit`; it contains a few additional runtime properties
      • A `WorkUnit` defines the work to be done in an individual Gobblin `Task`. A job usually consists of many `WorkUnit`s, each covering some division of the work that needs to be done
      • `SourceState` is the global configuration for an entire job, so it is basically a set of configuration properties global to all `WorkUnit`s

      This wiki has some more documentation on how this all works: http://gobblin.readthedocs.io/en/latest/user-guide/State-Management-and-Watermarks/#gobblin-state-deep-dive

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-230112363
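
      The layering described in these bullets can be modelled with a small hypothetical sketch (`StateLayersSketch`, `TaskView` and `planJob` are illustrative names, not Gobblin classes). One job-level property map is shared by reference across every per-task view, while each view has its own work-unit and runtime properties. Sharing the map instead of copying it into each work unit is the memory saving that the `@deprecated` Javadoc note in the description refers to.

      ```java
      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical model of the three layers described above; not Gobblin's real classes.
      class StateLayersSketch {

        // Runtime view of one unit of work: its own runtime properties, the work unit's
        // properties, and a reference to the job-wide properties (shared, not copied).
        static final class TaskView {
          final Map<String, String> runtimeProps = new HashMap<>();
          final Map<String, String> workUnitProps;
          final Map<String, String> jobProps;

          TaskView(Map<String, String> workUnitProps, Map<String, String> jobProps) {
            this.workUnitProps = workUnitProps;
            this.jobProps = jobProps;
          }
        }

        // Splits a job into one view per table; every view points at the same job-level map.
        static List<TaskView> planJob(Map<String, String> jobProps, List<String> tables) {
          List<TaskView> views = new ArrayList<>();
          for (String table : tables) {
            Map<String, String> workUnitProps = new HashMap<>();
            workUnitProps.put("extract.table.name", table); // hypothetical per-unit key
            views.add(new TaskView(workUnitProps, jobProps));
          }
          return views;
        }
      }
      ```

      With this layout, a property that exists only in the shared job-level map is invisible to code that inspects `workUnitProps` alone, which is the situation reported in this issue.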


      chosh0615 wrote on 2016-07-03T09:23:21Z : As @ggthename mentioned, the JdbcExtractor and QueryBasedExtractor classes read some properties from the WorkUnit and get null values back.
      For instance, properties holding JDBC connection information should be read from the WorkUnitState.
      Since WorkUnitState.getProp looks a property up in the WorkUnitState itself, then in the WorkUnit, and then in the JobState, we can simply use WorkUnitState.getProp in JdbcExtractor and QueryBasedExtractor.

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-230143957
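
      A hedged sketch of the fix suggested in this comment, using simplified stand-ins rather than Gobblin's real classes: `LayeredProps` follows the lookup order described here (own runtime properties, then the work unit, then the job-level state), and `PatchedExtractorSketch` is a hypothetical extractor that reads its connection setting through that layered lookup instead of from the work unit directly.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical stand-in for a WorkUnitState-style lookup; not Gobblin's API.
      class LayeredProps {
        final Map<String, String> own = new HashMap<>();
        private final Map<String, String> workUnit;
        private final Map<String, String> jobState;

        LayeredProps(Map<String, String> workUnit, Map<String, String> jobState) {
          this.workUnit = workUnit;
          this.jobState = jobState;
        }

        // Own runtime properties first, then the work unit, then the job-level state.
        String getProp(String key) {
          String value = own.get(key);
          if (value == null) {
            value = workUnit.get(key);
          }
          if (value == null) {
            value = jobState.get(key);
          }
          return value;
        }
      }

      // Hypothetical extractor that reads connection settings through the layered
      // lookup instead of directly from the work unit.
      class PatchedExtractorSketch {
        private final LayeredProps state;

        PatchedExtractorSketch(LayeredProps state) {
          this.state = state;
        }

        String driverClass() {
          return state.getProp("source.conn.driver");
        }
      }
      ```

      Because the lookup falls through the layers in order, a value set closer to the task (for example in the work unit) overrides the job-level default, while keys that exist only at the job level are still found.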


      abti wrote on 2016-07-13T11:54:09Z : @ggthename Yes, this is a known issue that popped up with a few recent optimizations. It is being tracked here: https://github.com/linkedin/gobblin/issues/1022

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-232333237


      ggthename wrote on 2016-07-14T02:32:33Z : @abti - I think this is a little different from #1022,
      because this issue is about properties that are read from the WorkUnit but are no longer available there,
      whereas #1022 was about properties read from the WorkUnitState that were not available in it.

      Properties that are not available in the WorkUnit but are reachable from the WorkUnitState can be read by calling the WorkUnitState.getProp(String) method.
      This method first looks the property up in the WorkUnitState itself, and falls back to the WorkUnit and then the JobState if it is not found there.

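      ```java
      public String getProp(String key) {
        // First, look in the WorkUnitState's own (runtime) properties.
        String value = super.getProp(key);
        // Then fall back to the wrapped WorkUnit.
        if (value == null) {
          value = this.workUnit.getProp(key);
        }
        // Finally, fall back to the job-level JobState.
        if (value == null) {
          value = this.jobState.getProp(key);
        }
        return value;
      }
      ```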

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-232540916


      jinhyukchang wrote on 2016-07-26T16:54:20Z : Was there any change in how properties are populated into the WorkUnit? With the latest build, the integration test for MysqlExtractor fails with an NPE because of this.

      https://github.com/linkedin/gobblin/blob/master/gobblin-core/src/main/java/gobblin/source/extractor/extract/jdbc/MysqlExtractor.java#L172

      java.lang.NullPointerException
      at gobblin.source.extractor.extract.jdbc.MysqlExtractor.getConnectionUrl(MysqlExtractor.java:172)
      at gobblin.source.extractor.extract.jdbc.JdbcExtractor.createJdbcSource(JdbcExtractor.java:716)
      at gobblin.source.extractor.extract.jdbc.JdbcExtractor.executePreparedSql(JdbcExtractor.java:675)
      at gobblin.source.extractor.extract.jdbc.JdbcExtractor.extractMetadata(JdbcExtractor.java:289)
      at gobblin.source.extractor.extract.QueryBasedExtractor.build(QueryBasedExtractor.java:244)
      at gobblin.source.extractor.extract.jdbc.MysqlSource.getExtractor(MysqlSource.java:40)
      at gobblin.runtime.TaskContext.getExtractor(TaskContext.java:119)
      at gobblin.runtime.Task.run(Task.java:127)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-235332200
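
      The trace shows the NPE surfacing in `MysqlExtractor.getConnectionUrl`, presumably because a connection property came back null. As a hedged sketch (the `RequiredProps` helper is hypothetical, and the `source.conn.host`, `source.conn.port` and `source.querybased.schema` key names are assumptions for illustration), an extractor could resolve required properties up front and fail with a message naming the missing key instead of a bare NPE.

      ```java
      import java.util.function.Function;

      // Hypothetical helper, not part of Gobblin: resolves a required property or fails with
      // a message that names the missing key, instead of letting a later call throw an NPE.
      class RequiredProps {
        private final Function<String, String> lookup;

        RequiredProps(Function<String, String> lookup) {
          this.lookup = lookup;
        }

        String require(String key) {
          String value = lookup.apply(key);
          if (value == null) {
            throw new IllegalStateException("Missing required property: " + key);
          }
          return value;
        }

        // Builds a MySQL-style JDBC URL; the key names are assumptions for illustration.
        String connectionUrl() {
          return "jdbc:mysql://" + require("source.conn.host") + ":" + require("source.conn.port")
              + "/" + require("source.querybased.schema");
        }
      }
      ```

      Given access to a WorkUnitState, passing `workUnitState::getProp` as the lookup would combine this check with the fallback order discussed earlier in the thread.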


      chavdar wrote on 2016-07-27T23:21:49Z : @tuGithub can you have a look? Your Salesforce changes might have addressed this issue.

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-235751255


      tuGithub wrote on 2016-07-28T17:47:10Z : looking...

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-235971049


      jinhyukchang wrote on 2016-08-15T22:13:01Z : Hi, is there any update on this issue?

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-239945453


      chavdar wrote on 2016-08-16T00:10:54Z : @tuGithub have you had the time to look at this?

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-239966844


      jinhyukchang wrote on 2016-09-13T16:37:52Z : Hi, has this issue been resolved?

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-246743380


      lbendig wrote on 2016-09-13T20:32:08Z : @jinhyukchang The issue is still there and without patching the extractors, NPEs are thrown.

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-246814167


      chosh0615 wrote on 2016-09-14T04:32:44Z : There is a pull request for this issue: #1085.
      @jinhyukchang, you can merge it and try it for now.

      Github Url : https://github.com/linkedin/gobblin/issues/1065#issuecomment-246903908

          People

            Assignee: Unassigned
            Reporter: Abhishek Tiwari (abti)