Apache Gobblin / GOBBLIN-2054

`CommitActivityImpl` fails for job types (sources) other than Iceberg-Distcp

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-core
    • Labels: None

    Description

      Gobblin-on-Temporal execution has been failing for job types other than Iceberg-Distcp (which uses `CopySource`). In particular, the commit step fails with:

      java.lang.IllegalArgumentException: Missing required property writer.output.dir
      	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
      	at org.apache.gobblin.util.WriterUtils.getWriterOutputDir(WriterUtils.java:121)
      	at org.apache.gobblin.publisher.BaseDataPublisher.publishData(BaseDataPublisher.java:390)
      	at org.apache.gobblin.publisher.BaseDataPublisher.publishMultiTaskData(BaseDataPublisher.java:379)
      	at org.apache.gobblin.publisher.BaseDataPublisher.publishData(BaseDataPublisher.java:366)
      	at org.apache.gobblin.publisher.DataPublisher.publish(DataPublisher.java:81)
      	at org.apache.gobblin.runtime.SafeDatasetCommit.commitDataset(SafeDatasetCommit.java:260)
      	at org.apache.gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:168)
      	at org.apache.gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:64)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      This is odd, because that same property had already been used before commit, while processing the `WorkUnit`. Moreover, logging shows it to be present within the `JobState`.
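
      One way to narrow this down is to compare the properties visible in the `JobState` with those visible in the state that reaches the publisher at commit time. The following is only a minimal sketch, assuming both states can be dumped into `java.util.Properties`. The class `StatePropsDiff` and the method `missingAtCommit` are hypothetical names, not part of Gobblin.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic helper; not part of Gobblin.
class StatePropsDiff {

  /** Returns the required keys that are set in jobProps but absent from commitProps. */
  static Set<String> missingAtCommit(Properties jobProps, Properties commitProps,
      List<String> requiredKeys) {
    Set<String> missing = new TreeSet<>();
    for (String key : requiredKeys) {
      boolean inJob = jobProps.getProperty(key) != null;
      boolean atCommit = commitProps.getProperty(key) != null;
      if (inJob && !atCommit) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Properties jobProps = new Properties();
    // Illustrative value only; the real path comes from the job config.
    jobProps.setProperty("writer.output.dir", "/tmp/example/output");

    // Simulates a commit-time state in which the property did not arrive.
    Properties commitProps = new Properties();

    System.out.println(
        missingAtCommit(jobProps, commitProps, Arrays.asList("writer.output.dir")));
  }
}
```

      Any key this reports was dropped somewhere between job setup and the commit step, which is where to look next.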

      Even with a private build that hard-codes `writer.output.dir`, this later error arises:

      Caused by: java.lang.IllegalArgumentException: Can not create a Path from a null string
      	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159)
      	at org.apache.hadoop.fs.Path.<init>(Path.java:175)
      	at org.apache.hadoop.fs.Path.<init>(Path.java:110)
      	at org.apache.gobblin.runtime.FsDatasetStateStore.sanitizeDatasetStatestoreNameFromDatasetURN(FsDatasetStateStore.java:175)
      	at org.apache.gobblin.runtime.FsDatasetStateStore.persistDatasetState(FsDatasetStateStore.java:386)
      	at org.apache.gobblin.runtime.FsDatasetStateStore.persistDatasetState(FsDatasetStateStore.java:90)
      	at org.apache.gobblin.runtime.SafeDatasetCommit.persistDatasetState(SafeDatasetCommit.java:418)
      	at org.apache.gobblin.runtime.SafeDatasetCommit.call(SafeDatasetCommit.java:191)
      	... 8 more
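
      The trace above shows only that some string handed to `Path` was null, not which one. As a debugging aid, a fail-fast check that names the offending input could be placed ahead of the `Path` construction. This is a sketch under assumptions: `PathArgCheck`, `firstNull`, `requireAllSet`, and the argument names in the usage note are hypothetical, and the report does not establish which value is actually null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Gobblin or Hadoop.
class PathArgCheck {

  /** Returns the name of the first entry whose value is null, or null if all are set. */
  static String firstNull(Map<String, String> namedArgs) {
    for (Map.Entry<String, String> e : namedArgs.entrySet()) {
      if (e.getValue() == null) {
        return e.getKey();
      }
    }
    return null;
  }

  /** Throws an exception that names the null input, rather than a generic message. */
  static void requireAllSet(Map<String, String> namedArgs) {
    String name = firstNull(namedArgs);
    if (name != null) {
      throw new IllegalArgumentException("Null path component: " + name);
    }
  }
}
```

      For example, filling a `LinkedHashMap` with entries such as `"storeName"` and `"datasetUrn"` (illustrative names only) and passing it to `requireAllSet` before building the path would report the missing input by name.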
      

          People

            abti Abhishek Tiwari
            kipk Kip Kohn
              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 1h 50m