Details
Bug
Status: Closed
Minor
Resolution: Fixed
Description
A customer reported seeing:
Error: java.io.IOException: Task failed: java.lang.IllegalArgumentException: The value of property <<redacted>> must not be null
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1260)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1241)
    at org.apache.gobblin.util.JobConfigurationUtils.putStateIntoConfiguration(JobConfigurationUtils.java:95)
    at org.apache.gobblin.writer.FsDataWriter.<init>(FsDataWriter.java:102)
    at org.apache.gobblin.writer.GobblinBaseOrcWriter.<init>(GobblinBaseOrcWriter.java:65)
    at org.apache.gobblin.writer.GobblinOrcWriter.<init>(GobblinOrcWriter.java:42)
    at <<redacted>>
    at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:230)
    at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:225)
    at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.<init>(CloseOnFlushWriterWrapper.java:73)
    at org.apache.gobblin.writer.PartitionedDataWriter.<init>(PartitionedDataWriter.java:224)
    at org.apache.gobblin.runtime.fork.Fork.buildWriter(Fork.java:571)
    at org.apache.gobblin.runtime.fork.Fork.buildWriterIfNotPresent(Fork.java:579)
    at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:525)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
    at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:257)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
(Gobblin task id <<redacted>>, container id attempt_1690893552521_3376012_m_000111_0)
    at org.apache.gobblin.runtime.GobblinMultiTaskAttempt.persistTaskStateStore(GobblinMultiTaskAttempt.java:367)
    ...
The error appears to arise from concurrent modification of the `State`'s underlying `Properties` (i.e. between the time `keySet()` is first read and the time each value is read from the same `Properties`).
Although the customer's implementation seems to warrant synchronization, a null value is certain to be rejected by `org.apache.hadoop.conf.Configuration`, so defensively filter null values out ahead of time.
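
A minimal sketch of the defensive filtering described above, not the actual Gobblin patch. `NullSafeCopy`, `copyNonNull`, and `strictSet` are hypothetical names, and a plain `Map` stands in for the Hadoop `Configuration` so that the example compiles without Hadoop on the classpath. The idea is to read each value once and skip it if it came back null, rather than passing it on to a setter that rejects nulls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class NullSafeCopy {

    // Hypothetical stand-in for a setter that rejects null values,
    // as Configuration.set(...) does in the stack trace above.
    static void strictSet(Map<String, String> target, String key, String value) {
        if (value == null) {
            throw new IllegalArgumentException(
                "The value of property " + key + " must not be null");
        }
        target.put(key, value);
    }

    // Copies only the entries whose value is non-null at the time it is read.
    static Map<String, String> copyNonNull(Properties source) {
        Map<String, String> target = new LinkedHashMap<>();
        // Snapshot the keys; a value may still disappear before it is read.
        for (Object keyObj : source.keySet().toArray()) {
            if (!(keyObj instanceof String)) {
                continue;
            }
            String key = (String) keyObj;
            // getProperty returns null if the key was removed in the meantime
            // or if the stored value is not a String.
            String value = source.getProperty(key);
            if (value == null) {
                continue; // defensively skip instead of failing in strictSet
            }
            strictSet(target, key, value);
        }
        return target;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("a", "1");
        // A non-String value makes getProperty("b") return null, which
        // deterministically imitates a value vanishing between the two reads.
        props.put("b", 42);
        System.out.println(copyNonNull(props));
    }
}
```

Filtering only hides the symptom at the copy site. If the `Properties` really is mutated from several threads, the writer may still see an incomplete set of keys, so synchronizing the caller remains the more complete remedy.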