Apache Gobblin / GOBBLIN-2026

Retention Job should fail on OOM


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: misc
    • Labels: None

    Description

      Currently, when the number of log files is very large, the retention job runs out of memory (OOM) while cleaning them and fails silently: even after the failure, the workflow execution reports Success.

      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - java.lang.OutOfMemoryError: GC overhead limit exceeded
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.util.Arrays.copyOf(Arrays.java:3332)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.lang.StringBuffer.append(StringBuffer.java:270)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.net.URI.appendSchemeSpecificPart(URI.java:1911)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.net.URI.toString(URI.java:1941)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at java.net.URI.<init>(URI.java:742)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.Path.makeQualified(Path.java:562)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:271)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:997)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:121)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1050)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1047)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1057)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.InstrumentedFileSystem.lambda$listStatus$17(InstrumentedFileSystem.java:379)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.InstrumentedFileSystem$$Lambda$69/231154485.get(Unknown Source)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at com.linkedin.hadoop.metrics.fs.PerformanceTrackingFileSystem.process(PerformanceTrackingFileSystem.java:412)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.InstrumentedFileSystem.process(InstrumentedFileSystem.java:100)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.InstrumentedFileSystem.listStatus(InstrumentedFileSystem.java:379)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.PerformanceTrackingDistributedFileSystem.listStatus(PerformanceTrackingDistributedFileSystem.java:296)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:258)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.listStatus(ChRootedFileSystem.java:253)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.viewfs.ViewFileSystem.listStatus(ViewFileSystem.java:528)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.GridFilesystem.lambda$listStatus$4(GridFilesystem.java:491)
      21-03-2024 01:00:03 PDT jobs-kafkaetl-gobblin-streaming-logs-cleaner INFO - 	at org.apache.hadoop.fs.GridFilesystem$$Lambda$68/2109027988.doCall(Unknown Source) 
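The trace shows the OOM inside `DistributedFileSystem.listStatus`, which appears to build the whole directory listing as a single `FileStatus[]`. Qualifying each entry's path (`HdfsFileStatus.makeQualified` → `Path.makeQualified` → `URI.toString`) allocates a new string per file, so memory use grows with the number of files. That is consistent with the OOM only appearing when there are too many log files. One way to keep memory bounded, assuming the retention logic does not need the full listing at once, is to consume a lazy listing and act on fixed-size batches. Hadoop's `FileSystem.listFiles(Path, boolean)` returns a `RemoteIterator<LocatedFileStatus>` that could feed such a loop through a small adapter. The sketch uses a plain `java.util.Iterator<String>` of paths instead so it runs without Hadoop on the classpath. `BatchedCleaner` and `processInBatches` are hypothetical names, not Gobblin code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedCleaner {

    /**
     * Pulls paths from a lazy iterator and hands them to the deleter in batches of
     * at most batchSize, so only one batch of paths is held in memory at a time.
     * Returns the total number of paths processed.
     */
    public static long processInBatches(Iterator<String> paths, int batchSize,
                                        Consumer<List<String>> deleter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        while (paths.hasNext()) {
            batch.add(paths.next());
            if (batch.size() == batchSize) {
                deleter.accept(batch);
                total += batch.size();
                // Start a fresh list; the deleter may still hold a reference to the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the final, partially filled batch.
        if (!batch.isEmpty()) {
            deleter.accept(batch);
            total += batch.size();
        }
        return total;
    }
}
```

For example, 10 paths with `batchSize = 3` reach the deleter as batches of 3, 3, 3 and 1. This reduces memory pressure but does not by itself make the job report failures, so it complements rather than replaces failing explicitly on OOM.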

      Because the job fails silently, the user is never explicitly notified of the failure. When the retention job hits an OOM and cannot proceed, it should fail explicitly so that the workflow execution is reported as failed rather than Success.
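A minimal sketch of the requested behaviour. It assumes that the workflow scheduler decides Success or failure from the job process's exit code; the issue does not say how the scheduler determines this. It also hypothetically assumes that cleanup work runs on an `ExecutorService`. `submit()` stores any `Throwable`, including `OutOfMemoryError`, in the returned `Future`, and the error is lost unless `get()` is called, which is one common way a JVM job "succeeds" after a failure. `RetentionJobRunner`, `run`, and the exit codes 1 and 2 are illustrative, not Gobblin's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RetentionJobRunner {

    // Hypothetical entry point: the exit code is what the scheduler sees,
    // so every failure path must end in a non-zero status.
    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>(); // hypothetical: cleaner tasks would be built here
        System.exit(run(tasks, 4));
    }

    /** Runs all tasks; returns 0 only if every task completed normally (2 = OOM, 1 = other failure). */
    public static int run(List<Runnable> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable task : tasks) {
                futures.add(pool.submit(task));
            }
            // get() rethrows whatever a task threw, wrapped in ExecutionException;
            // skipping this loop is what lets worker errors disappear silently.
            for (Future<?> f : futures) {
                f.get();
            }
            return 0;
        } catch (ExecutionException e) {
            System.err.println("Retention task failed: " + e.getCause());
            return e.getCause() instanceof OutOfMemoryError ? 2 : 1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        } catch (OutOfMemoryError e) {
            // OOM on the coordinating thread itself (e.g. while listing files).
            System.err.println("Retention job ran out of memory: " + e);
            return 2;
        } finally {
            pool.shutdownNow(); // stop remaining tasks so the JVM can exit
        }
    }
}
```

"GC overhead limit exceeded" is itself an `OutOfMemoryError`, so it is covered by both catch paths. As a JVM-level backstop that needs no code change, HotSpot's `-XX:+ExitOnOutOfMemoryError` flag (JDK 8u92 and later) makes the JVM exit with a non-zero status on the first `OutOfMemoryError`. Whether that status is surfaced as a workflow failure still depends on the scheduler.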

          People

            Assignee: Unassigned
            Reporter: Arpit Varshney (arpit.varshney)
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 2h