Beam / BEAM-6923

OOM errors in jobServer when using GCS artifactDir

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.17.0
    • Component/s: sdk-java-harness

    Description

      When starting jobServer with artifactDir pointing to a GCS bucket: 

      ./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket

      and running a Java portable pipeline with the following portability-related pipeline options:

      --runner=PortableRunner --jobEndpoint=localhost:8099 --defaultEnvironmentType=DOCKER --defaultEnvironmentConfig=gcr.io/<my-freshly-built-sdk-harness-image>/java:latest

      I'm facing a series of OOM errors, like this: 

      Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: Java heap space
      at com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
      at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
      at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
      at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
      at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
      at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
      at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      This does not happen when I'm using a local filesystem for the artifact staging location. 
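
      For a rough sense of scale: the top frame of the trace, MediaHttpUploader.buildContentChunk, is where the uploader prepares the next chunk of a resumable upload. One plausible but unconfirmed reading is that the chunk buffers of several artifacts being staged at once add up to more than the available heap. The sketch below only does that arithmetic. The class name UploadHeapEstimate, the number of concurrent uploads, and the buffer size are hypothetical and are not taken from Beam or the GCS connector.

```java
/** Rough heap estimate for concurrent chunked uploads. All numbers are illustrative assumptions. */
public class UploadHeapEstimate {

    /** Bytes held if every in-flight upload keeps one chunk buffer of the given size. */
    public static long bufferedBytes(int concurrentUploads, long chunkSizeBytes) {
        return Math.multiplyExact((long) concurrentUploads, chunkSizeBytes);
    }

    /** True when the chunk buffers alone would not fit in the given heap. */
    public static boolean exceedsHeap(int concurrentUploads, long chunkSizeBytes, long maxHeapBytes) {
        return bufferedBytes(concurrentUploads, chunkSizeBytes) > maxHeapBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long maxHeap = Runtime.getRuntime().maxMemory(); // heap limit of this JVM
        int uploads = 50;       // hypothetical: artifacts staged at the same time
        long chunk = 64 * mib;  // hypothetical: buffer size per upload
        System.out.printf("buffers: %d MiB, max heap: %d MiB, exceeds heap: %b%n",
                bufferedBytes(uploads, chunk) / mib,
                maxHeap / mib,
                exceedsHeap(uploads, chunk, maxHeap));
    }
}
```

      If the estimate comes out above the job server's heap, raising the heap limit (-Xmx) or lowering the per-upload buffer size would be the first things to try. A local filesystem needs no such upload buffers, which would be consistent with the observation above.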

      Attachments

        1. Telemetries.png (93 kB, Lukasz Gajowy)
        2. Paths to GC root.png (92 kB, Lukasz Gajowy)
        3. Instance counts.png (158 kB, Lukasz Gajowy)
        4. heapdump size-sorted.png (293 kB, Lukasz Gajowy)
        5. beam6923flink182.m4v (37.50 MB, Lukasz Gajowy)
        6. beam6923-flink156.m4v (40.63 MB, Lukasz Gajowy)

            People

              angoenka Ankur Goenka
              ŁukaszG Lukasz Gajowy
              Votes: 1
              Watchers: 7

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h