Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-17180

S3Guard: Include 500 DynamoDB system errors in exponential backoff retries

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.1.3
    • None
    • None
    • None

    Description

      We get fatal failures from S3guard (that in turn fail our spark jobs) because of the inernal DynamoDB system errors.

      com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG): Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG)

      The DynamoDB has separate statistic for system errors:

      I contacted the AWS Support and got an explanation that those 500 errors are returned to the client once DynamoDB gets overwhelmed with client requests.

      So essentially the traffic should had been throttled but it didn't and got 500 system errors.

      My point is that the client should handle those errors just like throttling exceptions - 

      with exponential backoff retries.

       

      Here is more complete exception stack trace:

       

      org.apache.hadoop.fs.s3a.AWSServiceIOException: get on s3a://rem-spark/persisted_step_data/15/0afb1ccb73854f1fa55517a77ec7cc5e_b67e2221-f0e3-4c89-90ab-f49618ea4557_SDTopology/parquet.all_ranges/topo_id=321: com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG): Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG) at org.apache.hadoop.fs.s3a.S3AUtils.translateDynamoDBException(S3AUtils.java:389) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:181) at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.get(DynamoDBMetadataStore.java:438) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2110) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088) at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1889) at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$9(S3AFileSystem.java:1868) at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109) at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1868) at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:277) at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3$$anonfun$apply$2.apply(InMemoryFileIndex.scala:207) at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3$$anonfun$apply$2.apply(InMemoryFileIndex.scala:206) at scala.collection.immutable.Stream.map(Stream.scala:418) at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3.apply(InMemoryFileIndex.scala:206) at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$3.apply(InMemoryFileIndex.scala:204) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: com.amazonaws.services.dynamodbv2.model.InternalServerErrorException: Internal server error (Service: AmazonDynamoDBv2; Status Code: 500; Error Code: InternalServerError; Request ID: 00EBRE6J6V8UGD7040C9DUP2MNVV4KQNSO5AEMVJF66Q9ASUAAJG) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2925) at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2901) at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:1640) at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:1616) at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.doLoadItem(GetItemImpl.java:77) at com.amazonaws.services.dynamodbv2.document.internal.GetItemImpl.getItem(GetItemImpl.java:66) at com.amazonaws.services.dynamodbv2.document.Table.getItem(Table.java:608) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.getConsistentItem(DynamoDBMetadataStore.java:423) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.innerGet(DynamoDBMetadataStore.java:459) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.lambda$get$2(DynamoDBMetadataStore.java:439) at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109) ... 29 more

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              david.kats David Kats
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: