Hadoop Common / HADOOP-16942

S3A creating folder-level delete markers


    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.8.3, 3.2.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      Using the S3A URL scheme (s3a://) to write data from Spark to S3 creates many folder-level delete markers.

      Writing the same data with the S3 URL scheme (s3://) does not create any delete markers.
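      One plausible mechanism, stated as an assumption rather than a diagnosis: on a versioning-enabled bucket, a DELETE request without a version ID adds a delete marker even when the key does not exist, and S3A (as of Hadoop 3.2) deletes the zero-byte directory marker key ("dir/") of every parent directory after it finishes writing a file. The in-memory model below is hypothetical (not the AWS SDK or S3A code; VersionedBucketModel, Bucket and s3aStyleWrite are illustrative names) and only shows how repeated file writes can pile up markers on the same folder key.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of a versioned S3 bucket; not the AWS SDK or S3A code.
// Assumptions: (1) a DELETE without a version ID appends a delete marker, even for a key
// that does not exist; (2) after writing a file, S3A deletes the directory marker key
// ("dir/") of every ancestor directory.
object VersionedBucketModel {
  sealed trait Entry
  case object ObjectVersion extends Entry
  case object DeleteMarker extends Entry

  final class Bucket {
    // key -> version history, oldest first
    val history: mutable.Map[String, mutable.Buffer[Entry]] =
      mutable.LinkedHashMap.empty[String, mutable.Buffer[Entry]]

    def put(key: String): Unit = {
      history.getOrElseUpdate(key, mutable.Buffer.empty[Entry]) += ObjectVersion
      ()
    }

    // Assumption (1): a plain DELETE never removes data, it only appends a marker.
    def delete(key: String): Unit = {
      history.getOrElseUpdate(key, mutable.Buffer.empty[Entry]) += DeleteMarker
      ()
    }

    def deleteMarkerCount(key: String): Int =
      history.get(key).map(_.count(_ == DeleteMarker)).getOrElse(0)
  }

  // Ancestor directory marker keys, deepest first:
  // "a/b/c.txt" -> List("a/b/", "a/")
  def ancestorMarkers(fileKey: String): List[String] = {
    val dirs = fileKey.split('/').toList.dropRight(1) // drop the file name
    dirs.inits.filter(_.nonEmpty).map(_.mkString("", "/", "/")).toList
  }

  // Assumption (2): an S3A-style write PUTs the file, then deletes all parent markers.
  def s3aStyleWrite(bucket: Bucket, fileKey: String): Unit = {
    bucket.put(fileKey)
    ancestorMarkers(fileKey).foreach(bucket.delete)
  }

  def main(args: Array[String]): Unit = {
    val bucket = new Bucket
    val dest = "tmp/vijayant/test/s3a/"
    // Shortened, hypothetical key names for a task file, the committed file and _SUCCESS.
    s3aStyleWrite(bucket, dest + "_temporary/0/_temporary/attempt_1/part-00000.parquet")
    s3aStyleWrite(bucket, dest + "part-00000.parquet")
    s3aStyleWrite(bucket, dest + "_SUCCESS")
    println(bucket.deleteMarkerCount(dest)) // prints 3
  }
}
```

      The model only covers file writes, not the mkdirs, rename and cleanup calls made by the commit protocol, so it produces fewer markers than the real listing further down shows for tmp/vijayant/test/s3a/.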

       

      • Spark: 2.4.4
      • Hadoop: 3.2.1
      • EMR version: 6.0.0
      • Write mode: Append

      [hadoop@ip-192-0-161-212 ~]$ spark-shell
      Setting default log level to "WARN".
      To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
      20/03/27 07:37:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      Spark context Web UI available at http://ip-192-0-161-212.ec2.internal:4040
      Spark context available as 'sc' (master = yarn, app id = application_1585294390030_0003).
      Spark session available as 'spark'.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
            /_/
               
      Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> val df = spark.sql("select 1 as a")
      df: org.apache.spark.sql.DataFrame = [a: int]
      
      scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3://my-bucket/tmp/vijayant/test/s3/")
                                                                                      
      scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3a://my-bucket/tmp/vijayant/test/s3a/")
                                                                                      
      scala> 
      

      Object versions and delete markers after the `s3` write:

      aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3/
      {
          "Versions": [
              {
                  "LastModified": "2020-03-27T07:38:17.000Z", 
                  "VersionId": "V06OzeE7j221Tq7keSGj8bveCYyJFIcf", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3/_SUCCESS", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:38:16.000Z", 
                  "VersionId": "dLYtHDugLhFIdw2YHLFmoFOxXkm.21Wo", 
                  "ETag": "\"26e70a1e26c709e3e8498acd49cfaaa3-1\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3/part-00000-9d9a8925-f119-415d-b547-b742396e2ca7-c000.snappy.parquet", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 384
              }
          ]
      } 
      

      Object versions and delete markers after the `s3a` write:

      aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3a/
      
      {
          "DeleteMarkers": [
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "NJWRZMcb_eYYwCJh_isX4H1Ox6W362Wb", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "F0h0mLcVVwkMtcHxd95Hj7BACL4Up_Q9", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:10.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": ".sBcE6cXeggekOnSgZ4n7pyCDHnsLERK", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:10.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "nzm39jiUPC4H0ZaS.5Shp0FYPnR8wNf9", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "BPM65R1HkZngPDYtDL3zPZYPw_G_m9Ic", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:08.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "LJt8_MVDOiD4UdgUqEMycxjvtinJlTNt", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "RqunJTn8Od0PgFR4yu44PX4kL54k6EDv", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "4vY8cnqUI5VJAk3VfEt_VD_KEczo3bmY", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/", 
                  "LastModified": "2020-03-27T07:39:08.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "ln47YYy.yiE.k70cvqvfgYCEQoYFnKQW", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "5Bsrt7s1caM90mzGNgk0MsTU9q8UjTTA", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "pN3HzDfnmqIqrMwAL2jqKEBkvoHZALor", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "wg91poO1KXReXxvsZHzZXrHR1IgIX8t2", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "cv5Noykq3sMilQqJXAH3E.N7qAWnIBx7", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "6xzt9SxlCUJaOLD8krkE3yXfQU14rErX", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "wGmJAo7x_gkLWAiHzxPGdPMVSus7Wcp1", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", 
                  "LastModified": "2020-03-27T07:39:10.000Z"
              }
          ], 
          "Versions": [
              {
                  "LastModified": "2020-03-27T07:39:11.000Z", 
                  "VersionId": "2py_ZXKl7yh6fwhzksAx8Os1BriDJCBb", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_SUCCESS", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:39:08.000Z", 
                  "VersionId": "lDqTnLCqDYtjrOiY.V7E6AKTRQLKrqUT", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:39:10.000Z", 
                  "VersionId": "g.rGoTDdmrGrNjrLchvwz3jMmGePkgiD", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:39:09.000Z", 
                  "VersionId": ".ZCpY2UW4hRlbLL87dFUJRuk021Hyq8p", 
                  "ETag": "\"3def7238a0858c17c62d7045290175cf\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "Size": 384
              }, 
              {
                  "LastModified": "2020-03-27T07:39:10.000Z", 
                  "VersionId": "JSNjTDHSQqe9zSAV93bc6TXPuqA.vDJE", 
                  "ETag": "\"3def7238a0858c17c62d7045290175cf\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 384
              }
          ]
      }
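      To make the s3a listing easier to read, the sketch below tallies its DeleteMarkers entries per key. The (Key, IsLatest) pairs are copied from the listing above; DeleteMarkerTally, Marker and markersFor are hypothetical helper names, and in practice the pairs would come from parsing the `aws s3api list-object-versions` output rather than being typed in.

```scala
// Tallies S3 delete markers per key. Input data: the DeleteMarkers (Key, IsLatest)
// pairs from the s3a list-object-versions output in this issue.
object DeleteMarkerTally {
  final case class Marker(key: String, isLatest: Boolean)

  // One Marker per IsLatest flag, all for the same key.
  def markersFor(key: String, latestFlags: Boolean*): Seq[Marker] =
    latestFlags.map(flag => Marker(key, flag))

  private val base = "tmp/vijayant/test/s3a/"
  private val attempt = base + "_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/"

  val fromListing: Seq[Marker] =
    markersFor(base, true, false, false, false, false) ++
      markersFor(base + "_temporary/", true, false, false) ++
      markersFor(base + "_temporary/0/", true, false) ++
      markersFor(base + "_temporary/0/_temporary/", true, false) ++
      markersFor(attempt, true, false) ++
      markersFor(attempt + "part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", true)

  // Keys ending in "/" are S3A-style folder (directory) marker keys.
  def isFolderKey(key: String): Boolean = key.endsWith("/")

  // Delete-marker count per key, most affected keys first, ties broken by key.
  def perKey(ms: Seq[Marker]): List[(String, Int)] =
    ms.groupBy(_.key).map { case (k, group) => (k, group.size) }.toList
      .sortBy { case (k, n) => (-n, k) }

  // Keys whose latest version is a delete marker, i.e. keys that now look deleted.
  def hiddenKeys(ms: Seq[Marker]): Set[String] =
    ms.filter(_.isLatest).map(_.key).toSet

  def main(args: Array[String]): Unit = {
    perKey(fromListing).foreach { case (k, n) => println(f"$n%2d  $k") }
    val folder = fromListing.count(m => isFolderKey(m.key))
    println(s"$folder of ${fromListing.size} delete markers are on folder keys")
  }
}
```

      For this one-row append, the tally gives 15 delete markers spread over 6 keys; 14 of them are on folder keys ending in "/", and every key that carries a marker currently has a marker as its latest version.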
      
      

      These delete markers accumulate and make object listings slow; we have even seen listing timeouts caused by the large number of delete markers.


    People

    • Assignee: Unassigned
    • Reporter: vijayant soni
    • Votes: 0
    • Watchers: 3
