Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27098

Flaky missing file parts when writing to Ceph without error

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 2.4.0
    • None
    • Input/Output

    Description

      https://stackoverflow.com/questions/54935822/spark-s3a-write-omits-upload-part-without-failure/55031233?noredirect=1#comment96835218_55031233

      Using 2.4.0 with Hadoop 2.7, hadoop-aws 2.7.5, and the Ceph S3 endpoint. occasionally a file part will be missing; i.e. part 00003 here:

      ```
      > aws s3 ls my-bucket/folder/
      2019-02-28 13:07:21 0 _SUCCESS
      2019-02-28 13:06:58 79428651 part-00000-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:06:59 79586172 part-00001-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:00 79561910 part-00002-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:01 79192617 part-00004-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:07 79364413 part-00005-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:08 79623254 part-00006-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:10 79445030 part-00007-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:10 79474923 part-00008-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:11 79477310 part-00009-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:12 79331453 part-00010-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:13 79567600 part-00011-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:13 79388012 part-00012-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:14 79308387 part-00013-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:15 79455483 part-00014-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:17 79512342 part-00015-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:18 79403307 part-00016-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:18 79617769 part-00017-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:19 79333534 part-00018-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      2019-02-28 13:07:20 79543324 part-00019-5789ebf5-b55d-4715-8bb5-dfc5c4e4b999-c000.snappy.parquet
      ```

      However, the write succeeds and leaves a _SUCCESS file.

      This can be caught by additionally checking afterward whether the number of written file parts agrees with the number of partitions, but Spark should at least fail on its own and leave a meaningful stack trace in this case.

      Attachments

        1. sanitized_stdout_00001.txt
          14 kB
          Martin Loncaric

        Activity

          People

            Unassigned Unassigned
            mwlon Martin Loncaric
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: