Apache Arrow / ARROW-18228

AWS Error SLOW_DOWN during PutObject operation

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Information Provided
    • Affects Version/s: 10.0.0

    Description

      We use Dask to parallelise read/write operations and pyarrow to write datasets from the worker nodes.

      After pyarrow 10.0.0 was released, our data flows automatically picked up the new version, and some of them started failing with the following error:

        File "/usr/local/lib/python3.10/dist-packages/org/store/storage.py", line 768, in _write_partition
          ds.write_dataset(
        File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 988, in write_dataset
          _filesystemdataset_write(
        File "pyarrow/_dataset.pyx", line 2859, in pyarrow._dataset._filesystemdataset_write
          check_status(CFileSystemDataset.Write(c_options, c_scanner))
        File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
          raise IOError(message)
      OSError: When creating key 'equities.us.level2.by_security/' in bucket 'org-prod': AWS Error SLOW_DOWN during PutObject operation: Please reduce your request rate. 

      In total, the flow failed many times: most runs failed with the error above, but one failed with:

        File "/usr/local/lib/python3.10/dist-packages/chronos/store/storage.py", line 857, in _load_partition
          table = ds.dataset(
        File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 752, in dataset
          return _filesystem_dataset(source, **kwargs)
        File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 444, in _filesystem_dataset
          fs, paths_or_selector = _ensure_single_source(source, filesystem)
        File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 411, in _ensure_single_source
          file_info = filesystem.get_file_info(path)
        File "pyarrow/_fs.pyx", line 564, in pyarrow._fs.FileSystem.get_file_info
          info = GetResultValue(self.fs.GetFileInfo(path))
        File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
          return check_status(status)
        File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
          raise IOError(message)
      OSError: When getting information for key 'ns/date=2022-10-31/channel=4/feed=A/9f41f928eedc431ca695a7ffe5fc60c2-0.parquet' in bucket 'org-poc': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached 
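
Both failures surface as a plain OSError, so the AWS error code is only available in the message text. Below is a minimal sketch of pulling it out, assuming the message format stays as in the two errors quoted above. The helper name `parse_aws_error` and the `AwsError` tuple are hypothetical and not part of pyarrow.

```python
import re
from typing import NamedTuple, Optional

# Assumed message shape, taken from the two errors quoted in this issue:
#   "... AWS Error <CODE> during <Operation> operation: <detail>"
_AWS_ERROR_RE = re.compile(
    r"AWS Error (?P<code>[A-Z_]+) during (?P<operation>\w+) operation: (?P<detail>.*)"
)


class AwsError(NamedTuple):
    code: str
    operation: str
    detail: str


def parse_aws_error(message: str) -> Optional[AwsError]:
    """Return the AWS error code, operation and detail, or None if no match."""
    match = _AWS_ERROR_RE.search(message)
    if match is None:
        return None
    return AwsError(match["code"], match["operation"], match["detail"].strip())


# The two messages from this issue, used as sample input.
slow_down = parse_aws_error(
    "When creating key 'equities.us.level2.by_security/' in bucket 'org-prod': "
    "AWS Error SLOW_DOWN during PutObject operation: "
    "Please reduce your request rate."
)
timeout = parse_aws_error(
    "When getting information for key "
    "'ns/date=2022-10-31/channel=4/feed=A/9f41f928eedc431ca695a7ffe5fc60c2-0.parquet' "
    "in bucket 'org-poc': AWS Error NETWORK_CONNECTION during HeadObject operation: "
    "curlCode: 28, Timeout was reached"
)
print(slow_down.code, slow_down.operation)
print(timeout.code, timeout.operation)
```

Matching on the message text is fragile by nature, so this is only useful for logging or for deciding which failures look like throttling rather than genuine errors.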

       

      Do you have any idea what changed in the dataset write path between 9.0.0 and 10.0.0? Knowing that would help us fix the issue.
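
Independently of what changed between versions, the SLOW_DOWN message asks the client to reduce its request rate, and the usual client-side response to that is retrying with exponential backoff. The following is a minimal sketch under stated assumptions, not a fix for the underlying change: the helper `retry_transient`, its defaults, and the `TRANSIENT_MARKERS` substrings are hypothetical, and treating both errors from this issue as retryable is an assumption.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Assumption: these substrings mark errors worth retrying; both come from the
# messages quoted in this issue.
TRANSIENT_MARKERS: Tuple[str, ...] = ("SLOW_DOWN", "NETWORK_CONNECTION")


def retry_transient(
    func: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient OSErrors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except OSError as exc:
            transient = any(marker in str(exc) for marker in TRANSIENT_MARKERS)
            # Re-raise anything that is not transient, and the final failure.
            if not transient or attempt == attempts - 1:
                raise
            # Full jitter: sleep a random time up to the capped exponential
            # delay, so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```

A worker could wrap its write as `retry_transient(lambda: ds.write_dataset(...))`. A retried write may find files left behind by the failed attempt, so the `existing_data_behavior` argument of `write_dataset` would need to be chosen with that in mind.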

          People

            Assignee: Unassigned
            Reporter: Vadym Dytyniak (dytyniak)