HADOOP-18528: Disable abfs prefetching by default

Details

    • Reviewed
      ABFS block prefetching has been disabled to avoid HADOOP-18521 and buffer sharing in multithreaded processes (Hive, Spark, etc.). This will have little or no performance impact on queries against Parquet or ORC data, but it can slow down sequential stream processing, including reads of CSV files. The data read will, however, be correct.
      It may also slow down distcp downloads, where the race condition does not arise. For maximum distcp performance, re-enable readahead by setting fs.abfs.enable.readahead to true.
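
      As a rough illustration of the re-enable step in the release note, the property could be set in a Hadoop configuration file such as core-site.xml. This is a sketch only: the property name is copied verbatim from the release note and should be checked against the ABFS documentation for your Hadoop version, and the description text is an illustrative addition.

      ```xml
      <!-- Sketch: re-enable ABFS readahead; the name is taken from the release note, verify before use -->
      <property>
        <name>fs.abfs.enable.readahead</name>
        <value>true</value>
        <description>Re-enables ABFS block prefetching (disabled by default by HADOOP-18528).</description>
      </property>
      ```

      Setting this cluster-wide would also re-enable prefetching for multithreaded applications such as Hive and Spark, which is the case this change is meant to protect. Scoping the setting to the distcp job alone, for example with Hadoop's generic -D property=value option, is likely the safer choice.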

    Description

      After the addition of HADOOP-18517, we should disable readAhead by default to mitigate the inconsistent read results caused by ABFS prefetching (HADOOP-18521).

      As an urgent fix: Disable readAhead/prefetch, tracked for 3.3.5.
      Long-term fix: HADOOP-18521, tracked for 3.3.6.

            People

              mehakmeetSingh Mehakmeet Singh