SPARK-22790: Add a configurable factor to describe HadoopFsRelation's size

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

      Description

      As per the discussion in https://github.com/apache/spark/pull/19864#discussion_r156847927:

      The size estimate of the current HadoopFsRelation is based purely on the underlying file size. This is inaccurate, for example when the files are compressed, and it makes execution vulnerable to errors such as OOM.

      Users can enable CBO with the functionality in https://github.com/apache/spark/pull/19864 to avoid this issue.

      This JIRA proposes adding a configurable factor to the sizeInBytes method of the HadoopFsRelation class so that users can mitigate the problem without CBO.
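
      The proposal can be illustrated with a minimal sketch. It is not Spark's code: `FileRelationSketch`, `size_factor`, and `fits_under_threshold` are hypothetical names invented here, and the threshold check is an assumed example of a planner decision that depends on the size estimate (such as whether a relation is small enough to hold in memory).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FileRelationSketch:
    """Hypothetical stand-in for HadoopFsRelation; not Spark's actual class."""

    file_sizes_in_bytes: Tuple[int, ...]
    # Hypothetical configurable factor. 1.0 reproduces the current behaviour,
    # where the estimate equals the raw on-disk file size.
    size_factor: float = 1.0

    def size_in_bytes(self) -> int:
        # Scale the raw file size by the user-supplied factor.
        raw_size = sum(self.file_sizes_in_bytes)
        return int(raw_size * self.size_factor)


def fits_under_threshold(relation: FileRelationSketch, threshold_bytes: int) -> bool:
    # Assumed planner-style check: treat the relation as "small" only if its
    # estimated size does not exceed the threshold.
    return relation.size_in_bytes() <= threshold_bytes


MIB = 1024 * 1024
files = (4 * MIB, 4 * MIB)  # 8 MiB on disk

unscaled = FileRelationSketch(files)                 # estimate: 8 MiB
scaled = FileRelationSketch(files, size_factor=3.0)  # estimate: 24 MiB
```

      With a 10 MiB threshold, `fits_under_threshold(unscaled, 10 * MIB)` is true while `fits_under_threshold(scaled, 10 * MIB)` is false, so a factor greater than 1 lets a user steer the planner away from treating a compressed dataset as small, without enabling CBO.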

            People

            • Assignee: Nan Zhu (CodingCat)
            • Reporter: Nan Zhu (CodingCat)