SPARK-11282

Very strange broadcast join behaviour


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      Hi,
      I found very strange broadcast join behaviour.

      Following https://issues.apache.org/jira/browse/SPARK-10577, I'm using the hint for broadcast join. (I patched 1.5.1 with https://github.com/apache/spark/pull/8801/files )

      I found that whether this feature works depends on executor memory.
      In my case the broadcast join works with up to 31G of executor memory. With 32G the same query returns nulls for the columns of the joined table (id2, val2).

      Example:

      spark1:~/ab$ ~/spark/bin/spark-submit --executor-memory 31G debug_broadcast_join.py true
      Creating test tables...
      Joining tables...
      Joined table schema:
      root
       |-- id: long (nullable = true)
       |-- val: long (nullable = true)
       |-- id2: long (nullable = true)
       |-- val2: long (nullable = true)
      
      Selecting data for id = 5...
      [Row(id=5, val=5, id2=5, val2=5)]
      spark$ ~/spark/bin/spark-submit --executor-memory 32G debug_broadcast_join.py true
      Creating test tables...
      Joining tables...
      Joined table schema:
      root
       |-- id: long (nullable = true)
       |-- val: long (nullable = true)
       |-- id2: long (nullable = true)
       |-- val2: long (nullable = true)
      
      Selecting data for id = 5...
      [Row(id=5, val=5, id2=None, val2=None)]
      

      Please find example code attached.
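
The attached script is not reproduced on this page. As a rough illustration only, a reproduction along these lines might look like the sketch below. The function names, the table size `n`, and the choice of a left outer join are assumptions (the join type is inferred from the nulls in the output above); only the column names come from the printed schema. `broadcast` is the hint from `pyspark.sql.functions`, which on 1.5.1 is only available with the patch mentioned above.

```python
def build_joined(sql_context, use_hint, n=100):
    """Left-join two small tables, optionally hinting a broadcast join.

    Hypothetical stand-in for the attached script: `n` and the function
    name are assumptions, the column names follow the printed schema.
    """
    # Imported lazily so the helper below can be used without Spark installed.
    # On 1.5.1 this import needs the patch from the pull request cited above.
    from pyspark.sql.functions import broadcast

    # Both tables hold the same ids, so every left row has a match.
    left = sql_context.createDataFrame([(i, i) for i in range(n)], ["id", "val"])
    right = sql_context.createDataFrame([(i, i) for i in range(n)], ["id2", "val2"])

    if use_hint:
        # Mark the right-hand table as small enough to broadcast.
        right = broadcast(right)

    return left.join(right, left["id"] == right["id2"], "left_outer")


def right_side_missing(row):
    """Return True if the right-hand columns of a joined row are null."""
    return row.id2 is None and row.val2 is None
```

With an existing `sql_context`, one could collect `build_joined(sql_context, True).where("id = 5")` and pass each row to `right_side_missing`. Since every id exists in both tables, any row for which it returns True would correspond to the faulty output shown for 32G.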

        Attachments

        1. SPARK-11282.py
          1 kB
          Maciej Bryński

              People

              • Assignee: Unassigned
              • Reporter: Maciej Bryński (maver1ck)