SPARK-10246: Join in PySpark using a list of column names

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Currently, join supports two ways to specify how the DataFrames are joined: a single join condition or a single column name.

      The documentation states that the join function can also accept a list of conditions or a list of column names, but neither is currently supported. This is also discussed in SPARK-7197.

      The functionality should match the documentation, which currently contains the following example in /spark/python/pyspark/sql/dataframe.py, line 560:

      >>> df.join(df4, ['name', 'age']).select(df.name, df.age).collect()
      [Row(name=u'Bob', age=5)]
      """
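
To make the expected behavior concrete, here is a minimal pure-Python sketch of what joining on a list of column names is assumed to mean: an inner equi-join that requires every named column to match, following SQL semantics where NULL never equals NULL. The helper join_on_names is hypothetical and is not part of PySpark, and the rows for df and df4 are illustrative data chosen to reproduce the documented output.

```python
def join_on_names(left, right, names):
    """Inner equi-join of two lists of dict rows on every column in `names`.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    joined = []
    for l_row in left:
        l_key = tuple(l_row.get(n) for n in names)
        if any(v is None for v in l_key):
            continue  # assumed SQL semantics: a NULL key matches nothing
        for r_row in right:
            r_key = tuple(r_row.get(n) for n in names)
            if l_key == r_key:
                merged = dict(r_row)   # start from the right-hand row
                merged.update(l_row)   # join columns are equal, so they appear once
                joined.append(merged)
    return joined


# Illustrative rows (assumed, not taken from the issue).
df = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
df4 = [
    {"name": "Alice", "age": 10, "height": 80},
    {"name": "Bob", "age": 5, "height": None},
    {"name": "Tom", "age": None, "height": None},
    {"name": None, "age": None, "height": None},
]

result = [(row["name"], row["age"]) for row in join_on_names(df, df4, ["name", "age"])]
print(result)
```

With these rows only Bob matches on both name and age, which corresponds to the single Row in the documented example. Joining on ['name'] alone would also keep Alice.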

      People

        Assignee: Unassigned
        Reporter: Michal Monselise (michal.monselise)