SPARK-8670 (sub-task of SPARK-9564, Spark 1.5.0 Testing Plan)

Nested columns can't be referenced (but they can be selected)

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.4.1, 1.5.0
    • Fix Version/s: 1.5.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

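      Referencing a nested column by name with df['stats.age'] fails on 1.4, even though selecting the same column works. This is strange and looks like a regression from 1.3.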

      import json
      
      daterz = [
        {
          'name': 'Nick',
          'stats': {
            'age': 28
          }
        },
        {
          'name': 'George',
          'stats': {
            'age': 31
          }
        }
      ]
      
      df = sqlContext.jsonRDD(sc.parallelize(daterz).map(lambda x: json.dumps(x)))
      
      df.select('stats.age').show()
      df['stats.age']  # 1.4 fails on this line
      

      On 1.3 this works and yields:

      age
      28 
      31 
      Out[1]: Column<stats.age AS age#2958L>
      

      On 1.4, however, this gives an error on the last line:

      +---+
      |age|
      +---+
      | 28|
      | 31|
      +---+
      
      ---------------------------------------------------------------------------
      IndexError                                Traceback (most recent call last)
      <ipython-input-1-04bd990e94c6> in <module>()
           19 
           20 df.select('stats.age').show()
      ---> 21 df['stats.age']
      
      /path/to/spark/python/pyspark/sql/dataframe.pyc in __getitem__(self, item)
          678         if isinstance(item, basestring):
          679             if item not in self.columns:
      --> 680                 raise IndexError("no such column: %s" % item)
          681             jc = self._jdf.apply(item)
          682             return Column(jc)
      
      IndexError: no such column: stats.age
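
The traceback points at a plain membership test against self.columns. The sketch below reproduces only that test in plain Python, with no Spark involved, to show why a dotted name is rejected. The names check_column, is_referenceable and top_level_columns are hypothetical, and the column list is an assumption about what df.columns reports for the example data (top-level names only).

```python
def check_column(columns, item):
    """Mimic the string branch of __getitem__ shown in the traceback."""
    if item not in columns:
        raise IndexError("no such column: %s" % item)
    return item


def is_referenceable(columns, item):
    """Return True if the membership check would let `item` through."""
    try:
        check_column(columns, item)
        return True
    except IndexError:
        return False


# Assumed value of df.columns for the example data: top-level names only.
top_level_columns = ["name", "stats"]

print(is_referenceable(top_level_columns, "stats"))      # True
print(is_referenceable(top_level_columns, "stats.age"))  # False
```

Because 'stats.age' is never an element of the top-level list, the check raises before the name reaches self._jdf.apply(item), which is presumably where the nested reference would have been resolved.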
      

      This means, among other things, that you can't join DataFrames on nested columns.
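
One possible workaround, not confirmed anywhere in this issue, is to index one level at a time so that only the top-level name goes through the failing check. The sketch below assumes that df['stats'] returns a Column and that the Column supports item access for struct fields; nested_column is a hypothetical helper, not part of PySpark.

```python
from functools import reduce


def nested_column(df, dotted_name):
    """Resolve 'a.b.c' as df['a']['b']['c'], one level per lookup.

    Hypothetical helper: only the first part is looked up on the
    DataFrame itself, so only a top-level name hits the membership check.
    """
    parts = dotted_name.split(".")
    return reduce(lambda col, part: col[part], parts[1:], df[parts[0]])


# Intended use with two DataFrames df1 and df2 (assumption, untested on 1.4):
#   cond = nested_column(df1, 'stats.age') == nested_column(df2, 'stats.age')
#   joined = df1.join(df2, cond)
```

The helper relies only on item access, so it can be tried on nested dictionaries before running it against a real DataFrame.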

          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Nicholas Chammas (nchammas)