Spark / SPARK-16044

input_file_name() returns empty strings in data sources based on NewHadoopRDD.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 1.6.3, 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The input_file_name() function returns empty strings instead of file paths when a data source uses NewHadoopRDD. File paths are currently only populated for FileScanRDD and HadoopRDD.

      To be clear, this does not affect Spark's internal data sources, because none of them currently use NewHadoopRDD.

      However, several external data sources do use it, for example:

      spark-redshift
      spark-xml

      Currently, calling this function on such a data source produces the output below:

      +-----------------+
      |input_file_name()|
      +-----------------+
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      +-----------------+
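
One way to picture the behavior described above is a toy model in plain Python. It assumes that input_file_name() reads a per-thread value which the reading RDD is expected to set; the issue itself does not spell out the mechanism. None of this is Spark code, and every function name and path below is hypothetical.

```python
import threading

# Toy model only: these names are hypothetical and are not Spark's classes.
_holder = threading.local()


def set_input_file_name(path):
    """Record the file the current thread is reading."""
    _holder.path = path


def unset_input_file_name():
    """Clear the recorded file name once reading is finished."""
    _holder.path = ""


def input_file_name():
    """Return the recorded file name, or "" if no reader recorded one."""
    return getattr(_holder, "path", "")


def read_with_registration(path, records):
    """A reader that registers its file, like the supported RDD types."""
    set_input_file_name(path)
    try:
        for record in records:
            yield record, input_file_name()
    finally:
        unset_input_file_name()


def read_without_registration(path, records):
    """A reader that never registers its file, like the case reported here."""
    for record in records:
        yield record, input_file_name()


supported = list(read_with_registration("/data/a.xml", ["r1", "r2"]))
unsupported = list(read_without_registration("/data/a.xml", ["r1", "r2"]))
print(supported)    # each record is paired with "/data/a.xml"
print(unsupported)  # each record is paired with ""
```

In this model the second reader yields an empty string for every record, which matches the blank column shown above. Under the stated assumption, a fix would amount to having the reader register the path before producing records.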
      

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)