Spark / SPARK-16044

input_file_name() returns empty strings in data sources based on NewHadoopRDD.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 1.6.3, 2.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The input_file_name() function returns an empty string instead of the file path when a data source is based on NewHadoopRDD. File paths are currently populated only for FileScanRDD and HadoopRDD.

      To be clear, this does not affect Spark's internal data sources, because none of them currently use NewHadoopRDD.

      However, several external data sources do use NewHadoopRDD. For example:

      spark-redshift
      spark-xml

      Currently, calling input_file_name() on a DataFrame read through such a data source produces the output below:

      +-----------------+
      |input_file_name()|
      +-----------------+
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      +-----------------+
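
The empty column above is what one would expect if input_file_name() reads a per-thread "current file" value that only some readers fill in. The Python sketch below is a toy model under that assumption, not Spark's code: the names `_current_file`, `scan_with_path_tracking`, and `scan_without_path_tracking` are hypothetical and exist only for illustration.

```python
import threading

# Hypothetical per-thread slot holding the path of the file being read.
_current_file = threading.local()


def input_file_name():
    """Return the path recorded for this thread, or "" if none was recorded."""
    return getattr(_current_file, "path", "")


def scan_with_path_tracking(path, records):
    """Model of a reader that records the path before yielding rows."""
    _current_file.path = path
    try:
        for record in records:
            yield record, input_file_name()
    finally:
        # Clear the slot so later scans on this thread do not see a stale path.
        _current_file.path = ""


def scan_without_path_tracking(path, records):
    """Model of a reader that never records the path it is reading."""
    for record in records:
        yield record, input_file_name()


tracked = list(scan_with_path_tracking("/data/a.xml", ["r1", "r2"]))
untracked = list(scan_without_path_tracking("/data/a.xml", ["r1", "r2"]))

print(tracked)
print(untracked)
```

In this model the second reader yields an empty string for every row even though it knows which file it is reading, which mirrors the symptom reported here. Under the same assumption, the fix amounts to having the reader record the path before it starts producing rows.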
      

            People

            • Assignee: Hyukjin Kwon (hyukjin.kwon)
            • Reporter: Hyukjin Kwon (hyukjin.kwon)