Spark / SPARK-16044

input_file_name() returns empty strings in data sources based on NewHadoopRDD.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 1.6.3, 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The input_file_name() function returns empty strings instead of file paths when a data source uses NewHadoopRDD. File paths are currently populated only for FileScanRDD and HadoopRDD.

      To be clear, this does not affect Spark's internal data sources, because none of them currently use NewHadoopRDD.

      However, several external data sources do use it. For example:

      • spark-redshift
      • spark-xml

      Currently, calling this function on such a data source produces the output below:

      +-----------------+
      |input_file_name()|
      +-----------------+
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      |                 |
      +-----------------+
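
One way to picture the symptom is a toy model in plain Python. It assumes that the file path reaches input_file_name() through per-thread state that each scan has to populate itself. The issue text does not describe the mechanism, so that assumption and every name in the sketch (`_current_file`, `scan_setting_path`, `scan_without_path`, the sample paths) are hypothetical and are not Spark's classes or API.

```python
import threading

# Hypothetical per-thread holder for the path of the file being read.
# These names are illustrative only; they are not Spark's classes.
_current_file = threading.local()


def input_file_name():
    """Return the current file path, or "" if no scan has set one."""
    return getattr(_current_file, "path", "")


def scan_setting_path(files):
    """Models a scan that records the file path before yielding its rows."""
    for path, rows in files.items():
        _current_file.path = path
        try:
            for row in rows:
                yield row
        finally:
            _current_file.path = ""  # reset once this file is done


def scan_without_path(files):
    """Models a scan that never records the path (the reported symptom)."""
    for rows in files.values():
        for row in rows:
            yield row


# Made-up sample input: file path -> rows read from that file.
files = {"/data/a.xml": ["r1", "r2"], "/data/b.xml": ["r3"]}

with_path = [(row, input_file_name()) for row in scan_setting_path(files)]
without_path = [(row, input_file_name()) for row in scan_without_path(files)]

print(with_path)
print(without_path)
```

In this model the second scan yields every row correctly, yet each lookup returns an empty string, which mirrors the blank column in the output above.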
      


          People

            Assignee: Hyukjin Kwon (hyukjin.kwon)
            Reporter: Hyukjin Kwon (hyukjin.kwon)
            Votes: 0
            Watchers: 4
