Spark / SPARK-23395

Add an option to return an empty DataFrame from an RDD generated by a Hadoop file when there are no usable paths



      Description

      When working with file-based data in custom formats, Spark's ability to use Hadoop's FileInputFormats is very handy. However, when the path a FileInputFormat is pointed at contains no usable data, it throws an IOException with the message "No input paths specified in job".

      It would be a nice feature if the DataFrame API could catch this error and return an empty DataFrame instead of failing the job.
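
Until such an option exists, a caller-side workaround might look like the sketch below. It is not part of Spark: `load_or_empty`, `load`, and `make_empty` are hypothetical names introduced here for illustration, and the sketch assumes the Hadoop message quoted above appears somewhere in the stringified exception that reaches Python.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Message quoted in the description above.
NO_INPUT_PATHS = "No input paths specified in job"


def load_or_empty(load: Callable[[], T], make_empty: Callable[[], T]) -> T:
    """Run `load`; on the 'no input paths' error, return `make_empty()` instead."""
    try:
        return load()
    except Exception as exc:
        # Assumption: the Hadoop message is part of str(exc), for example
        # when a Java IOException is wrapped by the Python bridge.
        if NO_INPUT_PATHS in str(exc):
            return make_empty()
        # Any other failure is unrelated and should still fail the job.
        raise
```

With PySpark, `load` could build the RDD with `sc.newAPIHadoopFile(...)` and convert it to a DataFrame, while `make_empty` could return `spark.createDataFrame([], schema)` for a known schema. Because RDDs are evaluated lazily, the error may only surface once partitions are listed, so `load` may need to force that step (for example by calling `rdd.getNumPartitions()`) for the fallback to take effect.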
