Spark / SPARK-28092

Spark cannot load files with a colon (:) in the name unless the full path is specified

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: Cloudera 6.2, latest Spark parcel (I think 2.4.3)

    Description

      Scenario:

      I have CSV files in an S3 bucket like this:

      s3a://bucket/prefix/myfile_2019:04:05.csv

      s3a://bucket/prefix/myfile_2019:04:06.csv

      When I try to load the files with something like:
      df = spark.read.load("s3://bucket/prefix/*", format="csv", sep=":", inferSchema="true", header="true")
       
      it fails with a URI-related error (sorry, I don't have the exact exception here). But when I list all the files from S3 and pass the paths as a list:
      df = spark.read.load(path=["s3://bucket/prefix/myfile_2019:04:05.csv","s3://bucket/prefix/myfile_2019:04:06.csv"], format="csv", sep=":", inferSchema="true", header="true")
       
      it works. From what I have observed, the cause is the colon character in the file names.
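
The workaround described above (listing the objects first and passing fully qualified paths instead of a glob) could be sketched roughly as follows. This is only an illustration: `expand_paths` is a hypothetical helper, not part of Spark, and the list of object keys is assumed to come from whatever S3 listing call is available. The `spark.read.load` call is left commented out because it needs a live session.

```python
from fnmatch import fnmatchcase


def expand_paths(bucket, keys, pattern="*", scheme="s3a"):
    """Hypothetical helper: turn object keys into fully qualified URIs.

    The glob is matched here in Python, so no wildcard is ever handed
    to Spark; Spark only receives explicit paths.
    """
    return [
        f"{scheme}://{bucket}/{key}"
        for key in sorted(keys)
        if fnmatchcase(key, pattern)
    ]


# Assumed result of an S3 listing for the prefix (placeholder values).
keys = [
    "prefix/myfile_2019:04:05.csv",
    "prefix/myfile_2019:04:06.csv",
    "prefix/_SUCCESS",
]

paths = expand_paths("bucket", keys, pattern="prefix/*.csv", scheme="s3")

# With an active SparkSession, the explicit list replaces the glob:
# df = spark.read.load(path=paths, format="csv", sep=":",
#                      inferSchema="true", header="true")
```

Matching the pattern on the client side keeps the colon-containing names out of any glob expansion, which is the step that fails in this report.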

            People

              Assignee: Unassigned
              Reporter: Ladislav Jech (archenroot)
              Votes: 0
              Watchers: 2