Hadoop Common / HADOOP-19353 Über-jira: S3A Hadoop 3.4.2 features / HADOOP-18839

s3a client SSLException is raised after very long timeout "Unsupported or unrecognized SSL message"


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.4
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      I tried to connect from PySpark to MinIO running in Docker.

      Installing PySpark and starting Minio:

      pip install pyspark==3.4.1
      
      docker run --rm -d --hostname minio --name minio -p 9000:9000 -p 9001:9001 -e MINIO_ACCESS_KEY=access -e MINIO_SECRET_KEY=Eevoh2wo0ui6ech0wu8oy3feiR3eicha -e MINIO_ROOT_USER=admin -e MINIO_ROOT_PASSWORD=iepaegaigi3ofa9TaephieSo1iecaesh bitnami/minio:latest
      docker exec minio mc mb test-bucket
      

      Then create a Spark session:

      from pyspark.sql import SparkSession
      
      spark = SparkSession.builder\
                .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")\
                .config("spark.hadoop.fs.s3a.endpoint", "localhost:9000")\
                .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "true")\
                .config("spark.hadoop.fs.s3a.path.style.access", "true")\
                .config("spark.hadoop.fs.s3a.access.key", "access")\
                .config("spark.hadoop.fs.s3a.secret.key", "Eevoh2wo0ui6ech0wu8oy3feiR3eicha")\
                .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")\
                .getOrCreate()
      spark.sparkContext.setLogLevel("debug")
      

      Then try to read an object from the bucket:

      import time
      
      begin = time.perf_counter()
      spark.read.format("csv").load("s3a://test-bucket/fake")
      end = time.perf_counter()
      
      py4j.protocol.Py4JJavaError: An error occurred while calling o40.load.
      : org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message: Unable to execute HTTP request: Unsupported or unrecognized SSL message
      ...
      

      ssl.log

      >>> print((end-begin)/60, "min")
      14.72387898775002 min
      

      It took almost 15 minutes for Spark to raise the exception. The cause was that I connected to the endpoint with fs.s3a.connection.ssl.enabled=true, while MinIO was configured to listen for plain HTTP only.

      Is there any way to raise an exception immediately if an SSL connection cannot be established?
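
      As a possible client-side workaround, the mismatch can be detected before the Spark session is created by attempting a TLS handshake against the endpoint with a short timeout. The sketch below uses only the Python standard library; the helper name probe_tls and the 5 second default timeout are assumptions made for illustration and are not part of Hadoop, Spark, or the AWS SDK.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the endpoint completes a TLS handshake, False otherwise.

    Hypothetical helper: not part of Hadoop, Spark, or the AWS SDK.
    """
    # Certificate validation is disabled on purpose: the probe only checks
    # whether the server speaks TLS at all, not whether its certificate
    # is trusted.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        # The timeout set here also bounds the handshake in wrap_socket().
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError (a subclass of OSError) is raised when the server
        # answers with something that is not TLS, for example plain HTTP.
        # Other OSError cases: refused connection, DNS failure, timeout.
        return False
```

      Calling probe_tls("localhost", 9000) against the HTTP-only MinIO container described above would be expected to return False within the timeout, which suggests either setting fs.s3a.connection.ssl.enabled to false or enabling TLS on the server.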

      If I pass a wrong endpoint instead, such as localhos:9000, I get an exception like this in about 5 seconds:

      : org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: test-bucket.localhos: Unable to execute HTTP request: test-bucket.localhos
      ...
      

      host.log

      >>> print(end-begin, "sec")
      5.700424307000503 sec
      

      I know about options such as fs.s3a.attempts.maximum and fs.s3a.retry.limit; setting them to 1 makes the exception surface immediately. But this does not look right.
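
      For reference, the retry workaround mentioned above could be applied from PySpark roughly as follows. This is a sketch only: fail_fast_s3a_options and apply_options are hypothetical helper names, and the two option names are the ones quoted in this issue. Capping retries this way presumably also removes retries for genuinely transient failures.

```python
def fail_fast_s3a_options(attempts: int = 1) -> dict:
    """Spark config entries that cap S3A retries (hypothetical helper)."""
    value = str(attempts)
    return {
        # Option names as quoted in this issue; the "spark.hadoop." prefix
        # passes them through to the Hadoop configuration.
        "spark.hadoop.fs.s3a.attempts.maximum": value,
        "spark.hadoop.fs.s3a.retry.limit": value,
    }


def apply_options(builder, options: dict):
    """Apply each option to a SparkSession.builder-like object via .config()."""
    for key, val in options.items():
        builder = builder.config(key, val)
    return builder
```

      With PySpark installed, the helpers would be used as apply_options(SparkSession.builder, fail_fast_s3a_options()) before calling getOrCreate().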

      Attachments

        1. wrong_port.log (1.20 MB, Maxim Martynov)
        2. wrong_host.log (1.07 MB, Maxim Martynov)
        3. host.log (141 kB, Maxim Martynov)
        4. ssl.log (1.27 MB, Maxim Martynov)


          People

            Assignee: Unassigned
            Reporter: Maxim Martynov (dolfinus)
            Votes: 0
            Watchers: 2
