Details

    • Incompatible change
    • S3 Select is no longer supported through the S3A connector

    Description

      Getting S3 Select to work with the v2 SDK is tricky: we need to add extra libraries to the classpath beyond just bundle.jar. We can do this, but:

      • AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic completely
      • CSV is a bad format
      • one-line JSON is more structured but also way less efficient.
        ORC/Parquet benefit from vectored IO and work spanning the cluster.

      Accordingly, I'm wondering what to do about S3 Select:

      1. cut?
      2. downgrade to optional and document the extra classes on the classpath

      Option #2 is straightforward and effectively the default. We can also declare the feature deprecated.

      
      [ERROR] testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat)  Time elapsed: 147.958 s  <<< ERROR!
      java.io.IOException: java.lang.NoClassDefFoundError: software/amazon/eventstream/MessageDecoder
              at org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
              at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
              at org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
              at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
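
      If option #2 were chosen, one way to avoid the raw NoClassDefFoundError in the trace above would be to probe for the optional class up front and fail with a clearer message. The following is only a minimal sketch under that assumption: the class and method names (OptionalClasspathProbe, isClassAvailable, requireClass) are hypothetical and not part of S3A; the only name taken from this issue is software.amazon.eventstream.MessageDecoder, the class reported missing in the trace.

```java
// Hypothetical helper, not part of the S3A codebase.
final class OptionalClasspathProbe {

  // Class reported missing in the stack trace of this issue.
  static final String EVENTSTREAM_DECODER =
      "software.amazon.eventstream.MessageDecoder";

  private OptionalClasspathProbe() {
  }

  /** Returns true if the named class can be loaded, without initializing it. */
  static boolean isClassAvailable(String className) {
    try {
      Class.forName(className, false,
          OptionalClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Fails fast with a descriptive message when an optional dependency
   * of the named feature is absent from the classpath.
   */
  static void requireClass(String className, String feature) {
    if (!isClassAvailable(className)) {
      throw new UnsupportedOperationException(
          feature + " is unavailable: class " + className
              + " is not on the classpath");
    }
  }
}
```

      A caller could then invoke OptionalClasspathProbe.requireClass(OptionalClasspathProbe.EVENTSTREAM_DECODER, "S3 Select") before issuing a select request, so that a missing library surfaces as an explicit "feature unavailable" error rather than as a linkage failure deep in the call stack.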
      
      

          People

            stevel@apache.org Steve Loughran