IMPALA-5387

Excessive logging to INFO and ERROR files when reading S3 data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

      Description

      While querying data in S3, the impalad.ERROR file is flooded with messages like the one below. Are these expected?

      UnsupportedOperationException: Byte-buffer read unsupported by input streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
      	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
      readDirect: FSDataInputStream#read error:
      UnsupportedOperationException: Byte-buffer read unsupported by input streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
      	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
      

      The same flooding occurs in impalad.INFO:

      W0529 13:48:46.398718  9771 S3AbortableInputStream.java:163] Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
      W0529 13:48:46.420027  9767 S3AbortableInputStream.java:163] Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
      W0529 13:48:46.429368  9758 S3AbortableInputStream.java:163] Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
      

        Activity

        stevel@apache.org Steve Loughran added a comment -

        Quoting "... to the problem that S3A input stream needs to support ByteBufferReadable":

        That seems to imply it's S3A's problem. I'd argue: Impala is making an assumption about filesystem features that are not broadly supported (S3A, swift, azure, adl) and is then surprised when it discovers that this assumption does not hold. At the very least, why not check once, log if the feature is absent, and remember not to ask again. Or at least: only log once. A sketch of that check-once pattern follows.
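
        A minimal sketch of the check-once, log-once idea, written against the Hadoop FileSystem Java API. The helper class and names here are hypothetical, not Impala's actual read path (which lives in libhdfs/C++):

        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.util.concurrent.atomic.AtomicBoolean;

        import org.apache.hadoop.fs.ByteBufferReadable;
        import org.apache.hadoop.fs.FSDataInputStream;

        // Hypothetical helper illustrating "check once, remember, log once".
        public final class DirectReadSupport {
          // Ensures the warning is logged at most once per process, not per read.
          private static final AtomicBoolean WARNED = new AtomicBoolean(false);

          // Probe the capability without triggering a logged exception: the
          // wrapped stream advertises direct-read support through its type.
          public static boolean supportsDirectRead(FSDataInputStream in) {
            boolean supported = in.getWrappedStream() instanceof ByteBufferReadable;
            if (!supported && WARNED.compareAndSet(false, true)) {
              System.err.println(
                  "Filesystem lacks ByteBufferReadable; using byte[] reads (logged once)");
            }
            return supported;
          }

          // Use the direct path only when the capability is present.
          public static int read(FSDataInputStream in, ByteBuffer buf) throws IOException {
            if (supportsDirectRead(in)) {
              return in.read(buf);                  // zero-copy direct read
            }
            byte[] tmp = new byte[buf.remaining()]; // fallback: copy through byte[]
            int n = in.read(tmp);
            if (n > 0) {
              buf.put(tmp, 0, n);
            }
            return n;
          }
        }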

        pranay_singh Pranay Singh added a comment -

        This is a duplicate of HADOOP-14603, which points to the problem that the S3A input stream needs to support ByteBufferReadable; that missing support is what causes the error messages to be logged so profusely.

        stevel@apache.org Steve Loughran added a comment -

        The two logs are unrelated.

        First log: you are calling the ByteBufferReadable APIs and S3A doesn't implement them. Fix #1: don't do that. Fix #2: supply a patch for HADOOP-14603, with tests.

        The second log is related to changes in the AWS SDK 1.11 over 1.10, as covered in https://github.com/aws/aws-sdk-java/issues/1211

        This is partially mitigated by HADOOP-14596, which is included in all shipped releases of Hadoop built against AWS SDK 1.11. You need that patch on whatever version you are running, as well as a version of the AWS SDK that includes the fix for GitHub issue #1211. One client-side mitigation is sketched below.
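
        As an aside not stated in this thread: the "request only the bytes you need via a ranged GET" advice corresponds to S3A's fadvise policy, assuming a Hadoop release that ships HADOOP-13203 (2.8+). A sketch, with a hypothetical bucket name:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class S3AFadviseExample {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Random (ranged-GET) reads suit columnar formats such as Parquet
            // and avoid opening, then aborting, large sequential GETs mid-stream.
            conf.set("fs.s3a.experimental.input.fadvise", "random");
            // Hypothetical bucket, for illustration only.
            FileSystem fs = new Path("s3a://bucket/").getFileSystem(conf);
            // ... run the read workload against fs ...
          }
        }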

        pranay_singh Pranay Singh added a comment -

        Analysis
        -----------
        The error messages are logged from the libhdfs client code; they appear to come from
        hdfsOpenFileImpl() when libhdfs checks whether readDirect() is possible for a given file.
        readDirect() encounters the exception shown below:

        Error message logged
        -----------------------------
        readDirect: FSDataInputStream#read error:
        UnsupportedOperationException: Byte-buffer read unsupported by input streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
        	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)

        Reference code : http://github.mtv.cloudera.com/CDH/hadoop/blob/cdh5-2.6.0/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c#L1173

        Since this issue does not happen on every read, the probe itself should not be expensive. However, with a large number of files the excessive logging may itself become a performance issue, though in most cases the logged error is a red herring. Hence, the HDFS team should look into reducing this error logging, as it is quite frequent in an S3 setup when Impala is used. A sketch of the probe that triggers the exception follows.
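
        For illustration, here is a minimal sketch of the probe described above, written against the public Java API rather than libhdfs' C code. The zero-byte read is an assumption drawn from the referenced hdfs.c source; the class name is hypothetical:

        import java.io.IOException;
        import java.nio.ByteBuffer;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class DirectReadProbeDemo {
          public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataInputStream in = fs.open(new Path(args[0]))) {
              try {
                // libhdfs' hdfsOpenFileImpl() does the equivalent of this
                // zero-byte direct read to test for ByteBufferReadable support.
                in.read(ByteBuffer.allocate(0));
              } catch (UnsupportedOperationException e) {
                // On S3A this throws from FSDataInputStream.read
                // (FSDataInputStream.java:150), and libhdfs logs the stack
                // trace on every file open: hence the log flood.
              }
            }
          }
        }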


          People

          • Assignee: pranay_singh Pranay Singh
          • Reporter: mmokhtar Mostafa Mokhtar
          • Votes: 1
          • Watchers: 5
