Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3745

Corrupt encoded values in parquet files can cause crashes

    Details

      Description

      The parquet scanner does not validate values read in many cases, so it has a number of failure modes.

      We don't validate that the string length is non-negative and fits within the buffer:

      template<>
      inline int ParquetPlainEncoder::Decode(
          uint8_t* buffer, int fixed_len_size, StringValue* v) {
        memcpy(&v->len, buffer, sizeof(int32_t));
        v->ptr = reinterpret_cast<char*>(buffer) + sizeof(int32_t);
        int bytesize = ByteSize(*v);
        if (fixed_len_size > 0) v->len = std::min(v->len, fixed_len_size);
        // we still read bytesize bytes, even if we truncate
        return bytesize;
      }
      

      In general, we don't validate when calling ParquetPlainEncoder::Decode() that we're staying within the data page.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                tarmstrong Tim Armstrong
                Reporter:
                tarmstrong Tim Armstrong
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: