IMPALA-3745

Corrupt encoded values in parquet files can cause crashes

Details

    Description

      In many cases the Parquet scanner does not validate the values it reads, so a corrupt file can trigger several failure modes, including crashes.

      We don't validate that the string length is non-negative and fits within the buffer:

      template<>
      inline int ParquetPlainEncoder::Decode(
          uint8_t* buffer, int fixed_len_size, StringValue* v) {
        memcpy(&v->len, buffer, sizeof(int32_t));
        v->ptr = reinterpret_cast<char*>(buffer) + sizeof(int32_t);
        int bytesize = ByteSize(*v);
        if (fixed_len_size > 0) v->len = std::min(v->len, fixed_len_size);
        // we still read bytesize bytes, even if we truncate
        return bytesize;
      }
      

      More generally, nothing checks that a call to ParquetPlainEncoder::Decode() stays within the data page.
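
      One possible shape for a fix is sketched below. This is not the actual Impala patch. It assumes the caller can pass a pointer to the end of the data page (the hypothetical buffer_end parameter), that ByteSize() for a string is the 4-byte length prefix plus the string length, and it uses a hypothetical StringValueSketch struct in place of StringValue so the example is self-contained. The function name DecodeStringChecked and the -1 error return are likewise illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical stand-in for StringValue; only the two fields used here.
struct StringValueSketch {
  char* ptr;
  int32_t len;
};

// Decodes a plain-encoded string (4-byte length prefix followed by the bytes).
// buffer_end points one past the last valid byte of the data page.
// Returns the number of bytes consumed, or -1 if the length prefix is negative
// or the value would extend past buffer_end. *v is only written on success.
inline int DecodeStringChecked(uint8_t* buffer, const uint8_t* buffer_end,
    int fixed_len_size, StringValueSketch* v) {
  assert(buffer <= buffer_end);
  const int64_t available = buffer_end - buffer;
  // The length prefix itself must fit within the page.
  if (available < static_cast<int64_t>(sizeof(int32_t))) return -1;

  int32_t len;
  memcpy(&len, buffer, sizeof(int32_t));
  // Reject negative lengths before doing any arithmetic with them.
  if (len < 0) return -1;

  // Use 64-bit arithmetic so prefix + length cannot overflow.
  const int64_t bytesize = static_cast<int64_t>(sizeof(int32_t)) + len;
  if (bytesize > available) return -1;
  if (bytesize > std::numeric_limits<int>::max()) return -1;

  v->len = len;
  v->ptr = reinterpret_cast<char*>(buffer) + sizeof(int32_t);
  // As in the original: truncate the value, but still consume bytesize bytes.
  if (fixed_len_size > 0) v->len = std::min<int32_t>(v->len, fixed_len_size);
  return static_cast<int>(bytesize);
}
```

      A caller would treat a negative return value as a corrupt page and stop scanning it, rather than advancing the read position by the returned size.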

            People

              tarmstrong Tim Armstrong