  1. Parquet
  2. PARQUET-1958

Forced UTF8 encoding of BYTE_ARRAY on stream::read/write

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: cpp-1.5.0
    • Fix Version/s: None
    • Component/s: parquet-cpp
    • Labels: None

    Description

      StreamReader& StreamReader::operator>>(optional<std::string>& v) {
        CheckColumn(Type::BYTE_ARRAY, ConvertedType::UTF8);
        ByteArray ba;
        // ...

      StreamWriter& StreamWriter::WriteVariableLength(const char* data_ptr,
                                                      std::size_t data_len) {
        CheckColumn(Type::BYTE_ARRAY, ConvertedType::UTF8);
        // ...

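      Although the C++ parquet::schema::Node allows a physical type of BYTE_ARRAY with ConvertedType::NONE, the StreamReader and StreamWriter classes throw whenever the column's ConvertedType is not UTF8. A BYTE_ARRAY column that holds raw bytes therefore cannot be read or written through the stream API.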
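      To make the failure mode concrete, here is a minimal self-contained model of the check. It does not use the Parquet library: PhysicalType, Converted, Column, StringAccessAllowed and this CheckColumn are hypothetical stand-ins that only mirror the behaviour described above, under the assumption that the check requires both types to match exactly.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for parquet::Type and parquet::ConvertedType;
// only the values needed for this illustration are modeled.
enum class PhysicalType { BYTE_ARRAY, INT32 };
enum class Converted { NONE, UTF8 };

// Hypothetical description of one schema column.
struct Column {
  PhysicalType physical;
  Converted converted;
};

// Models a strict check: both the physical type and the converted type
// must match what the caller expects, otherwise an exception is thrown.
void CheckColumn(const Column& col, PhysicalType physical, Converted converted) {
  if (col.physical != physical) {
    throw std::runtime_error("physical type mismatch");
  }
  if (col.converted != converted) {
    throw std::runtime_error("converted type mismatch");
  }
}

// Returns true when a std::string read or write would pass the UTF8-only
// check, and false when it would throw.
bool StringAccessAllowed(const Column& col) {
  try {
    CheckColumn(col, PhysicalType::BYTE_ARRAY, Converted::UTF8);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}

// Small self-check covering the two cases discussed in this report.
void Demo() {
  const Column utf8_col{PhysicalType::BYTE_ARRAY, Converted::UTF8};
  const Column raw_col{PhysicalType::BYTE_ARRAY, Converted::NONE};
  assert(StringAccessAllowed(utf8_col));   // accepted today
  assert(!StringAccessAllowed(raw_col));   // valid schema, but rejected
}
```

      In this model the schema with Converted::NONE is perfectly valid, yet the string path rejects it, which is the situation this issue describes.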

      std::string is, unfortunately, the canonical byte buffer class in C++.

      A simple approach might be to add an operator>> overload for parquet::ByteArray that calls CheckColumn(Type::BYTE_ARRAY, ConvertedType::NONE) and lets the user take it from there. That would reuse the existing methods that the std::string overload already uses. Just an idea.
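      A rough sketch of that idea, again as a self-contained model rather than a patch against parquet-cpp. MockStreamReader, Cell, Bytes and ReadRaw are hypothetical names; Bytes only approximates the pointer-plus-length shape of parquet::ByteArray, and the real reader's internals may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Converted { NONE, UTF8 };

// Hypothetical stand-in for parquet::ByteArray: a non-owning view of bytes.
// The view is only valid while the reader that produced it is alive.
struct Bytes {
  const char* ptr = nullptr;
  std::size_t len = 0;
};

// Hypothetical reader over a single row of BYTE_ARRAY columns.
class MockStreamReader {
 public:
  struct Cell {
    Converted converted;
    std::string payload;
  };

  explicit MockStreamReader(std::vector<Cell> cells) : cells_(std::move(cells)) {}

  // Models the existing overload: strings require a UTF8 column.
  MockStreamReader& operator>>(std::string& v) {
    CheckColumn(Converted::UTF8);
    v = ReadRaw();
    return *this;
  }

  // Models the proposed overload: raw bytes require a column without a
  // converted type, and reuse the same low-level read as the string path.
  MockStreamReader& operator>>(Bytes& v) {
    CheckColumn(Converted::NONE);
    const std::string& raw = ReadRaw();
    v.ptr = raw.data();
    v.len = raw.size();
    return *this;
  }

 private:
  void CheckColumn(Converted expected) const {
    if (pos_ >= cells_.size()) throw std::out_of_range("no more columns");
    if (cells_[pos_].converted != expected) {
      throw std::runtime_error("converted type mismatch");
    }
  }

  // Shared read used by both overloads; advances to the next column.
  const std::string& ReadRaw() {
    assert(pos_ < cells_.size());
    return cells_[pos_++].payload;
  }

  std::vector<Cell> cells_;
  std::size_t pos_ = 0;
};
```

      With a row made of one UTF8 column and one NONE column, `reader >> text >> raw` would fill a std::string and a Bytes view respectively, while reading the NONE column into a std::string would still throw. Keeping the two overloads separate leaves the UTF8 guarantee of the string path intact.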

      I am new to this forum, and have assigned MAJOR to this bug, but gladly defer to those who have a better grasp of classification.

          People

            Assignee: Unassigned
            Reporter: protalis ian
            Votes: 0
            Watchers: 3
