  1. Apache Avro
  2. AVRO-1350

Error in decoding enums using ResolvingDecoder

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 1.7.4
    • Fix Version/s: None
    • Component/s: c++
    • Labels: None

    Description

      Decoding an enum with the resolving decoder returns the wrong value. For example:

      schema
      {
        "type" : "record",
        "name" : "TestEnum",
        "fields" : [
      	{
      	    "name" : "MyMode",
      	    "type" : {
      	      "type" : "enum",
      	      "name" : "Mode",
      	      "symbols" : [ "MEMORY", "DISK" ]
      	    }
      	}
        ]
      }
      

      We encoded "DISK" (1), decoded it with the resolving decoder, and got "MEMORY" (0).
      I examined the code and found that the reader's names are sorted right after they are read.
      I couldn't work out the author's intention, but the sort breaks decoding.
      The value returned when decoding my enum is the symbol's position in the sorted names rather than its position in the schema, which is incorrect.
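The effect of the sort can be reproduced without Avro. The following is a minimal sketch, not Avro code: `positionAfterSort` is a hypothetical helper that takes an enum's symbols in declared order plus an ordinal, and reports where that symbol ends up once the list is sorted alphabetically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Avro: given an enum's symbols in declared
// order and an ordinal into that list, return the position the same symbol
// occupies after the list is sorted alphabetically.
std::size_t positionAfterSort(const std::vector<std::string>& declared,
                              std::size_t ordinal) {
    assert(ordinal < declared.size());

    // Sort a copy, as the report describes for the reader's names.
    std::vector<std::string> sorted(declared);
    std::sort(sorted.begin(), sorted.end());

    // Locate the symbol in the sorted copy (enum symbols are unique).
    std::vector<std::string>::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), declared[ordinal]);
    return static_cast<std::size_t>(it - sorted.begin());
}
```

For the symbols `MEMORY, DISK`, ordinal 1 (`DISK`) maps to position 0 and ordinal 0 (`MEMORY`) maps to position 1, which matches the swapped values seen in the report.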

      Symbol.cc
      Symbol Symbol::enumAdjustSymbol(const NodePtr& writer, const NodePtr& reader)
      {
          vector<string> rs;
          size_t rc = reader->names();
          for (size_t i = 0; i < rc; ++i) {
              rs.push_back(reader->nameAt(i));
          }
          sort(rs.begin(), rs.end()); // the strange sort
      

      Here is my complete test case.

      generated structure
      enum Mode {
          MEMORY,
          DISK,
      };
      
      struct TestEnum {
          Mode MyMode;
      };
      
      My test case
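      #include <iostream>
      #include <memory>

      #include "ts_enum.h"
      #include "avro/Compiler.hh"
      #include "avro/Decoder.hh"
      #include "avro/Encoder.hh"
      #include "avro/Stream.hh"
      #include "avro/ValidSchema.hh"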
      
      using namespace std;
      using namespace avro;
      using namespace enum_test;
      
      static const char ts_schema_string[] =
              "{ \"type\" : \"record\", \"name\" : \"TestEnum\", \"fields\" : "
              "[ { \"name\" : \"MyMode\", \"type\" : "
              "{ \"type\" : \"enum\", \"name\" : \"Mode\", "
              "\"symbols\" : [ \"MEMORY\", \"DISK\" ] } } ]}";
      
      int main(int argc, char * argv[]) {
          TestEnum te1, te2;
          ValidSchema reader = compileJsonSchemaFromString(ts_schema_string);
          ValidSchema writer = compileJsonSchemaFromString(ts_schema_string);
      
          //encode TestEnum
          auto_ptr<OutputStream> out_stream = memoryOutputStream();
          EncoderPtr encoder = binaryEncoder();
          encoder->init(*out_stream);
          te1.MyMode = DISK;
          encode(*encoder, te1);
          encoder->flush();
      
          //decode TestEnum
          auto_ptr<InputStream> in_stream = memoryInputStream(*out_stream);
          DecoderPtr decoder = resolvingDecoder(writer, reader, avro::binaryDecoder());
          decoder->init(*in_stream);
          decode(*decoder, te2);
      
          cout<<"TE1: "<<te1.MyMode << " | TE2: "<<te2.MyMode<<endl;
          return 0;
      }
      

      The result
      -------------------
      TE1: 1 | TE2: 0

      I debugged into the Avro code.
      In Symbol::enumAdjustSymbol, a vector<string> holds the reader's enum names; after the sort, "MEMORY, DISK" becomes "DISK, MEMORY".
      Then a vector<int>, with one entry per writer enum name, records each name's position in the sorted vector<string>.
      As a result, in the returned symbol, MEMORY's position is 1 and DISK's position is 0.
      Finally, when the enum is decoded, that position is what gets returned to the target object.
      I couldn't work out the author's intention here, but when I commented out the sort, everything worked.
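For comparison, here is a minimal sketch of a writer-to-reader table that looks each writer symbol up in the reader's declared order, so the stored value is the reader's real ordinal. It is an illustration under assumptions, not the Avro implementation or its eventual fix: `buildEnumAdjust` and `resolveOrdinal` are hypothetical names, and the `-1` marker for a symbol missing from the reader is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the table described above: entry i holds the
// reader ordinal for writer ordinal i, or -1 (an assumption of this sketch)
// when the reader does not declare that symbol.
std::vector<int> buildEnumAdjust(const std::vector<std::string>& writerSymbols,
                                 const std::vector<std::string>& readerSymbols) {
    std::vector<int> adjust;
    adjust.reserve(writerSymbols.size());
    for (std::size_t i = 0; i < writerSymbols.size(); ++i) {
        // Search the reader's symbols in declared order; no sorting, so the
        // iterator offset is the reader's ordinal for this symbol.
        std::vector<std::string>::const_iterator it =
            std::find(readerSymbols.begin(), readerSymbols.end(), writerSymbols[i]);
        adjust.push_back(it == readerSymbols.end()
                             ? -1
                             : static_cast<int>(it - readerSymbols.begin()));
    }
    return adjust;
}

// Translate an ordinal read from the writer's data into the reader's ordinal.
int resolveOrdinal(const std::vector<int>& adjust, std::size_t writerOrdinal) {
    assert(writerOrdinal < adjust.size());
    return adjust[writerOrdinal];
}
```

With identical writer and reader symbols `MEMORY, DISK`, the table is the identity mapping, so `DISK` (1) resolves to 1. A linear `std::find` per symbol costs O(n·m), which is usually acceptable for the small symbol lists typical of enums.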

          People

            thiru_mg Thiruvalluvan M. G.
            keyer Bin Guo