Flink / FLINK-13807

Flink-avro unit tests fail if the environment's default character encoding is not UTF-8


Details

    Description

      On the Flink release-1.8 branch:

      [ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s <<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest
      [ERROR] testSimpleAvroRead[Execution mode = CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest)  Time elapsed: 0.438 s  <<< FAILURE!
      java.lang.AssertionError: 
      Different elements in arrays: expected 2 elements and received 2
      files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/3]
       expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], "type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, -48]}]
       received: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], "type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, -48]}]
      	at org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest.after(AvroTypeExtractionTest.java:76)
      

      Comparing “expected” with “received”, the two differ only in the number of question marks in the type_decimal_bytes field of each record.

      In “expected”, it is

      "type_decimal_bytes": {"bytes": "\u0007?"}
      

      In “received”, it is

      "type_decimal_bytes": {"bytes": "\u0007??"}
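      The log alone does not show where the extra “?” comes from. As a general illustration only (an assumption, not a confirmed root cause), the Java sketch below shows how a single non-ASCII character can surface as one “?” on one code path and as “??” on another once US-ASCII is involved: encoding the character directly to ASCII substitutes one “?”, while decoding its two UTF-8 bytes as ASCII produces two U+FFFD replacement characters, each of which becomes “?” on ASCII output. The class name, helper names, and the character 'é' are hypothetical choices for the demo.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not part of Flink or Avro.
class CharsetMismatchDemo {

    // Render a string the way an ASCII-only output would: every character
    // outside US-ASCII is replaced by the encoder's replacement byte '?'.
    static String renderAscii(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    // Decode raw bytes with the given charset; undecodable bytes become U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // 'é', a single non-ASCII character
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        // Path 1: the character itself is written out as ASCII -> "?"
        String direct = renderAscii(original);

        // Path 2: its UTF-8 bytes are decoded as ASCII (one U+FFFD per byte),
        // then written out as ASCII -> "??"
        String viaBytes = renderAscii(decode(utf8Bytes, StandardCharsets.US_ASCII));

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("direct:    " + direct);
        System.out.println("via bytes: " + viaBytes);
        System.out.println("equal? " + direct.equals(viaBytes)); // false
    }
}
```

      Under a UTF-8 locale the same bytes decode back to 'é' and no replacement characters appear, which is consistent with the tests passing once the locale was switched.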
      

      The environment I ran the unit tests in uses the ANSI_X3.4-1968 (US-ASCII) character encoding.

      After I switched the locale to "en_US.UTF-8", the unit tests passed.
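      Switching the locale works around the failure; a more robust pattern is to avoid the platform default charset whenever test results are written to files (such as the temporary result files listed in the log above) and read back. The sketch below is a hypothetical helper, not Flink's test utilities and not the contents of patch.diff, that writes and reads lines with an explicit UTF-8 charset, so the round trip does not depend on whether the JVM runs under ANSI_X3.4-1968 or en_US.UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: illustrates the pattern only.
class ExplicitCharsetRoundTrip {

    // Writes the lines to a temp file and reads them back, passing UTF-8
    // explicitly on both sides instead of relying on Charset.defaultCharset().
    static List<String> roundTrip(List<String> lines) throws IOException {
        Path tmp = Files.createTempFile("avro-result", ".txt");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            return Files.readAllLines(tmp, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // A line containing a control character and a non-ASCII character
        List<String> lines = Collections.singletonList("{\"bytes\": \"\u0007\u00d0\"}");
        System.out.println(roundTrip(lines).equals(lines)); // true under any default locale
    }
}
```

      Because neither call consults the default charset, the comparison no longer changes with the environment's locale settings.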

      Attachments

        patch.diff (1 kB, Zili Chen)

            People

              tison (Zili Chen)
              ethanli (Ethan Li)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m