AVRO-1521

Inconsistent behavior of Perl API with 'boolean' type

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: perl
    • Labels:
      None
    • Release Note:
      Perl API: Only accept 0 and 1 as boolean values, fix encoding

      Description

      The Perl boolean serialization code in BinaryEncoder.pm encodes anything Perl treats as false, such as 0, '0', '', () and undef, as false, and anything Perl treats as true, which is everything else, as true.

      Inconsistently with that serialization, the code in Schema.pm that determines which union branch to use checks for boolean-ness with:

      m{yes|no|y|n|t|f|true|false}i

      meaning only those particular strings are considered booleans.

      But all of those strings, including 'no', 'n', 'f' and 'false', are non-empty and therefore true to Perl, so they all get serialized as true.
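
      The mismatch can be reproduced with a few lines of Perl. This is only a sketch: the pattern is copied from the description above, while perl_truth is a hypothetical stand-in for the encoder's truthiness test, not code from BinaryEncoder.pm.

```perl
use strict;
use warnings;

# The union-branch pattern quoted in the description, copied verbatim.
my $boolean_like = qr{yes|no|y|n|t|f|true|false}i;

# Hypothetical stand-in for the encoder: plain Perl truthiness.
sub perl_truth { return $_[0] ? 'true' : 'false' }

# Every "negative" spelling matches the pattern, yet is true to Perl.
for my $value ('yes', 'no', 'n', 'f', 'false') {
    my $matches = $value =~ $boolean_like ? 'matches' : 'no match';
    printf "%-7s %-8s encodes as %s\n", "'$value'", $matches, perl_truth($value);
}
```

      Only values such as 0, '0', '' and undef come out as false, and none of those are spellings the pattern recognizes.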

      We could just standardize on one of the two and use it consistently. But neither works well in unions: unless you put the boolean type last in the union definition, a wide variety of data will be coerced to the boolean type.

      Perl has no built-in or standardized boolean type, so there's no solution like the one available in the Avro APIs for other languages. But we could do as the Perl JSON module does, and define objects for true and false.
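
      As a rough illustration of the object-based idea, the sketch below uses the true/false objects from JSON::PP (chosen here only because it ships with Perl; the description refers to the JSON module generally). explicit_boolean is a hypothetical helper, not part of the Avro Perl API.

```perl
use strict;
use warnings;
use JSON::PP ();   # core module since Perl 5.14

# Hypothetical helper: returns 1 or 0 for an explicit boolean object,
# and undef for anything else (plain numbers, strings, undef).
sub explicit_boolean {
    my ($data) = @_;
    return undef unless JSON::PP::is_bool($data);
    return $data ? 1 : 0;
}

for my $value (JSON::PP::true(), JSON::PP::false(), 'false', 1) {
    my $result = explicit_boolean($value);
    print defined $result ? "boolean: $result\n" : "not a boolean\n";
}
```

      With a check like this, a string such as 'false' or a plain 1 would never be mistaken for a boolean when picking a union branch, at the cost of requiring callers to pass the special objects.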

        Issue Links

          Activity

          John Karp made changes -
          Fix Version/s 1.8.0 [ 12323299 ]
          John Karp made changes -
          Description
          h1. Boolean Serialization
          The boolean serialization code in BinaryEncoder.pm is:
          {noformat}
          $data ? \0x1 : \0x0
          {noformat}
          The intent is that anything false to Perl, such as 0, '0', '', () and undef, is encoded as zero, and everything else is encoded as one. However, this code doesn't work, as these unit tests show:
          {noformat}
          primitive_ok boolean => 0, "\x0";
          primitive_ok boolean => 1, "\x1";
          {noformat}
          which print:
          {noformat}
          # Failed test 'primitive boolean encoded correctly'
          # at t/02_bin_encode.t line 40.
          # got: '30'
          # expected: '00'

          # Failed test 'primitive boolean encoded correctly'
          # at t/02_bin_encode.t line 40.
          # got: '31'
          # expected: '01'
          {noformat}
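
          The bytes in the failure output, 30 and 31, are the ASCII codes of the characters '0' and '1'. A plausible reading, assuming the encoder dereferences the \0x1 reference and writes the resulting number as a string, is sketched below; hex_of is a hypothetical helper that mimics the hex dump in the test output.

```perl
use strict;
use warnings;

# Hypothetical helper: hex-dump a string like the test output does.
sub hex_of { return unpack 'H*', $_[0] }

my $ref = \0x1;                      # a reference to the number 1
print hex_of("${$ref}"), "\n";       # number stringified to "1": 31
print hex_of(pack('C', 1)), "\n";    # one raw byte with value 1: 01
print hex_of("\x1"), "\n";           # what the unit test expects: 01
```

          Under that assumption, packing the value with pack 'C' rather than stringifying it would produce the single byte the tests expect.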

          h1. Booleans in Unions
          Inconsistently with that serialization, the code in Schema.pm that determines which union branch to use attempts to check for boolean-ness with:
          {noformat}
          m{yes|no|y|n|t|f|true|false}i
          {noformat}
          meaning only those particular strings are considered booleans; however, they will all get encoded as '0' by BinaryEncoder.pm.

          I say 'attempts' because it's actually matching this regex against the data type name $type, which in this context will always be 'boolean', instead of the value $data.
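
          A short sketch of why that check always succeeds: the pattern is unanchored, and the type name 'boolean' contains an 'n'. The looks_boolean function is a hypothetical anchored variant applied to the value, shown only for contrast; it is not the fix proposed in this issue.

```perl
use strict;
use warnings;

my $pattern = qr{yes|no|y|n|t|f|true|false}i;   # as quoted above

# Matching the type name: 'boolean' contains 'n', so this always matches.
my $type = 'boolean';
print "type name matches\n" if $type =~ $pattern;

# Unanchored, the pattern also matches unrelated data such as 'Monday'.
print "'Monday' matches\n" if 'Monday' =~ $pattern;

# Hypothetical stricter check: anchored, and applied to the value itself.
sub looks_boolean {
    my ($data) = @_;
    return defined($data) && $data =~ /\A(?:yes|no|y|n|t|f|true|false)\z/i ? 1 : 0;
}

print looks_boolean('Monday'), "\n";   # 0
print looks_boolean('No'), "\n";       # 1
```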

          h1. Suggested Fix
          Perl has no boolean type, so there's no ideal solution for the inconsistency. But we could keep it simple, and have only the numbers 0 and 1 accepted as boolean values.
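
          A minimal sketch of what the 0-and-1-only rule could look like. Both function names are hypothetical and this is not the committed patch. Because Perl does not readily distinguish the number 1 from the string '1', the check below accepts either.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical validator: accept only a defined, non-reference 0 or 1.
sub is_avro_boolean {
    my ($data) = @_;
    return defined($data) && !ref($data) && $data =~ /\A[01]\z/ ? 1 : 0;
}

# Hypothetical encoder: Avro writes a boolean as a single byte, 0 or 1.
sub encode_boolean {
    my ($data) = @_;
    croak "not a boolean: expected 0 or 1" unless is_avro_boolean($data);
    return pack 'C', $data;
}

print unpack('H*', encode_boolean(0)), "\n";   # 00
print unpack('H*', encode_boolean(1)), "\n";   # 01
```

          The same is_avro_boolean check could be applied to $data when choosing a union branch, so that branch selection and encoding agree on what counts as a boolean.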
          John Karp made changes -
          Field Original Value New Value
          Link This issue is cloned as AVRO-1470 [ AVRO-1470 ]
          John Karp created issue -

            People

            • Assignee:
              John Karp
              Reporter:
              John Karp
            • Votes:
              0
              Watchers:
              1
