Apache Drill / DRILL-4056

Avro deserialization corrupts data


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Storage - Other
    • Labels:
      None
    • Environment:

      Ubuntu 15.04 - Oracle Java

    • Flags:
      Important

      Description

      I have an Avro file with the following data/schema:
      {"field":"some", "classification":{"variant":"Gæst"}}

      When I select 10 rows from this file, I get:
      ---------------------
      EXPR$0
      ---------------------
      Gæst
      Voksen
      Voksen
      Invitation KIF KBH
      Invitation KIF KBH
      Ordinarie pris KBH
      Ordinarie pris KBH
      Biljetter 200 krBH
      Biljetter 200 krBH
      Biljetter 200 krBH
      ---------------------

      The bug is that field values are deserialized incorrectly: when a value is shorter than the value in the previous row, the trailing characters of the previous value are retained.

      The SQL query:
      "select s.classification.variant variant from dfs.<some> as s limit 10;"

      For example, "Ordinarie pris" becomes "Ordinarie pris KBH" because the previous row had the value "Invitation KIF KBH".
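      The symptom is consistent with a string buffer that is reused across rows and then read in full instead of only up to the current value's length. The Python sketch below models that failure mode under this assumption; it is not Drill's Avro reader code, and ReusableStringBuffer, read_buggy, read_fixed and decode_column are hypothetical names. The "true" values "Ordinarie pris" and "Biljetter 200 kr" are inferred from the corrupted output, not taken from the attached file.

```python
from typing import Callable, List


# Hypothetical model of a decode buffer reused across rows (an assumption
# about the mechanism, not Drill's or Avro's actual implementation).
class ReusableStringBuffer:
    """Byte array that is reused for every row and only ever grows."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.length = 0  # number of valid bytes for the current value

    def set(self, value: str) -> None:
        encoded = value.encode("utf-8")
        if len(encoded) > len(self.data):
            # Grow the backing array; later, shorter values reuse it unchanged.
            self.data.extend(b"\x00" * (len(encoded) - len(self.data)))
        # Overwrite only the prefix; bytes past len(encoded) are stale.
        self.data[: len(encoded)] = encoded
        self.length = len(encoded)


def read_buggy(buf: ReusableStringBuffer) -> str:
    # Bug: decodes the whole backing array, so the tail of a longer previous
    # value leaks into the result. errors="replace" avoids a crash if the
    # stale tail starts in the middle of a multi-byte UTF-8 character.
    return buf.data.decode("utf-8", errors="replace")


def read_fixed(buf: ReusableStringBuffer) -> str:
    # Fix: decode only the valid bytes of the current value.
    return buf.data[: buf.length].decode("utf-8")


def decode_column(values: List[str],
                  reader: Callable[[ReusableStringBuffer], str]) -> List[str]:
    # Simulate reading one string column row by row through a single buffer.
    buf = ReusableStringBuffer()
    out = []
    for value in values:
        buf.set(value)
        out.append(reader(buf))
    return out


rows = ["Gæst", "Voksen", "Invitation KIF KBH", "Ordinarie pris", "Biljetter 200 kr"]
print(decode_column(rows, read_buggy))
print(decode_column(rows, read_fixed))
```

      With the buggy reader, "Ordinarie pris" comes back as "Ordinarie pris KBH" and "Biljetter 200 kr" as "Biljetter 200 krBH", matching the pattern in the query output; the fixed reader returns the input values unchanged.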

      Attachments

      1. test.zip (5 kB, Stefán Baxter)


      People

      • Assignee: Jason Altekruse (jaltekruse)
      • Reporter: Stefán Baxter (acmeguy)
      • Votes: 0
      • Watchers: 6
