  1. Apache Drill
  2. DRILL-4056

Avro deserialization corrupts data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Storage - Other
    • Labels: None
    • Environment: Ubuntu 15.04 - Oracle Java

    • Important

    Description

      I have an Avro file that supports the following data/schema:
      {"field":"some", "classification":{"variant":"Gæst"}}

      When I select 10 rows from this file I get:
      ---------------------
      EXPR$0
      ---------------------
      Gæst
      Voksen
      Voksen
      Invitation KIF KBH
      Invitation KIF KBH
      Ordinarie pris KBH
      Ordinarie pris KBH
      Biljetter 200 krBH
      Biljetter 200 krBH
      Biljetter 200 krBH
      ---------------------

      The bug is that field values are deserialized incorrectly: when a value is shorter than the value in the previous row, the trailing characters of the previous value are retained.

      The SQL query:
      "select s.classification.variant variant from dfs.<some> as s limit 10;"

      For example, "Ordinarie pris" becomes "Ordinarie pris KBH" because the previous row had the value "Invitation KIF KBH".
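
The symptom described above is consistent with a reused byte buffer being decoded past the logical length of the current value. The report does not confirm the root cause, so the following is only a minimal, self-contained sketch of that failure mode. It is not Drill's or Avro's actual code, and the names ReusedBufferDemo, ReusableText, readWholeBuffer and readLogicalLength are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ReusedBufferDemo {

    // Hypothetical holder that reuses one backing array across rows.
    // The array grows when needed but is never shrunk or cleared.
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String value) {
            byte[] src = value.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // logical length of the current value
        }

        // Buggy read: decodes the whole backing array, so stale bytes
        // from a longer previous value leak into the result.
        String readWholeBuffer() {
            return new String(bytes, StandardCharsets.UTF_8);
        }

        // Correct read: decodes only the bytes of the current value.
        String readLogicalLength() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText();
        String[] rows = {"Invitation KIF KBH", "Ordinarie pris", "Biljetter 200 kr"};
        for (String row : rows) {
            text.set(row);
            System.out.println("buggy:   " + text.readWholeBuffer());
            System.out.println("correct: " + text.readLogicalLength());
        }
    }
}
```

With the three values taken from the output in this report, the buggy read yields "Invitation KIF KBH", "Ordinarie pris KBH" and "Biljetter 200 krBH", while the length-bounded read returns each value unchanged.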

      Attachments

        1. test.zip
          5 kB
          Stefán Baxter


          People

            Assignee: Jason Altekruse (jaltekruse)
            Reporter: Stefán Baxter (acmeguy)
