KUDU-2454

Avro Import/Export does not round trip

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      When exporting to Avro, columns of type Byte or Short are treated as Integers because Avro has no Byte or Short type. When re-importing the data, the job fails because the column types do not match.

      Ideally spark-avro would solve this by safely casting the values back to the smaller type. Guava has utilities to make this straightforward (e.g. Shorts.checkedCast). We could send a pull request to spark-avro to fix this, or add special handling on the Kudu side to perform the safe down-conversion.
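      The checked down-conversion described above can be sketched in plain Java. This is a hypothetical illustration mirroring what Guava's Shorts.checkedCast does, not Kudu or spark-avro code; the class and method names are made up for the example:

```java
public class CheckedCasts {

    // Narrow an int (as Avro delivers it) back to short,
    // failing loudly instead of silently truncating on overflow.
    static short checkedShortCast(int value) {
        short result = (short) value;
        if (result != value) {
            throw new IllegalArgumentException("Out of short range: " + value);
        }
        return result;
    }

    // Same idea for a byte column round-tripped through Avro as int.
    static byte checkedByteCast(int value) {
        byte result = (byte) value;
        if (result != value) {
            throw new IllegalArgumentException("Out of byte range: " + value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(checkedShortCast(123));  // fits in a short
        try {
            checkedShortCast(70000);                // exceeds Short.MAX_VALUE
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      A value that survives the cast is guaranteed bit-identical to the original column value; anything out of range surfaces as an import error rather than corrupted data.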

      Another type issue when exporting is that Decimal values are written as Strings instead of as the decimal (BigDecimal) logical type. There are a few unmerged pull requests to fix that here:

      Additionally, Timestamp values are written as longs instead of the timestamp-micros logical type. This is a data corruption issue because the long value that is output is in milliseconds (Timestamp.getTime()), but the expected long value for a Kudu Timestamp column is in microseconds.
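      A minimal sketch of the unit mismatch, using an arbitrary example epoch value (the constants below are illustrative, not taken from a real failing export):

```java
public class TimestampUnits {
    public static void main(String[] args) {
        // java.sql.Timestamp.getTime() returns milliseconds since the epoch,
        // but a Kudu timestamp column stores microseconds since the epoch.
        long millisFromGetTime = 1_526_904_000_000L; // example getTime() result
        long microsForKudu = millisFromGetTime * 1000L; // value Kudu expects

        // Re-importing the raw millisecond value as if it were microseconds
        // shifts the instant ~1000x closer to the epoch, i.e. corrupts it.
        System.out.println("millis=" + millisFromGetTime);
        System.out.println("micros=" + microsForKudu);
    }
}
```

      The export path would need the explicit * 1000 conversion (or better, the timestamp-micros logical type) for the round trip to preserve the original instant.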

      Given all these issues, ImportExportFiles needs much more test coverage before we suggest its use. Currently it only tests importing Strings from a CSV and does not test Avro or Parquet support.


            People

            • Assignee: Unassigned
            • Reporter: granthenke Grant Henke
            • Votes: 0
            • Watchers: 1
