  1. Spark
  2. SPARK-21402

Fix java array of structs deserialization

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.2.3, 2.3.3, 2.4.0
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      macOS
      Spark 2.1.1
      Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121

      Description

      I have a dataset with the following schema:

      root
       |-- userId: string (nullable = true)
       |-- data: map (nullable = true)
       |    |-- key: string
       |    |-- value: struct (valueContainsNull = true)
       |    |    |-- startTime: long (nullable = true)
       |    |    |-- endTime: long (nullable = true)
       |-- offset: long (nullable = true)

      And I have the following classes (setters and getters omitted for brevity):

      public class MyClass {
          private String userId;
          private Map<String, MyDTO> data;
          private Long offset;
          // getters and setters omitted
      }

      public class MyDTO {
          private long startTime;
          private long endTime;
          // getters and setters omitted
      }

      I collect the result as follows:

      Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
      Dataset<MyClass> results = raw_df.as(myClassEncoder);
      List<MyClass> lst = results.collectAsList();

      I run several calculations to get the result I want, and the values are correct at every step before I collect them.
      This is the output of the following query:

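      results.select(
              results.col("data").getField("2017-07-01").getField("startTime"),
              results.col("data").getField("2017-07-01").getField("endTime"))
          .show(false);

      +--------------------------+------------------------+
      |data[2017-07-01].startTime|data[2017-07-01].endTime|
      +--------------------------+------------------------+
      |1498854000                |1498870800              |
      +--------------------------+------------------------+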

      This is what I get after collecting the results:

      MyClass userData = results.collectAsList().get(0);
      MyDTO userDTO = userData.getData().get("2017-07-01");
      System.out.println("userDTO startTime: " + userDTO.getStartTime());
      System.out.println("userDTO endTime: " + userDTO.getEndTime());
      
      


      userDTO startTime: 1498870800
      userDTO endTime: 1498854000

      The startTime and endTime values are swapped after collection, so I believe this is a Spark issue. I would appreciate any suggestions for a workaround.
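
The swap described above could arise if struct values were bound to bean properties by position rather than by name. The report does not confirm this; it is one plausible explanation. The sketch below is plain Java with no Spark API involved. It assumes, for illustration only, that bean properties are enumerated alphabetically (endTime before startTime) while the schema lists startTime first. The class and method names (FieldOrderSketch, bindByPosition, bindByName) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldOrderSketch {

    // Struct field order as it appears in the dataset schema of the report.
    static final List<String> SCHEMA_ORDER = Arrays.asList("startTime", "endTime");

    // ASSUMPTION (not confirmed by the report): bean properties are
    // enumerated alphabetically, which puts endTime before startTime.
    static List<String> beanPropertyOrder() {
        List<String> names = new ArrayList<>(SCHEMA_ORDER);
        Collections.sort(names);
        return names;
    }

    // Binds the i-th struct value to the i-th bean property, ignoring names.
    static Map<String, Long> bindByPosition(long[] structValues) {
        Map<String, Long> bean = new LinkedHashMap<>();
        List<String> props = beanPropertyOrder();
        for (int i = 0; i < props.size(); i++) {
            bean.put(props.get(i), structValues[i]);
        }
        return bean;
    }

    // Binds each bean property to the struct value whose field has the same name.
    static Map<String, Long> bindByName(long[] structValues) {
        Map<String, Long> bean = new LinkedHashMap<>();
        for (String prop : beanPropertyOrder()) {
            bean.put(prop, structValues[SCHEMA_ORDER.indexOf(prop)]);
        }
        return bean;
    }

    public static void main(String[] args) {
        // Values in schema order (startTime, endTime), taken from the report.
        long[] row = {1498854000L, 1498870800L};
        System.out.println("by position: " + bindByPosition(row));
        System.out.println("by name:     " + bindByName(row));
    }
}
```

Under that assumption, positional binding yields startTime = 1498870800 and endTime = 1498854000, which matches the swapped values printed after collectAsList(), while binding by name keeps the values shown by show(). If this is indeed the cause, running on one of the fix versions listed in this issue (2.2.3, 2.3.3, 2.4.0) should avoid the swap.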

              People

              • Assignee:
                vofque Vladimir Kuriatkov
                Reporter:
                tomron Tom