  Spark / SPARK-25772

Java encoders - switch fields on collectAsList


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None
    • Environment: mac os
      spark 2.1.1
      Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121

    Description

      I have the following schema in a dataset:

      root
       |-- userId: string (nullable = true)
       |-- data: map (nullable = true)
       |    |-- key: string
       |    |-- value: struct (valueContainsNull = true)
       |    |    |-- startTime: long (nullable = true)
       |    |    |-- endTime: long (nullable = true)
       |-- offset: long (nullable = true)

      And I have the following classes (plus getters and setters, which I omitted for simplicity):

      public class MyClass {
          private String userId;
          private Map<String, MyDTO> data;
          private Long offset;
      }

      public class MyDTO {
          private long startTime;
          private long endTime;
      }

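      Since Encoders.bean relies on the JavaBean convention, a concrete MyDTO would also need a public no-argument constructor and public getter/setter pairs. A sketch of the boilerplate the report omits (not code from the report itself):

      ```java
      import java.io.Serializable;

      // Sketch of MyDTO with the accessors the report omits; Encoders.bean
      // reads and writes fields through getter/setter pairs like these.
      public class MyDTO implements Serializable {
          private long startTime;
          private long endTime;

          public MyDTO() {}

          public long getStartTime() { return startTime; }
          public void setStartTime(long startTime) { this.startTime = startTime; }

          public long getEndTime() { return endTime; }
          public void setEndTime(long endTime) { this.endTime = endTime; }
      }
      ```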

      I collect the result the following way:

      Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
      Dataset<MyClass> results = raw_df.as(myClassEncoder);
      List<MyClass> lst = results.collectAsList();


      I do several calculations to get the result I want, and the result is correct all the way through, before I collect it. This is the output of:

      results.select(
          results.col("data").getField("2017-07-01").getField("startTime"),
          results.col("data").getField("2017-07-01").getField("endTime")
      ).show(false);

      +--------------------------+------------------------+
      |data[2017-07-01].startTime|data[2017-07-01].endTime|
      +--------------------------+------------------------+
      |1498854000                |1498870800              |
      +--------------------------+------------------------+

      And this is the result, for the same row, after collecting the results:

      MyClass userData = results.collectAsList().get(0);
      MyDTO userDTO = userData.getData().get("2017-07-01");
      System.out.println("userDTO startTime: " + userDTO.getStartTime());
      System.out.println("userDTO endTime: " + userDTO.getEndTime());

      userDTO startTime: 1498870800
      userDTO endTime: 1498854000

      The startTime and endTime values have been swapped.

      I tend to believe it is a Spark issue. I would love any suggestions on how to work around it.
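      For what it's worth, the swap is consistent with a field-ordering mismatch: the struct declares startTime before endTime, but Java bean introspection (which bean encoders build on) reports properties in alphabetical order, i.e. endTime first. A minimal stand-alone illustration of that ordering, using a hypothetical MyDTO stand-in rather than Spark itself:

      ```java
      import java.beans.IntrospectionException;
      import java.beans.Introspector;
      import java.beans.PropertyDescriptor;
      import java.util.ArrayList;
      import java.util.List;

      public class BeanOrderDemo {

          // Stand-in for MyDTO: fields declared startTime first, as in the schema.
          public static class MyDTO {
              private long startTime;
              private long endTime;
              public long getStartTime() { return startTime; }
              public void setStartTime(long t) { startTime = t; }
              public long getEndTime() { return endTime; }
              public void setEndTime(long t) { endTime = t; }
          }

          // Returns the bean property names in the order Introspector reports them.
          public static List<String> propertyOrder(Class<?> cls) throws IntrospectionException {
              List<String> names = new ArrayList<>();
              for (PropertyDescriptor pd : Introspector.getBeanInfo(cls).getPropertyDescriptors()) {
                  if (!"class".equals(pd.getName())) { // skip the implicit getClass() property
                      names.add(pd.getName());
                  }
              }
              return names;
          }

          public static void main(String[] args) throws IntrospectionException {
              // Alphabetical: endTime comes before startTime, the reverse of the
              // declaration (and struct column) order.
              System.out.println(propertyOrder(MyDTO.class)); // prints [endTime, startTime]
          }
      }
      ```

      If the deserializer pairs struct columns with bean properties by position rather than by name, the two long fields land in each other's setters, which matches the symptom above. This is only a sketch of the suspected mechanism, not Spark's actual code path.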


          People

            Assignee: Vladimir Kuriatkov (vofque)
            Reporter: Tom (tomron)
            Votes: 0
            Watchers: 3
