Project: Spark, issue SPARK-13299

DataFrame limit operation is not consistent


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
    • Fix Version/s: None
    • Component/s: None

      Description

      I ran into a problem with the limit method of the DataFrame API.
      I am trying to get the first 999 records from an Avro source that contains about 3.5K records.

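      // Load the Avro data set through the spark-avro data source
      DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

      // Request 999 rows (expected: the first 999 rows in input order)
      df = df.limit(999);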

      After the save operation, the rows are not in the same order as in the input data set. Sometimes the order is correct, but usually it is not.

      df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
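A likely explanation, consistent with the "Not A Problem" resolution: without an explicit sort, limit only bounds the number of rows; it does not say which rows are kept or in what order they come out. The plain-Java code below is a simplified, hypothetical model, not Spark's implementation. It assumes the ~3.5K rows are split into 4 contiguous partitions (the real split count depends on the input), lets each partition keep its own first 999 rows, gathers the partial results in whatever order the partitions arrive, and then keeps the first 999.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Simplified, hypothetical model of a distributed LIMIT without an ORDER BY
 * (not Spark's implementation): each partition keeps its own first n rows,
 * partial results are gathered in arrival order, and the first n are kept.
 */
class PartitionedLimitModel {

    // Split row ids 0..total-1 into contiguous partitions, like splits of one file.
    static List<List<Integer>> partition(int total, int numPartitions) {
        List<List<Integer>> parts = new ArrayList<>();
        int size = (total + numPartitions - 1) / numPartitions;
        for (int start = 0; start < total; start += size) {
            List<Integer> part = new ArrayList<>();
            for (int i = start; i < Math.min(start + size, total); i++) {
                part.add(i);
            }
            parts.add(part);
        }
        return parts;
    }

    // Per-partition limit, then gather in the given arrival order, then a final limit.
    static List<Integer> limit(List<List<Integer>> parts, List<Integer> arrivalOrder, int n) {
        List<Integer> gathered = new ArrayList<>();
        for (int idx : arrivalOrder) {
            List<Integer> part = parts.get(idx);
            gathered.addAll(part.subList(0, Math.min(n, part.size())));
        }
        return new ArrayList<>(gathered.subList(0, Math.min(n, gathered.size())));
    }

    public static void main(String[] args) {
        // ~3.5K rows in 4 partitions (assumed), limit 999 as in the report
        List<List<Integer>> parts = partition(3500, 4);
        List<Integer> arrival = new ArrayList<>(Arrays.asList(0, 1, 2, 3));
        Collections.shuffle(arrival); // stands in for nondeterministic task completion order
        List<Integer> out = limit(parts, arrival, 999);
        System.out.println("arrival " + arrival + " -> first row id " + out.get(0)
                + ", last row id " + out.get(out.size() - 1));
    }
}
```

In this model, when partition 0 happens to arrive first and partition 1 second, the output is rows 0 to 998 in input order, which would match "sometimes the order is correct"; any other arrival order yields different rows in a different order.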
      

      Here is the Spark plan, in case it helps to find the cause of the issue:

      == Parsed Logical Plan ==
      Limit 999
       Relation[mobileNumber#0L,tariff#1,debit#2] AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)
      
      == Analyzed Logical Plan ==
      mobileNumber: bigint, tariff: string, debit: float
      Limit 999
       Relation[mobileNumber#0L,tariff#1,debit#2] AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)
      
      == Optimized Logical Plan ==
      Limit 999
       Relation[mobileNumber#0L,tariff#1,debit#2] AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)
      
      == Physical Plan ==
      Limit 999
       Scan AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]
      
      Code Generation: true
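The physical plan applies Limit 999 directly on top of the Avro Scan, with no sort in between, so nothing in the plan fixes an order. One way to get a deterministic result is to sort on a key that records input position before taking rows. In Spark 1.4 to 1.6 this could be expressed with functions.monotonicallyIncreasingId() (documented as putting the partition ID in the upper bits and the record number within the partition in the lower bits) followed by orderBy(...) and limit(999); whether its partition numbering follows file order for this Avro source is an assumption. The plain-Java code below is a hypothetical model of that idea, not Spark code: rows carry an assumed (partition, offset) key, each partition keeps its n smallest keys, and the merged rows are sorted before the limit is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical model (not Spark code) of "sort on an explicit order key, then limit".
 * Every row is tagged with (partition index, offset within partition); each partition
 * keeps its n smallest keys, and the merged rows are sorted before n are taken.
 */
class OrderedLimitModel {

    // A value together with the key that records where it sat in the input.
    static final class Row {
        final int partition;
        final int offset;
        final String value;

        Row(int partition, int offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    static List<String> orderedLimit(List<List<String>> parts, List<Integer> arrivalOrder, int n) {
        List<Row> gathered = new ArrayList<>();
        for (int p : arrivalOrder) {
            List<String> rows = parts.get(p);
            // Offsets ascend within a partition, so its first n rows are its n smallest keys.
            for (int i = 0; i < Math.min(n, rows.size()); i++) {
                gathered.add(new Row(p, i, rows.get(i)));
            }
        }
        // Sorting on the key removes any dependence on arrival order.
        gathered.sort(Comparator.comparingInt((Row r) -> r.partition)
                .thenComparingInt(r -> r.offset));
        List<String> out = new ArrayList<>();
        for (Row r : gathered.subList(0, Math.min(n, gathered.size()))) {
            out.add(r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("r0", "r1", "r2"),
                Arrays.asList("r3", "r4", "r5"));
        // Both arrival orders print [r0, r1, r2, r3]
        System.out.println(orderedLimit(parts, Arrays.asList(1, 0), 4));
        System.out.println(orderedLimit(parts, Arrays.asList(0, 1), 4));
    }
}
```

The trade-off is an extra sort step, but the result no longer depends on which partition finishes first.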
      

        Attachments

        1. SparkLimitIssue.png (117 kB, attached by Nazarii Balkovskyi)


            People

            • Assignee: Unassigned
            • Reporter: Nazarii Balkovskyi (nazarii.balkovskii)
            • Votes: 0
            • Watchers: 1
