SPARK-29205

PySpark tests fail on ARM due to a suspected performance problem


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None
    • Environment:

      OS: Ubuntu 16.04

      Arch: aarch64

      Host: Virtual Machine

    Description

      We run the PySpark tests on an ARM VM and found that some tests fail. Once we change the source code to extend the wait time, so that those test tasks have enough time to finish, the tests pass.

       

      The affected test cases include:

      pyspark.mllib.tests.test_streaming_algorithms:StreamingLinearRegressionWithTests.test_parameter_convergence

      pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_convergence

      pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy

      pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

      The error messages for the above test failures:
      ======================================================================
      FAIL: test_parameter_convergence (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
      Test that the model parameters improve with streaming data.
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 429, in test_parameter_convergen ce
          self._eventually(condition, catch_assertions=True)
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
          raise lastValue
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
          lastValue = condition()
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 425, in condition
          self.assertEqual(len(model_weights), len(batches))
      AssertionError: 6 != 10
       
       
      ======================================================================
      FAIL: test_convergence (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 292, in test_convergence
          self._eventually(condition, 60.0, catch_assertions=True)
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
          raise lastValue
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
          lastValue = condition()
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 288, in condition
          self.assertEqual(len(models), len(input_batches))
      AssertionError: 19 != 20
       
      ======================================================================
      FAIL: test_parameter_accuracy (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 266, in test_parameter_accuracy
          self._eventually(condition, catch_assertions=True)
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
          raise lastValue
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
          lastValue = condition()
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 263, in condition
          self.assertAlmostEqual(rel, 0.1, 1)
      AssertionError: 0.21309223935797794 != 0.1 within 1 places
       
      ======================================================================
      FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
      Test that the model improves on toy data with no. of batches
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_predic tion
          self._eventually(condition, timeout=60.0)
        File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
          % (timeout, lastValue))
      AssertionError: Test failed due to timeout after 60 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74, 0.76, 0.78, 0.7, 0.78, 0.8
       
       
      Is it possible to extend the job time so that these jobs can finish, as long as the expected test results do not change? Any help or advice is welcome. Thanks.
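
      For illustration, below is a minimal sketch of the polling-with-timeout pattern that the _eventually helper in test_streaming_algorithms.py appears to follow, based only on the tracebacks above. The function is a re-creation for discussion, not the actual Spark helper, and the 180-second value in the final comment is just an assumed larger timeout for a slow ARM VM.

      import time

      def eventually(condition, timeout=30.0, interval=0.5, catch_assertions=False):
          # Poll `condition` until it returns True or `timeout` seconds elapse.
          # Illustrative re-creation of the pattern suggested by the tracebacks
          # above; not the actual helper from test_streaming_algorithms.py.
          start = time.time()
          lastValue = None
          while time.time() - start < timeout:
              if catch_assertions:
                  try:
                      lastValue = condition()
                  except AssertionError as e:
                      lastValue = e
              else:
                  lastValue = condition()
              if lastValue is True:
                  return
              time.sleep(interval)
          if isinstance(lastValue, AssertionError):
              raise lastValue
          raise AssertionError(
              "Test failed due to timeout after %g sec, "
              "with last condition returning: %s" % (timeout, lastValue))

      # On a slow ARM VM the same condition may simply need more time, e.g.
      #   eventually(condition, timeout=180.0, catch_assertions=True)
      # instead of the 60-second timeouts shown in the tracebacks above.

      Extending the timeout passed to this kind of helper is the workaround described above; whether that is acceptable upstream is what this issue asks.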

       


            People

              Assignee: Unassigned
              Reporter: zhao bo (bzhaoopenstack)
