Spark / SPARK-20811

GBT Classifier failed with mysterious StackOverflowError


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: ML

    Description

      I am running GBTClassifier over the airline dataset (years 2005-2008 combined), which gives around 22M training examples in total.

      The code is simple:

      Bar.scala
      import org.apache.spark.ml.classification.GBTClassifier
      import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

      // transformedTrainingSet / transformedTestset: DataFrames assumed to be prepared elsewhere,
      // with the default "features" column and a "label" column
      val gradientBoostedTrees = new GBTClassifier()
        .setMaxBins(1000)
        .setMaxIter(500)   // 500 boosting iterations (trees)
        .setMaxDepth(6)
        .setStepSize(1.0)

      // Materialize the cached training set so that the timing below covers only the fit
      transformedTrainingSet.cache().foreach(_ => ())
      val startTime = System.nanoTime()
      val model = gradientBoostedTrees.fit(transformedTrainingSet)
      println(s"===training time cost: ${(System.nanoTime() - startTime) / 1e6} ms")

      // Score the test set and compute AUC from the "prediction" column
      val resultDF = model.transform(transformedTestset)
      val binaryClassificationEvaluator = new BinaryClassificationEvaluator()
        .setRawPredictionCol("prediction")
        .setLabelCol("label")
      println(s"=====test AUC: ${binaryClassificationEvaluator.evaluate(resultDF)}======")
      

      My training job always fails with:

      17/05/19 13:41:29 WARN TaskSetManager: Lost task 18.0 in stage 3907.0 (TID 137506, 10.0.0.13, executor 3): java.lang.StackOverflowError
      at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:3037)
      at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:3061)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2234)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
      at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
      at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)

      The pattern above repeats many times.
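      The repeating frames (readObject0, readOrdinaryObject, readSerialData, defaultReadFields, then readObject0 again) show Java serialization recursing once per level of object nesting, so a sufficiently deep object graph exhausts the thread stack. The minimal pure-Scala sketch below (not Spark code) reproduces that mechanism with plain java.io serialization. The `Node` class and the `DeepSerializationDemo` object are hypothetical stand-ins for whatever deeply nested structure the failing task actually carries; the issue does not identify it.

```scala
import java.io._

object DeepSerializationDemo {
  // Hypothetical stand-in for a deeply nested object graph:
  // each node references its parent, so nesting depth == chain length.
  final class Node(val id: Int, val parent: Node) extends Serializable

  // Build the chain iteratively (no recursion here): node(depth-1) -> ... -> node(0) -> null
  def chain(depth: Int): Node = {
    var current: Node = null
    var i = 0
    while (i < depth) { current = new Node(i, current); i += 1 }
    current
  }

  // Plain Java serialization round trip; both writeObject and readObject
  // recurse once per nested field, i.e. once per level of the chain.
  def roundTrip(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }

  // True if the round trip ran out of stack (on either the write or the read side).
  def overflows(depth: Int): Boolean =
    try { roundTrip(chain(depth)); false }
    catch { case _: StackOverflowError => true }

  def main(args: Array[String]): Unit = {
    println(s"depth 100 overflows: ${overflows(100)}")
    println(s"depth 1000000 overflows: ${overflows(1000000)}")
  }
}
```

      With default JVM thread stack sizes, a shallow chain round-trips fine while a million-level chain is expected to overflow; the depth, not the total amount of data, is what matters.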

      Is this a bug, or am I doing something wrong when using GBTClassifier in spark.ml?
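      A commonly suggested mitigation for StackOverflowErrors in long iterative Spark jobs (not confirmed in this issue, which was resolved as Incomplete) is to give the SparkContext a checkpoint directory, so that the estimator's periodic checkpointing (`checkpointInterval`, default 10) can truncate the RDD lineage that grows over 500 boosting iterations. The sketch below assumes that the GBT implementation in the Spark version in use honors `checkpointInterval` once a checkpoint directory is set; the `GbtCheckpointSketch` object, the `trainWithCheckpointing` helper, and the directory argument are hypothetical. Raising the thread stack size (e.g. `-Xss` via `spark.executor.extraJavaOptions`) is another frequently cited workaround.

```scala
import org.apache.spark.ml.classification.{GBTClassificationModel, GBTClassifier}
import org.apache.spark.sql.{DataFrame, SparkSession}

object GbtCheckpointSketch {
  // Same hyperparameters as in the report, plus an explicit checkpoint interval.
  // Assumption: checkpointing only takes effect if a checkpoint directory is set.
  def gbt(): GBTClassifier = new GBTClassifier()
    .setMaxBins(1000)
    .setMaxIter(500)
    .setMaxDepth(6)
    .setStepSize(1.0)
    .setCheckpointInterval(10) // checkpoint every 10 iterations

  // Hypothetical helper: set a checkpoint directory (e.g. on HDFS), then fit.
  def trainWithCheckpointing(spark: SparkSession,
                             train: DataFrame,
                             checkpointDir: String): GBTClassificationModel = {
    spark.sparkContext.setCheckpointDir(checkpointDir)
    gbt().fit(train)
  }
}
```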


          People

            Assignee: Unassigned
            Reporter: Nan Zhu (codingcat)
            Votes: 0
            Watchers: 3
