Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-12350

VectorAssembler#transform() initially throws an exception

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Spark Core, Spark Shell
    • Labels:
      None
    • Environment:

      sparkShell command from sbt

      Description

      Calling VectorAssembler.transform() initially throws an exception, subsequent calls work.

      Steps to reproduce

      In spark-shell,
      1. Create a dummy dataframe and define an assembler

      import org.apache.spark.ml.feature.VectorAssembler
      val df = sc.parallelize(List((1,2), (3,4))).toDF
      val assembler = new VectorAssembler().setInputCols(Array("_1", "_2")).setOutputCol("features")
      

      2. Run

      assembler.transform(df).show
      

      Initially the following exception is thrown:

      15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request from /9.72.139.102:60610
      java.lang.IllegalArgumentException: requirement failed: File not found: /classes/org/apache/spark/sql/catalyst/expressions/Object.class
      	at scala.Predef$.require(Predef.scala:233)
      	at org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
      	at org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
      	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
      	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
      	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
      	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
      	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
      	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
      	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
      	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
      	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at java.lang.Thread.run(Thread.java:745)
      

      Subsequent calls work:

      +---+---+---------+
      | _1| _2| features|
      +---+---+---------+
      |  1|  2|[1.0,2.0]|
      |  3|  4|[3.0,4.0]|
      +---+---+---------+
      

      It seems as though there is some internal state that is not initialized.

      Imran Younus originally found this issue.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                vanzin Marcelo Masiero Vanzin
                Reporter:
                jodersky Jakob Odersky
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: