Description
When I map a function from my own feature-calculation module over an RDD, Spark raises:
Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 162, in manager
code = worker(sock)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 60, in worker
worker_main(infile, outfile)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 115, in main
report_times(outfile, boot_time, init_time, finish_time)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 40, in report_times
write_long(1000 * boot, outfile)
File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/serializers.py", line 518, in write_long
stream.write(struct.pack("!q", value))
DeprecationWarning: integer argument expected, got float
So I opened serializers.py and printed the value out: it is a float, produced by 1000 * time.time().
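Stripped of the Spark machinery, the failing call boils down to packing that float with the "!q" format; a minimal illustration (the timestamp is arbitrary, the call chain mirrors the traceback above):

import struct
import time

boot = time.time()        # a float timestamp
value = 1000 * boot       # still a float
struct.pack("!q", value)  # "q" expects an integer, so this emits
                          # DeprecationWarning: integer argument expected, got float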
When I remove my lib, or add an rdd.count() before mapping my lib, this bug doesn't appear.
So I edited the function to:
def write_long(value, stream):
    stream.write(struct.pack("!q", int(value)))  # added an int(value) cast
Everything seems fine…
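A quick sanity check of the patched function against an in-memory stream (io.BytesIO here is just a stand-in for the real output stream):

import io
import struct
import time

def write_long(value, stream):
    stream.write(struct.pack("!q", int(value)))  # the patched version

out = io.BytesIO()
write_long(1000 * time.time(), out)  # a float goes in and packs cleanly
print(len(out.getvalue()))           # 8 bytes, as expected for "!q"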
According to Note (3) in Python's docs for struct (https://docs.python.org/2/library/struct.html), the value for the q format must be an integer; when packing a non-integer, struct first tries its __index__() method and otherwise falls back to __int__(), but that fallback is deprecated and emits a DeprecationWarning. A float has no __index__() but does have __int__(), so it should trigger the warning every time.
But, as you can see, in normal cases it doesn't raise, the code works perfectly, and executing struct.pack('!q', 111.1) in a console or in a clean file won't raise anything either… I can hardly tell how my lib could affect how a time.time() value passed to struct.pack() is handled. My best guess is that DeprecationWarning is ignored by default in Python 2.7 and only becomes an exception when the warnings filters are set to "error", so something my lib imports probably flips that filter, but it might also be a bug on Python's side.
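A minimal sketch of that guess (the simplefilter() call below just stands in for whatever my lib might be doing indirectly):

import struct
import warnings

struct.pack("!q", 111.1)      # DeprecationWarning is ignored by default in Python 2.7

warnings.simplefilter("error", DeprecationWarning)
try:
    struct.pack("!q", 111.1)  # the same call now raises
except DeprecationWarning as e:
    print(e)                  # integer argument expected, got float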
Anyway, this value should be an int, so add an int() cast to it.