Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-6931

python: struct.pack('!q', value) in write_long(value, stream) in serializers.py require int(but doesn't raise exceptions in common cases)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 1.3.0
    • 1.2.3, 1.3.2
    • PySpark
    • Patch

    Description

      when I map my own feature calculation module's function, sparks raises:
      Traceback (most recent call last):
      File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 162, in manager
      code = worker(sock)
      File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 60, in worker
      worker_main(infile, outfile)
      File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 115, in main
      report_times(outfile, boot_time, init_time, finish_time)
      File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 40, in report_times
      write_long(1000 * boot, outfile)
      File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/serializers.py", line 518, in write_long
      stream.write(struct.pack("!q", value))
      DeprecationWarning: integer argument expected, got float

      so I turn on the serializers.py, and tried to print the value out, which is a float, came from 1000 * time.time()

      when I removed my lib, or add a rdd.count() before mapping my lib, this bug won’t appear.

      so I edited the function to :

      def write_long(value, stream):
      stream.write(struct.pack("!q", int(value))) # added a int(value)

      everything seem fine…

      According to python’s doc for struct(https://docs.python.org/2/library/struct.html)’s Note(3), the value should be a int(for q), and if it’s a float, it’ll try use _index(), else, try __int, but since __int_ is deprecated, it’ll raise DeprecationWarning. And float doesn’t have _index, but has __int_, so it should raise the exception every time.

      But, as you can see, in normal cases, it won’t raise the exception, and the code works perfectly, and exec struct.pack('!q', 111.1) in console or a clean file won't raise any exception…I can hardly tell how my lib might effect a time.time()'s value passed to struct.pack()... it might a python's original bug or what.

      Anyway, this value should be a int, so add a int() to it.

      Attachments

        Activity

          People

            bryanc Bryan Cutler
            zhangcx93 Chunxi Zhang
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - 1h
                1h
                Remaining:
                Remaining Estimate - 1h
                1h
                Logged:
                Time Spent - Not Specified
                Not Specified