Spark / SPARK-26496

Avoid using Random.nextString in StreamingInnerJoinSuite

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: Structured Streaming, Tests
    • Labels: None
    • Environment: Mac OS X High Sierra

    Description

      This is a bit esoteric and minor, but it makes it difficult to run the SQL unit tests successfully on High Sierra.

      The test StreamingInnerJoinSuite."locality preferences of StateStoreAwareZippedRDD" generates a directory name using Random.nextString(10), and High Sierra frequently refuses to create a directory with that name.

      For example:

      scala> val prefix = Random.nextString(10); val dir = new File("/tmp", "del_" + prefix + "-" + UUID.randomUUID.toString); dir.mkdirs()
      prefix: String = 媈ᒢ탊渓뀟?녛ꃲ싢櫦
      dir: java.io.File = /tmp/del_媈ᒢ탊渓뀟?녛ꃲ싢櫦-aff57fc6-ca38-4825-b4f3-473140edd4f6
      res39: Boolean = true // this one was OK
      
      scala> val prefix = Random.nextString(10); val dir = new File("/tmp", "del_" + prefix + "-" + UUID.randomUUID.toString); dir.mkdirs()
      prefix: String = 窽텘⒘駖ⵚ駢⡞Ρ닋੎
      dir: java.io.File = /tmp/del_窽텘⒘駖ⵚ駢⡞Ρ닋੎-a3f99855-c429-47a0-a108-47bca6905745
      res40: Boolean = false  // nope, didn't like this one
      
      scala> prefix.foreach(x => printf("%04x ", x.toInt))
      7abd d158 2498 99d6 2d5a 99e2 285e 03a1 b2cb 0a4e 
      
      scala> prefix(9)
      res46: Char = ੎
      
      scala> val prefix = "\u7abd"
      prefix: String = 窽
      
      scala> val dir = new File("/tmp", "del_" + prefix + "-" + UUID.randomUUID.toString); dir.mkdirs()
      dir: java.io.File = /tmp/del_窽-d1c3af34-d34d-43fe-afed-ccef9a800ff4
      res47: Boolean = true // it's OK with \u7abd
      
      scala> val prefix = "\u0a4e"
      prefix: String = ੎
      
      scala> val dir = new File("/tmp", "del_" + prefix + "-" + UUID.randomUUID.toString); dir.mkdirs()
      dir: java.io.File = /tmp/del_੎-3654a34c-6f74-4591-85af-a0f28b675a6f
      res50: Boolean = false // doesn't like \u0a4e
      

      I thought it might have something to do with my Java 8 version, but Python is equally affected:

      >>> f = open(u"/tmp/del_\u7abd_file", "wb")
      >>> f.write("hello\n")
      # it's OK with \u7abd
      >>> f2 = open(u"/tmp/del_\u0a4e_file", "wb")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      IOError: [Errno 92] Illegal byte sequence: u'/tmp/del_\u0a4e_file'
      # doesn't like \u0a4e
      >>> f2 = open(u"/tmp/del_\ufa4e_file", "wb")
      # a little change and it's happy again
      >>> 
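
One way to look for a pattern in the accepted and rejected characters is to check their Unicode general category. The Python 3 sketch below takes the ten code points of the failing prefix from the hex dump in the Scala session, plus the two characters that worked in the Python session, and reports which ones are unassigned (category `Cn`). The helper name `unassigned_code_points` is made up for this illustration. The report does not establish that unassigned code points are the cause of the failure; this only shows how the characters differ.

```python
import unicodedata

# Code points of the rejected prefix, copied from the hex dump in the report.
FAILING_PREFIX = [0x7ABD, 0xD158, 0x2498, 0x99D6, 0x2D5A,
                  0x99E2, 0x285E, 0x03A1, 0xB2CB, 0x0A4E]

# Code points that were accepted in the Python session.
ACCEPTED = [0x7ABD, 0xFA4E]


def unassigned_code_points(code_points):
    """Return the code points whose general category is 'Cn' (not assigned)."""
    return [cp for cp in code_points
            if unicodedata.category(chr(cp)) == "Cn"]


if __name__ == "__main__":
    for cp in FAILING_PREFIX + ACCEPTED:
        # Print each code point with its general category, e.g. Lo, Lu, Cn.
        print("U+%04X %s" % (cp, unicodedata.category(chr(cp))))
    print("unassigned in failing prefix:",
          ["U+%04X" % cp for cp in unassigned_code_points(FAILING_PREFIX)])
```

With the Unicode data shipped in current Python 3 releases, U+0A4E is the only code point in the failing prefix that is reported as unassigned, which matches the single character isolated by hand in the Scala session.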
      

      Mac OS X Sierra is perfectly happy with these characters. This seems to be a limitation introduced by High Sierra.
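
A test that only needs a unique directory name can sidestep the problem by drawing the random part from ASCII letters and digits. The following is a minimal Python 3 sketch of that idea, not the change that was applied to StreamingInnerJoinSuite; the names `random_ascii_prefix` and `make_scratch_dir` are hypothetical, and the `del_` prefix and UUID suffix simply mirror the directory names used in the sessions above.

```python
import os
import random
import string
import tempfile
import uuid

# Restrict the random part of the name to characters every filesystem accepts.
ASCII_ALPHANUMERIC = string.ascii_letters + string.digits


def random_ascii_prefix(length=10):
    """Return a random string made only of ASCII letters and digits."""
    return "".join(random.choice(ASCII_ALPHANUMERIC) for _ in range(length))


def make_scratch_dir(parent):
    """Create and return a uniquely named directory under `parent`."""
    name = "del_" + random_ascii_prefix() + "-" + str(uuid.uuid4())
    path = os.path.join(parent, name)
    os.makedirs(path)
    return path


if __name__ == "__main__":
    # Use a throwaway parent so the example cleans up after itself.
    with tempfile.TemporaryDirectory() as parent:
        created = make_scratch_dir(parent)
        print(created, os.path.isdir(created))
```

In Scala, `Random.alphanumeric.take(10).mkString` produces a comparable ASCII-only string and could stand in for `Random.nextString(10)` in the REPL experiments above.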

            People

              Assignee: Hyukjin Kwon (gurwls223)
              Reporter: Bruce Robbins (bersprockets)