Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 3.0.1
None
None
Environment: Windows 10, with "Beta: Use Unicode UTF-8 for worldwide language support" checked.
Description
This seems to be a duplicate of FLEX-18425, which is itself a duplicate of SDK-17398, which no longer exists. However, the bug remains.
(1) I create a text file "café.txt" that contains two lines:
Café
Café
(2) I type the following command:
spark.read.csv("café.txt").show()
It is displayed as follows:
spark.read.csv("caf.txt").show()
But it works and returns this:
-----
| _c0|
-----
| Caf|
|Café|
-----
We notice a shift after "Caf" and "Café".
(3) The following two commands work. The written text files have the same content as "café.txt":
spark.read.csv("café.txt").write.format("text").save("café2")
sc.textFile("café.txt").saveAsTextFile("café3")
Once again, the Spark shell displays this:
spark.read.csv("caf.txt").write.format("text").save("caf2")
sc.textFile("caf.txt").saveAsTextFile("caf3")
(4) If I type "é" 7 times and then Backspace 7 times, using the "é" key of my French keyboard, the Scala prompt disappears. I get a new prompt when I press Return.
Issue (4), as well as the shift in (2), seems to be related to a difference between the number of characters counted and the number of characters displayed.
(5) I do not have this issue when launching Spark from Ubuntu under "Windows Subsystem for Linux" version 2.
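The gap between counted and displayed characters suspected in (4) can be illustrated with a minimal Scala sketch. This is not a confirmed diagnosis of the Spark shell behavior. It only shows, assuming an encoding mismatch is involved, how the same word yields different lengths depending on how it is encoded or decoded. The value names (`word`, `utf8Bytes`, `misdecoded`, and so on) are hypothetical, and the snippet is meant to be pasted into the Spark shell or run as a Scala script.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// "Café" written with an escape so the source file encoding does not matter.
val word = "Caf\u00e9"

// Number of UTF-16 code units the JVM counts for the string.
val charCount = word.length

// Number of bytes when the string is encoded as UTF-8 ("é" takes two bytes).
val utf8Bytes = word.getBytes(StandardCharsets.UTF_8)
val utf8ByteCount = utf8Bytes.length

// Number of bytes in a single-byte encoding ("é" takes one byte).
val latin1ByteCount = word.getBytes(StandardCharsets.ISO_8859_1).length

// Assumption for illustration: UTF-8 bytes are decoded with the wrong charset.
// Each byte of "é" then becomes its own character.
val misdecoded = new String(utf8Bytes, StandardCharsets.ISO_8859_1)
val misdecodedCount = misdecoded.length

// The charset the JVM itself uses by default on this machine.
val defaultCharsetName = Charset.defaultCharset().name()

println(s"default charset      = $defaultCharsetName")
println(s"chars counted        = $charCount")
println(s"UTF-8 bytes          = $utf8ByteCount")
println(s"ISO-8859-1 bytes     = $latin1ByteCount")
println(s"chars after misdecode = $misdecodedCount")
```

If the default charset printed on Windows differs from the one printed under WSL 2, that would be consistent with observation (5), though it would not by itself prove the cause.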