[SPARK-33064] Spark-shell does not display accented chara - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 3.0.1
Fix Version/s: None
Component/s: Spark Shell
Labels: None
Environment:

Windows 10

"Beta: Use Unicode UTF-8 for worldwide language support" has been checked.

Docs Text:

Café
CafÃ©

+-----+
| _c0|
+-----+
| Caf|
|Café|
+-----+

Description

This seems to be a duplicate of ~~FLEX-18425~~, which is itself a duplicate of SDK-17398, an issue that no longer exists. The bug remains, however.

(1) I create a text file "café.txt" that contains two lines:

Café

CafÃ©

(2) I type the following command:

spark.read.csv("café.txt").show()

The shell displays it as follows, without the "é":

spark.read.csv("caf.txt").show()

But the command works and returns this:

+-----+
| _c0|
+-----+
| Caf|
|Café|
+-----+

Note the shift after "Caf" and "Café": both rows are one column narrower than the table border.

(3) The following two commands work. The text files they write have the same content as "café.txt":

spark.read.csv("café.txt").write.format("text").save("café2")

sc.textFile("café.txt").saveAsTextFile("café3")

Once again, the Spark shell displays them like this:

spark.read.csv("caf.txt").write.format("text").save("caf2")

sc.textFile("caf.txt").saveAsTextFile("caf3")

(4) If I type "é" seven times, using the "é" key of my French keyboard, and then press Backspace seven times, the Scala prompt disappears. A new prompt appears when I press Return.

Issue (4) and the shift in (2) both seem to be related to the difference between the number of characters counted and the number of characters displayed.
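
The observations in (2) are consistent with one hypothesis that this report does not confirm: the JVM writes its output as single-byte ISO-8859-1 (or Windows-1252) while the console decodes those bytes as UTF-8. The Scala sketch below, which can be pasted into spark-shell or a plain Scala REPL, simulates that mismatch. The helper name `seenByUtf8Console` is hypothetical, the padding rule is inferred from the table in (2), and the non-ASCII characters are written as Unicode escapes so the snippet does not depend on the encoding of the terminal it is pasted into.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper: the text a UTF-8 console would decode if the JVM
// had written `s` as ISO-8859-1 bytes (an assumption, not a confirmed cause).
def seenByUtf8Console(s: String): String =
  new String(s.getBytes(ISO_8859_1), UTF_8)

val cafe     = "Caf\u00e9"       // "Café": 4 characters
val mojiCafe = "Caf\u00c3\u00a9" // "CafÃ©": 5 characters

// Assumption: the table pads each cell to the widest value, counted in characters.
val width = Seq(cafe, mojiCafe).map(_.length).max

for (s <- Seq(cafe, mojiCafe)) {
  val shown = seenByUtf8Console(s)
  // A lone byte 0xE9 is not valid UTF-8, so it decodes to U+FFFD,
  // which a console may simply not draw.
  val visible = shown.filter(_ != '\uFFFD')
  println(s"counted=${s.length} width=$width visible=${visible.length}")
}
```

Under this assumption each value is drawn one character short of its counted length (3 visible for "Café", 4 visible for "CafÃ©"), which would account for the one-column shift in the table and for "CafÃ©" appearing as "Café".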

(5) The issue does not occur when I launch Spark from Ubuntu under "Windows Subsystem for Linux" version 2.
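
To compare the Windows and WSL environments, it may help to print the charset settings the JVM picked up in each. A minimal sketch for the Scala prompt follows; `sun.jnu.encoding`, `sun.stdout.encoding` and `stdout.encoding` are JDK-specific properties that may be unset depending on the JDK version, so the code treats every property as optional.

```scala
import java.nio.charset.Charset

// Charset the JVM uses by default when converting between bytes and characters.
val defaultCharset = Charset.defaultCharset().name()

// System properties that may describe the platform or console encoding.
// Any of them can be absent, hence the Option wrapper.
val props = Seq("file.encoding", "sun.jnu.encoding", "sun.stdout.encoding", "stdout.encoding")
val settings = props.map(p => p -> Option(System.getProperty(p)))

println(s"default charset = $defaultCharset")
settings.foreach { case (name, value) =>
  println(s"$name = ${value.getOrElse("(unset)")}")
}
```

If the values differ between the two environments, that would support an encoding-mismatch explanation, although it would not prove it on its own.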

People

Assignee: Unassigned

Reporter: Laurent GUEMAPPE

Votes: 0

Watchers: 2

Dates

Created: 04/Oct/20 10:34

Updated: 12/Dec/22 18:10

Resolved: 05/Oct/20 03:42