[SPARK-20112] SIGSEGV in GeneratedIterator.sort_addToSorter - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.0.2
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed
Environment: AWS m4.10xlarge with EBS (io1 drive, 400 GB, 4000 IOPS)

Description

I'm seeing a very weird crash in GeneratedIterator.sort_addToSorter. The hs_err_pid logs and the codegen file (with query plans) are attached. It's not a deterministic repro, but when running a big query load I eventually see it come up within a few minutes.

Here is some interesting repro information:

- On AWS r3.8xlarge machines, which have ephemeral attached drives, I can't repro this. It does repro on m4.10xlarge with an io1 EBS drive. So I think that means it's not an issue with the codegen, but I can't figure out what the difference in behavior is.
- The broadcast joins in the plan all involve small tables. I have spark.sql.autoBroadcastJoinThreshold=-1 because I always hint which tables should be broadcast.
- As you can see from the plan, all the sources are cached in-memory tables. We partition and sort them all beforehand, so the joins are always sort-merge joins or broadcast joins (with small tables).

# A fatal error has been detected by the Java Runtime Environment:
#
#  [thread 139872345896704 also had an error]
SIGSEGV (0xb) at pc=0x00007f38a378caa3, pid=19271, tid=139872342738688
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)

[thread 139872348002048 also had an error]# Problematic frame:
# 
J 28454 C1 org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V (369 bytes) @ 0x00007f38a378caa3 [0x00007f38a378b5e0+0x14c3]
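
For triage, the problematic-frame line can be split into its fields to confirm that the reported pc falls inside the compiled method. The sketch below is only an illustration: the regular expression is an assumption fitted to this one line rather than a general hs_err grammar, and `parse_frame` is a hypothetical helper name. In HotSpot crash logs a leading `J` marks a compiled Java frame, and `C1` names the JIT compiler that produced the code.

```python
import re

# Problematic-frame line copied from the crash header above.
FRAME = (
    "J 28454 C1 org.apache.spark.sql.catalyst.expressions."
    "GeneratedClass$GeneratedIterator.sort_addToSorter$"
    "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V"
    " (369 bytes) @ 0x00007f38a378caa3 [0x00007f38a378b5e0+0x14c3]"
)

# Assumed layout: kind, compile id, compiler, method signature,
# bytecode size, crash pc, then [code base + offset].
FRAME_RE = re.compile(
    r"^(?P<kind>\S)\s+(?P<compile_id>\d+)\s+(?P<compiler>C1|C2)\s+"
    r"(?P<method>\S+)\s+\((?P<bytecode_size>\d+) bytes\)\s+@\s+"
    r"(?P<pc>0x[0-9a-f]+)\s+\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Return the fields of a compiled-Java frame line as a dict."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not look like a compiled-Java frame")
    fields = match.groupdict()
    for key in ("pc", "base", "offset"):
        fields[key] = int(fields[key], 16)  # hex strings to integers
    for key in ("compile_id", "bytecode_size"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    frame = parse_frame(FRAME)
    # The crash pc should equal the code base plus the reported offset.
    consistent = frame["pc"] == frame["base"] + frame["offset"]
    print(frame["compiler"], hex(frame["offset"]), "consistent:", consistent)
```

Here the pc in the frame matches the pc on the SIGSEGV line, and base plus offset (0x14c3) reproduces it, so the fault is inside the C1-compiled body of sort_addToSorter rather than in a native library.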

This looks similar to https://issues.apache.org/jira/browse/SPARK-15822, but that issue is marked as fixed in 2.0.0.

Attachments

- codegen_sorter_crash.log (64 kB, 27/Mar/17 21:45, Mitesh)
- hs_err_pid19271.log (436 kB, 27/Mar/17 21:26, Mitesh)
- hs_err_pid22870.log (439 kB, 28/Mar/17 15:39, Mitesh)

Issue Links

- is related to: SPARK-15822 segmentation violation in o.a.s.unsafe.types.UTF8String (Resolved)

People

Assignee: Unassigned
Reporter: Mitesh
Votes: 5
Watchers: 5

Dates

Created: 27/Mar/17 21:26
Updated: 21/May/19 04:11
Resolved: 21/May/19 04:11