Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 2.4.4
Fix Version/s: None
Component/s: None
Environment: AWS EMR 5.27.0, Spark 2.4.4
Description
Running a union operation on two DataFrames, through both the Scala Spark shell and PySpark, results in the executor containers doing a core dump and exiting with exit code 134.
The trace from the Driver:
Container exited with a non-zero exit code 134.
19/11/06 02:21:35 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ip-172-30-6-79.ec2.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container from a bad node: container_1572981097605_0021_01_000077 on host: ip-172-30-6-79.ec2.internal. Exit status: 134. Diagnostics: Exception from container-launch.
Container id: container_1572981097605_0021_01_000077
Exit code: 134
Exception message: /bin/bash: line 1: 12611 Aborted
LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stderr
Stack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 12611 Aborted
LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native::/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native" /usr/lib/jvm/java-openjdk/bin/java -server -Xmx2743m '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' -Djava.io.tmpdir=/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/tmp '-Dspark.history.ui.port=18080' '-Dspark.driver.port=42267' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@ip-172-30-6-103.ec2.internal:42267 --executor-id 11 --hostname ip-172-30-6-79.ec2.internal --cores 2 --app-id application_1572981097605_0021 --user-class-path file:/mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/__app__.jar > /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stdout 2> /var/log/hadoop-yarn/containers/application_1572981097605_0021/container_1572981097605_0021_01_000077/stderr
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 134
From the stdout logs of the failed container we see:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f825e3b0e92, pid=12611, tid=0x00007f822b5fb700
#
# JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0xa9ae92]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /mnt1/yarn/usercache/hadoop/appcache/application_1572981097605_0021/container_1572981097605_0021_01_000077/hs_err_pid12611.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Also, I am unable to enable core dumps even though ulimit -c is set to unlimited. Can you help with how to approach this issue, and with how to obtain the core dump?
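One hedged suggestion on the missing core dump: YARN containers typically inherit their ulimits from the NodeManager process, not from a login shell, so running ulimit -c unlimited in an SSH session may never reach the executors. A minimal PySpark sketch to check the core-file limit the executor tasks actually run with:

import resource
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def core_limit(_):
    # Returns (soft, hard); resource.RLIM_INFINITY means "unlimited".
    return resource.getrlimit(resource.RLIMIT_CORE)

# Distinct limits reported from inside a couple of executor tasks.
print(spark.sparkContext.parallelize(range(2), 2).map(core_limit).distinct().collect())

If this prints (0, ...) rather than an unlimited soft limit, the NodeManager's limits would need to be raised on the worker nodes before executors can write core files.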
Steps to reproduce the issue:
- Upload the attached parquet data file to S3 at s3://<bucket>/tables/spark_29767_parquet_table/inserted_at=201910/, for example via boto3 as sketched below
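A minimal upload sketch; the local file name is a placeholder for the attachment, and <bucket> is the same placeholder bucket used in the paths above:

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "spark_29767_data.parquet",  # placeholder local name of the attached file
    "<bucket>",                  # placeholder bucket
    "tables/spark_29767_parquet_table/inserted_at=201910/spark_29767_data.parquet",
)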
- Create a partitioned Hive table:
CREATE EXTERNAL TABLE `spark_29767_parquet_table`(
  `hour` bigint,
  `title` string,
  `__deleted` string,
  `status` string,
  `transformationid` string,
  `roomid` string,
  `day` bigint,
  `notes` string,
  `nunitsfromaudit` bigint,
  `ts_ms` bigint,
  `liability` string,
  `_class` string,
  `month` bigint,
  `updatedate` struct<`date`:bigint>,
  `_id` struct<oid:string>,
  `year` bigint,
  `item` struct<name:string,brandname:string,perunitpricefromaudit:struct<currency:string,amount:string>,actualPerUnitPrice:struct<currency:string,amount:string>,category:string,itemType:string,roomAmenityId:bigint>,
  `createddate` struct<`date`:bigint>,
  `actualunits` bigint,
  `description` string)
PARTITIONED BY (`inserted_at` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://<bucket>/tables/spark_29767_parquet_table'
- Sync the partition:
ALTER TABLE spark_29767_parquet_table ADD PARTITION (inserted_at='201910')
LOCATION 's3://<bucket>/tables/spark_29767_parquet_table/inserted_at=201910/'
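The partition sync can also be driven from a Hive-enabled Spark session instead of the Hive CLI; a sketch that also confirms the partition is visible before reading the table:

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
sparkSession.sql(
    "ALTER TABLE spark_29767_parquet_table "
    "ADD IF NOT EXISTS PARTITION (inserted_at='201910') "
    "LOCATION 's3://<bucket>/tables/spark_29767_parquet_table/inserted_at=201910/'")
# The new partition should appear in the output.
sparkSession.sql("SHOW PARTITIONS spark_29767_parquet_table").show()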
- In PySpark, run the following:
# Read the base data frame
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType, LongType

sparkSession = (SparkSession
                .builder
                .appName('example-pyspark-read-and-write-from-hive')
                .enableHiveSupport()
                .getOrCreate())

base_df = sparkSession.table("spark_29767_parquet_table")
base_df = base_df.select("_id", "_class", "roomid", "item", "inserted_at")

# Create a new dataframe with one row for the union
schema = StructType([
    StructField("_id", StructType([StructField("oid", StringType(), True)]), True),
    StructField("_class", StringType(), True),
    StructField("roomid", StringType(), True),
    StructField("item", StructType([
        StructField("name", StringType(), True),
        StructField("brandname", StringType(), True),
        StructField("perunitpricefromaudit", StructType([
            StructField("currency", StringType(), True),
            StructField("amount", StringType(), True)]), True),
        StructField("actualperunitprice", StructType([
            StructField("currency", StringType(), True),
            StructField("amount", StringType(), True)]), True),
        StructField("category", StringType(), True),
        StructField("itemtype", StringType(), True),
        StructField("roomamenityid", LongType(), True)]), True),
    StructField("inserted_at", StringType(), True)])

data = [
    Row(Row("5daff5ca43b8a36756c23b0f"),
        "com.oyo.transformations.tasks.model.implementations.AuditItemTaskImpl",
        None,
        Row("Geyser Installation(with accessories)", None,
            Row("INR", "425.0"), None, "INFRASTRUCTURE", "PMC", None),
        "201910")
]

inc_df = sparkSession.createDataFrame(
    sparkSession.sparkContext.parallelize(data), schema)

inc_df.union(base_df).show()
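As a variant worth trying (untested here): since positional union requires the two frames to have identical column order, unionByName, available since Spark 2.3, matches columns by name instead and can rule out ordering effects between inc_df and base_df:

# Union by column name instead of position (Spark 2.3+).
inc_df.unionByName(base_df).show()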