[SPARK-25520] The state of executors is KILLED on standalone - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.2.2, 2.3.1
Fix Version/s: None
Component/s: Web UI
Labels:
- bulk-closed
Environment:

spark 2.3.1

spark 2.2.2

Java 1.8.0_131

scala 2.11.8

Description

I created a Spark standalone cluster (4 servers) using Spark 2.3.1. The job finishes and is listed under Completed Drivers, and I can get the result from the driver log. However, the status of all executors is shown as KILLED. The executor log shows the following:

2018-09-25 00:47:37 INFO CoarseGrainedExecutorBackend:54 - Driver commanded a shutdown

2018-09-25 00:47:37 ERROR CoarseGrainedExecutorBackend:43 - RECEIVED SIGNAL TERM

I also tried Spark 2.2.2 and see the same issue in the web UI: all executors are in the KILLED state.

Is this expected? If not, what is the problem?
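To tell whether the TERM signal arrived only after the driver had already asked the executor to shut down (as in the log excerpt above) or without any prior shutdown request, the order of the two log lines can be checked mechanically. The following is a minimal sketch in Java, assuming the log format shown above; the class ShutdownOrderCheck and its method are hypothetical helpers for illustration and are not part of Spark.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Spark): checks the order of two executor log lines. */
public class ShutdownOrderCheck {

    // Message fragments taken from the executor log excerpt in this report.
    static final String DRIVER_SHUTDOWN = "Driver commanded a shutdown";
    static final String SIGNAL_TERM = "RECEIVED SIGNAL TERM";

    /**
     * Returns true if a TERM signal line appears at some point after a
     * "Driver commanded a shutdown" line; false otherwise.
     */
    public static boolean termFollowsDriverShutdown(List<String> logLines) {
        boolean sawDriverShutdown = false;
        for (String line : logLines) {
            if (line.contains(DRIVER_SHUTDOWN)) {
                sawDriverShutdown = true;
            } else if (sawDriverShutdown && line.contains(SIGNAL_TERM)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The two lines quoted in the description.
        List<String> excerpt = Arrays.asList(
            "2018-09-25 00:47:37 INFO CoarseGrainedExecutorBackend:54 - Driver commanded a shutdown",
            "2018-09-25 00:47:37 ERROR CoarseGrainedExecutorBackend:43 - RECEIVED SIGNAL TERM");
        System.out.println(termFollowsDriverShutdown(excerpt));
    }
}
```

For the excerpt quoted in this report the check returns true, i.e. the TERM signal was logged after the driver-commanded shutdown. This only describes the ordering of the log lines; it does not by itself say whether the KILLED state in the UI is intended.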

-------------------------------------------- Config ----------------------------------------------------------------

spark-env.sh:

export SPARK_PUBLIC_DNS=hostname1
export SCALA_HOME=/opt/gpf/bigdata/scala-2.11.8
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HADOOP_HOME=/opt/bigdata/hadoop-2.6.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///spark/spark-event-dir -Dspark.history.ui.port=16066 -Dspark.history.retainedApplications=30 -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=1d -Dspark.history.fs.cleaner.maxAge=7d"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hostname1:2181,hostname2:2181,hostname3:2181 -Dspark.deploy.zookeeper.dir=/opt/bigdata/spark-2.3.1/zk-recovery-dir"

SPARK_LOCAL_DIRS=/opt/bigdata/spark-2.3.1/local-dir
SPARK_DRIVER_MEMORY=1G

spark-defaults.conf:

spark.eventLog.enabled true
spark.eventLog.compress true
spark.eventLog.dir file:///spark/spark-event-dir

slaves:

hostname1

hostname2

hostname3

hostname4

------------------------------------------------Testing ------------------------------------------------------------

Testing program1 (Java):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JDBCApp {

    private static final String DB_OLAP_UAT_URL = "jdbc:sqlserver://dbhost";
    private static final String DB_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
    private static final String SQL_TEXT = "select top 10 * from table1";
    private static final String DB_OLAP_UAT_USR = "";
    private static final String DB_OLAP_UAT_PWD = "";

    public static void main(String[] args) {
        System.setProperty("spark.sql.warehouse.dir", "file:///bigdata/spark/spark-warehouse");
        // Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG);

        SparkSession spark = SparkSession
                .builder()
                .appName("JDBCApp")
                .getOrCreate();

        // Read the query result over JDBC; the query is wrapped as a derived table.
        Dataset<Row> jdbcDF = spark.read()
                .format("jdbc")
                .option("driver", DB_DRIVER)
                .option("url", DB_OLAP_UAT_URL)
                .option("dbtable", "(" + SQL_TEXT + ") tmp")
                .option("user", DB_OLAP_UAT_USR)
                .option("password", DB_OLAP_UAT_PWD)
                .load();

        jdbcDF.show();
    }
}

Testing program2 (Java):

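// Logger is assumed to be log4j 1.x, which Spark 2.x bundles.
import org.apache.log4j.Logger;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class SimpleApp {

    public static void main(String[] args) {
        String filePath = args[0];
        Logger logger = Logger.getLogger("org.apache.spark");
        // logger.setLevel(Level.DEBUG);

        SparkSession spark = SparkSession.builder()
                .appName("Simple Application")
                .getOrCreate();

        Dataset<String> logData = spark.read().textFile(filePath).cache();

        // Count the lines containing "e" and the lines containing "r".
        long numAs = logData.filter((FilterFunction<String>) s -> s.contains("e")).count();
        long numBs = logData.filter((FilterFunction<String>) s -> s.contains("r")).count();

        logger.info("Lines with e: " + numAs + ", lines with r: " + numBs);

        spark.stop();
    }
}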

You can run the two test programs above; afterwards the state of all executors is shown as KILLED.

---------------------------------------------- CMD ----------------------------------------------------------------------------------------------

./spark-submit \
--driver-class-path /sharedata/mssql-jdbc-6.4.0.jre8.jar \
--jars /sharedata/mssql-jdbc-6.4.0.jre8.jar \
--class JDBCApp \
--master spark://hostname1:6066 \
--deploy-mode cluster \
--driver-memory 2G \
--executor-memory 2G \
--total-executor-cores 8 \
/sharedata/spark-demo-1.0-SNAPSHOT.jar

------------------------------------------------------------------------------------------------------------

Another issue is that I can't see the driver's stdout/stderr on the Executors tab of the Spark history server; I can only see the stdout/stderr of the executors. As a result, after restarting the Spark standalone cluster, I can't see the driver's output in either the Spark UI or the Spark history server.
