Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 1.4.0, 2.0.1
- Fix Version/s: None
Description
I am running Spark 1.4 and currently experiencing an issue when inserting data into a table. The data is loaded into an initial table, then selected from that table, processed, and inserted into a second table. The issue is that some of the rows go missing from the second table when running in a multi-worker configuration (a master, a worker on the master host, and a worker on a different host).
I have narrowed the problem down to the insert into the second table. An example process that reproduces the problem is below.
Generate a file (for example, /home/spark/test) containing the numbers 1 to 50, one per line.
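One simple way to generate such a file (the output path is an assumption; adjust it to match the path used in the load statement below):

```shell
# Generate the test input: the numbers 1 to 50, one per line.
# OUT defaults to ./test here; point it at /home/spark/test to
# match the repro steps.
OUT=${OUT:-./test}
seq 1 50 > "$OUT"
```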
spark-sql --master spark://spark-master:7077 --hiveconf hive.metastore.warehouse.dir=/spark
(/spark is shared between all hosts)
create table test(field string);
load data inpath '/home/spark/test' into table test;
create table processed(field string);
from test insert into table processed select *;
select * from processed;
The result from the final select does not contain all the numbers 1 to 50.
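To pinpoint exactly which rows are lost, the final select can be diffed against the expected sequence. This is a sketch only: the output file names are illustrative, and the incomplete result is simulated here (in the real setup it would come from capturing the select output, e.g. `spark-sql ... -e 'select * from processed;' > /tmp/processed.out`).

```shell
# Simulated incomplete result for demonstration: the row "7" is missing.
seq 1 50 | grep -v '^7$' > /tmp/processed.out

# comm requires both inputs sorted the same way (lexicographic here).
seq 1 50 | sort > /tmp/expected.out
sort /tmp/processed.out > /tmp/actual.out

# Rows present in the expected set but absent from the table:
comm -23 /tmp/expected.out /tmp/actual.out
```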
I have also run the above example in some different configurations:
- With a single worker running on the master host, the final select returns rows 1-50, i.e. all the data, as expected.
- With a single worker running on a host other than the master, the final select returns no rows.
No errors are logged in the log files.