Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: 2.3.0
- Fix Version: None
- Environment:
  - OS: SUSE11
  - Spark Version: 2.3
Description
Precondition: one .orc file already exists at the /tmp/orcdata/ location.

Steps:
1. Launch spark-sql.
2. spark-sql> CREATE TABLE os_orc (name string, version string, other string) USING ORC OPTIONS (path '/tmp/orcdata/');
3. spark-sql> select * from os_orc;
Spark 2.3.0 Apache
Time taken: 2.538 seconds, Fetched 1 row(s)
4. Copy one more .orc file into the table location:
pc1:/opt/# ./hadoop dfs -ls /tmp/orcdata
Found 1 items
-rw-r--r-- 3 spark hadoop 475 2018-05-09 18:21 /tmp/orcdata/part-00000-d488121b-e9fd-4269-a6ea-842c631722ee-c000.snappy.orc
pc1:/opt/# ./hadoop fs -copyFromLocal /opt/OS/loaddata/orcdata/part-00001-d488121b-e9fd-4269-a6ea-842c631722ee-c000.snappy.orc /tmp/orcdata/data2.orc
pc1:/opt/# ./hadoop dfs -ls /tmp/orcdata
Found 2 items
-rw-r--r-- 3 spark hadoop 475 2018-05-15 14:59 /tmp/orcdata/data2.orc
-rw-r--r-- 3 spark hadoop 475 2018-05-09 18:21 /tmp/orcdata/part-00000-d488121b-e9fd-4269-a6ea-842c631722ee-c000.snappy.orc
5. Execute the select command on the table os_orc again:
spark-sql> select * from os_orc;
Spark 2.3.0 Apache
Time taken: 1.528 seconds, Fetched 1 row(s)
Actual Result: The select command does not display all the records that exist in the data source table location; only the original row is returned.
Expected Result: All records in the table's location should be fetched and displayed.
NB:
1. On exiting and relaunching the spark-sql session, the select command fetches the correct number of records.
2. This issue applies to all data source tables created with 'USING'.

I first came across this use case in Spark 2.2.1 while trying to reproduce an observation at a customer site.
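A likely workaround, assuming the symptom is caused by Spark caching the table's file listing for the session (my reading of the behavior, not confirmed in this report), is the REFRESH TABLE command, which Spark SQL provides to invalidate cached metadata for a table:

```sql
-- Hypothetical workaround for this report: invalidate the cached
-- metadata/file listing for os_orc, then re-run the query so that
-- files added to /tmp/orcdata/ after table creation are picked up.
REFRESH TABLE os_orc;
select * from os_orc;
```

If REFRESH TABLE makes the newly copied file visible without relaunching the spark-sql session, that would support the stale-file-listing explanation noted in NB 1.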