Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: 2.3.0
- Fix Version: None
- Environment:
  - OS: SUSE11
  - Spark Version: 2.3
Description
Precondition: one .orc file already exists at the /tmp/orcdata/ location.

Steps:
1. Launch spark-sql.
2. spark-sql> CREATE TABLE os_orc (name string, version string, other string) USING ORC OPTIONS (path '/tmp/orcdata/');
3. spark-sql> select * from os_orc;
Spark 2.3.0 Apache
Time taken: 2.538 seconds, Fetched 1 row(s)
4. Copy one more .orc file into the table location:
pc1:/opt/# ./hadoop dfs -ls /tmp/orcdata
Found 1 items
-rw-r--r-- 3 spark hadoop 475 2018-05-09 18:21 /tmp/orcdata/part-00000-d488121b-e9fd-4269-a6ea-842c631722ee-c000.snappy.orc
pc1:/opt/# ./hadoop fs -copyFromLocal /opt/OS/loaddata/orcdata/part-00001-d488121b-e9fd-4269-a6ea-842c631722ee-c000.snappy.orc /tmp/orcdata/data2.orc
pc1:/opt/# ./hadoop dfs -ls /tmp/orcdata
Found 2 items
-rw-r--r-- 3 spark hadoop 475 2018-05-15 14:59 /tmp/orcdata/data2.orc
-rw-r--r-- 3 spark hadoop 475 2018-05-09 18:21 /tmp/orcdata/part-00000-d488121b-e9fd-4269-a6ea-842c631722ee-c000.snappy.orc
5. Execute the select command on the table os_orc again:
spark-sql> select * from os_orc;
Spark 2.3.0 Apache
Time taken: 1.528 seconds, Fetched 1 row(s)
Actual Result: The select command does not display all the records that exist in the data source table location; only the original row is returned.
Expected Result: All records in the table's location should be fetched and displayed.
NB:
1. On exiting and relaunching the spark-sql session, the select command fetches the correct number of records.
2. This issue applies to all data source tables created with 'USING'.

I first came across this use case in Spark 2.2.1 while trying to reproduce an observation at a customer site.
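A likely workaround, assuming the symptom is caused by Spark caching the table's file listing for the session (my reading of the behavior, not confirmed in this report), is the REFRESH TABLE command, which Spark SQL provides to invalidate cached metadata for a table:

```sql
-- Hypothetical workaround for this report: invalidate the cached
-- metadata/file listing for os_orc, then re-run the query so that
-- files added to /tmp/orcdata/ after table creation are picked up.
REFRESH TABLE os_orc;
select * from os_orc;
```

If REFRESH TABLE makes the newly copied file visible without relaunching the spark-sql session, that would support the stale-file-listing explanation noted in NB 1.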