[SPARK-15347] Problem select empty ORC table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.6.1
Fix Version/s: None
Component/s: PySpark
Labels: None
Environment:

Hadoop 2.7.1.2.4.2.0-258
Subversion git@github.com:hortonworks/hadoop.git -r 13debf893a605e8a88df18a7d8d214f571e05289
Compiled by jenkins on 2016-04-25T05:46Z
Compiled with protoc 2.5.0
From source with checksum 2a2d95f05ec6c3ac547ed58cab713ac
This command was run using /usr/hdp/2.4.2.0-258/hadoop/hadoop-common-2.7.1.2.4.2.0-258.jar


Description

Selecting from an empty ORC table fails in PySpark, although the same query succeeds in Hive.

[pprado@hadoop-m ~]$ beeline -u jdbc:hive2://
WARNING: Use "yarn jar" to launch YARN applications.
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive

On beeline => create table my_test (id int, name String) stored as orc;
On beeline => select * from my_test;

16/05/13 18:18:57 [main]: ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
OK
--------------------------
my_test.id | my_test.name
--------------------------
--------------------------
No rows selected (1.227 seconds)

Hive handles the empty table correctly and returns no rows.

Now, when I run the same query in pyspark:

Welcome to
SPARK version 1.6.1

Using Python version 2.6.6 (r266:84292, Jul 23 2015 15:22:56)
SparkContext available as sc, HiveContext available as sqlContext.

PySpark => sqlContext.sql("select * from my_test")

16/05/13 18:33:41 INFO ParseDriver: Parsing command: select * from my_test
16/05/13 18:33:41 INFO ParseDriver: Parse Completed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 53, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'orcFileOperator: path hdfs://hadoop-m.c.sva-0001.internal:8020/apps/hive/warehouse/my_test does not have valid orc files matching the pattern'

When I create the table as Parquet instead of ORC, the same query works without any problem.
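
Until the underlying issue is fixed, one possible workaround is to treat this specific error as "no data". The sketch below is only an illustration, not a fix: the helper name `query_or_none` and the constant `ORC_EMPTY_MARKER` are hypothetical, and it assumes the error text contains the same "does not have valid orc files" substring shown in the traceback above.

```python
# Substring taken from the error message in the traceback above (assumption:
# the message wording is stable for the Spark version in use).
ORC_EMPTY_MARKER = "does not have valid orc files"


def query_or_none(run_sql, query):
    """Run `query` through `run_sql`; return None for the empty-ORC error.

    `run_sql` is any callable that takes a SQL string, for example
    `sqlContext.sql`. Any other exception is re-raised unchanged.
    """
    try:
        return run_sql(query)
    except Exception as exc:
        if ORC_EMPTY_MARKER in str(exc):
            return None  # table has no ORC files yet, treat as empty
        raise
```

In the pyspark shell this would be called as `df = query_or_none(sqlContext.sql, "select * from my_test")`, and the caller then has to handle a `None` result.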

Issue Links

blocks: SPARK-20901 Feature parity for ORC with Parquet (Open)

duplicates: SPARK-14286 Empty ORC table join throws exception (Resolved)

People

Assignee: Unassigned

Reporter: Pedro Prado

Votes: 0

Watchers: 2

Dates

Created: 16/May/16 14:50

Updated: 26/May/17 17:56

Resolved: 16/May/16 15:05