Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.2.0
Fix Version/s: None
Environment: HDP 2.6.0.3, Spark2 2.2.0, HDFS 2.7.3, CentOS 7.1
Labels: Important
Description
Hello,
We are unable to run Spark on a BLOB storage (WASB) file system deployed on HDP.
Spark fails to start, giving errors related to HiveSessionState, HiveExternalCatalog, and various Azure BLOB storage exceptions.
Any suggestion to address this would be appreciated. Alternatively, please confirm whether this exercise is futile because Spark cannot run on BLOB storage at all.
Thanks in advance.
Detailed Description:
We are unable to access spark/spark2 after changing the file system storage from HDFS to WASB. We are on the HDP 2.6 platform running Hadoop 2.7.3. All other services are working fine.
I have set the following configurations:
HDFS:
core-site.xml:
fs.defaultFS = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net
fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb
fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs
fs.azure.selfthrottling.read.factor = 1.0
fs.azure.selfthrottling.write.factor = 1.0
fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
SPARK2:
spark.eventLog.dir = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
spark.history.fs.logDirectory = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
Despite multiple attempts and irrespective of alternative configurations, the spark-shell command yields the results below:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
... 76 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 84 more
Caused by: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2027)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2081)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.conditionalRedoFolderRename(NativeAzureFileSystem.java:2137)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2104)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 85 more
Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
at org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:130)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2006)
... 93 more
Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: OK
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:199)
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
... 95 more
Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source)
at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:668)
at com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72)
at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1284)
at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1248)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146)
... 96 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
It would be immensely helpful if anyone could assist in resolving the above. We may have missed configuring an important aspect of HDFS or Spark, as a result of which it cannot locate certain JARs and is incompatible with BLOB storage.
Kindly assist!
PS: I have made sure the required azure-storage and hadoop-azure JARs are available in both the Spark and Hadoop lib folders. I have even tried specifying them explicitly when starting spark-shell, but to no effect.