Hadoop HDFS / HDFS-14477

No enum constant Operation.GET_BLOCK_LOCATIONS

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.7.3, 2.9.0, 2.7.4, 2.8.1, 2.8.2, 2.8.3, 2.7.5, 3.0.0, 2.9.1, 2.8.4, 2.7.6, 2.9.2, 2.8.5, 2.7.7, 2.7.8, 2.8.6
    • Fix Version/s: None
    • Component/s: fs
    • Labels:
      None
    • Environment:

      Running on Ubuntu 16.04

      Hadoop v2.7.4

      Minikube v1.0.1

      Scala v2.11

      Spark v2.4.2

       

    • Tags:
      httpfs
    • Flags:
      Important

      Description

      I was trying to read the contents of Avro files from HDFS using a Spark application and HttpFS configured in Minikube (to run Kubernetes locally). Each time I try to read the files I get this exception:

      Exception in thread "main" org.apache.hadoop.ipc.RemoteException(com.sun.jersey.api.ParamException$QueryParamException): java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.GET_BLOCK_LOCATIONS
       at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:118)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:367)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:98)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:625)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1420)
       at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1404)
       at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:343)
       at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
       at scala.Option.getOrElse(Option.scala:121)
       at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
       at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
       at scala.Option.getOrElse(Option.scala:121)
       at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
       at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
       at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
       at spark_test.TestSparkJob$.main(TestSparkJob.scala:48)
       at spark_test.TestSparkJob.main(TestSparkJob.scala)

       

      I access HDFS through HttpFS set up in Kubernetes. My Spark application runs outside of the K8s cluster, so all the services are accessed through NodePorts. When I launch the Spark app inside the K8s cluster and use only the HDFS client or WebHDFS, I can read all the files' contents. The error occurs only when I execute the app outside of the cluster, which is when I access HDFS through HttpFS.

      So I checked the Hadoop sources and found that there is no enum constant named GET_BLOCK_LOCATIONS in HttpFS. The operation is named GETFILEBLOCKLOCATIONS in the Operation enum of [HttpFSFileSystem|https://github.com/apache/hadoop/blob/release-2.7.4-RC0/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java], while the stack trace shows that the WebHDFS client sends GET_BLOCK_LOCATIONS when getFileBlockLocations() is called. The same applies to every Hadoop version I have checked (2.7.4 and higher).
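      The mismatch can be reproduced in isolation. The sketch below uses two simplified stand-in enums (not the real Hadoop types; only the constant names are copied from the two classes) to show how parsing the client's op name against the server's enum produces exactly this error message:

```java
// Minimal sketch of the name mismatch. These enums are simplified
// stand-ins for the real Hadoop classes, not the actual types.
public class EnumMismatchDemo {
    // Subset of the op names the WebHDFS client sends (GetOpParam.Op).
    enum WebHdfsOp { GETFILESTATUS, GET_BLOCK_LOCATIONS }

    // Subset of the names HttpFS accepts (HttpFSFileSystem.Operation).
    enum HttpFsOp { GETFILESTATUS, GETFILEBLOCKLOCATIONS }

    public static void main(String[] args) {
        // The client puts its own constant name into the "op" query parameter...
        String op = WebHdfsOp.GET_BLOCK_LOCATIONS.name();
        try {
            // ...and the HttpFS side parses it against a different enum,
            // so Enum.valueOf throws IllegalArgumentException.
            HttpFsOp.valueOf(op);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Running this prints "No enum constant EnumMismatchDemo.HttpFsOp.GET_BLOCK_LOCATIONS", the same shape of message as in the stack trace above.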

      The conclusion would be that WebHDFS and HttpFS are not compatible in their operation names, and the same may be true for other operations. So it is currently not possible to read this data from HDFS through HttpFS.
      Is it possible to fix this error somehow?

        People

        • Assignee: Unassigned
        • Reporter: roksolana-d Roksolana Diachuk
        • Votes: 0
        • Watchers: 1
