Apache Drill
DRILL-1290

Document Configuration Steps for Different Hadoop Distributions

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Future
    • Component/s: Client - HTTP
    • Labels:
      None
    • Environment:

      OS: Centos 6.4
      HDFS: CDH5.1
      Drill: 0.4.0

      Description

      In the web GUI, I can successfully create a new storage plugin named "myhdfs" using "file:///":

      {
        "type": "file",
        "enabled": true,
        "connection": "file:///",
        "workspaces": {
          "root": {
            "location": "/",
            "writable": false,
            "storageformat": null
          },
          "tmp": {
            "location": "/tmp",
            "writable": true,
            "storageformat": "csv"
          }
        },
        "formats": {
          "psv": {
            "type": "text",
            "extensions": [
              "tbl"
            ],
            "delimiter": "|"
          },
          "csv": {
            "type": "text",
            "extensions": [
              "csv"
            ],
            "delimiter": ","
          },
          "tsv": {
            "type": "text",
            "extensions": [
              "tsv"
            ],
            "delimiter": "\t"
          },
          "parquet": {
            "type": "parquet"
          },
          "json": {
            "type": "json"
          }
        }
      }
      

      However, if I change "file:///" to "hdfs:///" so that the plugin points to HDFS instead of the local file system, the Drill log errors out with "[qtp416200645-67] DEBUG o.a.d.e.server.rest.StorageResources - Unable to create/ update plugin: myhdfs".

      {
        "type": "file",
        "enabled": true,
        "connection": "hdfs:///",
        "workspaces": {
          "root": {
            "location": "/",
            "writable": false,
            "storageformat": null
          },
          "tmp": {
            "location": "/tmp",
            "writable": true,
            "storageformat": "csv"
          }
        },
        "formats": {
          "psv": {
            "type": "text",
            "extensions": [
              "tbl"
            ],
            "delimiter": "|"
          },
          "csv": {
            "type": "text",
            "extensions": [
              "csv"
            ],
            "delimiter": ","
          },
          "tsv": {
            "type": "text",
            "extensions": [
              "tsv"
            ],
            "delimiter": "\t"
          },
          "parquet": {
            "type": "parquet"
          },
          "json": {
            "type": "json"
          }
        }
      }
      

      On my cluster I am using CDH5 HDFS, and all client configurations are valid. For example, on the drillbit server:

      [root@hdm ~]# hdfs dfs -ls /
      Found 3 items
      drwxr-xr-x   - hbase hbase               0 2014-08-04 22:55 /hbase
      drwxrwxrwt   - hdfs  supergroup          0 2014-07-31 16:31 /tmp
      drwxr-xr-x   - hdfs  supergroup          0 2014-07-11 12:06 /user
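
      For reference, the default filesystem URI that the CDH client resolves can be confirmed as follows (a quick check, assuming the standard CDH client configuration is deployed on the drillbit host); a fully qualified URI of this form, with scheme, NameNode host, and port, is also a candidate value for the plugin's "connection" field:

      hdfs getconf -confKey fs.defaultFS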
      

      Is there anything wrong with the storage plugin syntax for HDFS?
      If so, can the Drill log print more debug info to show the reason why it failed?
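
      (One way to coax more detail out of the log, sketched here against the stock conf/logback.xml and assuming its file appender is named FILE, is to raise the level for the REST/storage package that emits the message above:)

      <logger name="org.apache.drill.exec.server.rest" additivity="false">
        <level value="debug"/>
        <appender-ref ref="FILE"/>
      </logger>
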
      Thanks.

          Activity

          Neeraja added a comment -

          Can you refer to the configuration mentioned in https://issues.apache.org/jira/browse/DRILL-1075 to see if it helps?
          Here is a quick snippet from the bug.
          ----------
          Here’s a sample configuration for HDFS:

          {
            "type" : "file",
            "enabled" : true,
            "connection" : "hdfs://10.10.30.156:8020/",
            "workspaces" : {
              "root" : { "location" : "/user/root/drill", "writable" : true, "storageformat" : "null" }
            },
            "formats" : {
              "json" : { "type" : "json" }
            }
          }

          Make sure these packages are in the class path (changes based on installation):
          /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/hadoop-annotations-2.0.0-cdh4.7.0.jar
          /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/hadoop-auth-2.0.0-cdh4.7.0.jar
          /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/hadoop-common-2.0.0-cdh4.7.0.jar
          /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.7.0.jar
          /opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.0.0-cdh4.7.0.jar
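
          A distribution-agnostic variant (just a sketch, assuming the distribution's own hadoop command is on the PATH and that the class path is set in conf/drill-env.sh, as in the follow-up comment below) is to let the Hadoop CLI supply its client jars:

          # hypothetical snippet for conf/drill-env.sh
          export CLASSPATH="$(hadoop classpath):$CLASSPATH"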

          And point to the correct ZooKeeper in drill-override.conf:

          drill.exec: {
            cluster-id: "working_cdh_drill",
            zk.connect: "10.10.30.156:2181"
          }

          There is an open issue to add the packages mentioned above to the class path automatically:
          https://issues.apache.org/jira/browse/DRILL-1160

          Hao Zhu added a comment -

          Hi Neeraja,

          I tried it, but I still get the same error in the Drill log.
          drill-env.sh:

          export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop
          export CLASSPATH=/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-annotations.jar:/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-auth.jar:/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/hadoop-hdfs.jar:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar:$CLASSPATH
          

          drill-override.conf:

          drill.exec: {
            cluster-id: "mydrill",
            zk.connect: "admin.xxx.com:2181,hdw1.xxx.com:2181,hdw3.xxx.com:2181"
          }
          

          Any suggestions on this?

          Thanks,
          Hao

          Ramana Inukonda Nagaraj added a comment -

          Seeing the same issue with CDH5.
          I followed the steps listed in DRILL-1075 and still hit the same issue. Let me know if you need additional details.

          Hao Zhu added a comment -

          It seems CDH, HDP, PHD, and MapR are different enough that we need separate configuration steps for each of them.
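
          From this thread, the pieces that vary per distribution appear to be the location of the Hadoop client jars (for example, the /opt/cloudera/parcels/... paths above for CDH) and the fully qualified NameNode URI in the storage plugin's "connection" field. A sketch of the latter, with a hypothetical hostname and the 8020 port used in the DRILL-1075 example above:

          "connection" : "hdfs://namenode.example.com:8020/"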


            People

            • Assignee: Neeraja
            • Reporter: Hao Zhu
            • Votes: 0
            • Watchers: 3
