  1. IMPALA
  2. IMPALA-4188

run query generator with KUDU_IS_SUPPORTED=true

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Infrastructure
    • Labels:
      None

      Description

      Due to KUDU-1419, and because the query generator runs on Docker-on-AUFS, we should come up with a way to run the query generator with KUDU_IS_SUPPORTED=true.

      There are some options here, including external docker volumes, configuring devicemapper as an alternative Docker storage driver, or getting the query generator execution completely out of Docker.

        Activity

        mikesbrown Michael Brown added a comment -

        I can successfully start Kudu in a minicluster inside a container when I use an external volume. To create the external volume, I start a container, run docker-boot, and use rsync on the host to copy /home/dev/Impala/testdata/cluster from the container to the host. I then stop that container and start a new one with docker run -v /where/I/saved/clusterdata:/home/dev/Impala/testdata/cluster.

        I can then start Kudu in the minicluster and read from the tables in tpch_kudu and functional_kudu.
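
A minimal sketch of the two commands behind the external-volume workflow described above, written as Python helpers that only build the argument lists. The SSH target, the image name, and the rsync flags (`-a`, `-e ssh`) are assumptions for illustration; the comment gives only the two paths and the `docker run -v` form.

```python
import shlex

# Path from the comment above: where the minicluster data lives in the container.
CONTAINER_CLUSTER_DIR = "/home/dev/Impala/testdata/cluster"


def rsync_command(ssh_target, host_dir):
    """Build the rsync argv that copies cluster data from the container to the host.

    ssh_target is a hypothetical user@address for the running container.
    The trailing slashes make rsync copy the directory contents.
    """
    source = "{0}:{1}/".format(ssh_target, CONTAINER_CLUSTER_DIR)
    return ["rsync", "-a", "-e", "ssh", source, host_dir.rstrip("/") + "/"]


def docker_run_command(image, host_dir, extra_args=()):
    """Build the docker run argv that mounts host_dir over the cluster directory."""
    volume = "{0}:{1}".format(host_dir, CONTAINER_CLUSTER_DIR)
    return ["docker", "run", "-v", volume] + list(extra_args) + [image]


if __name__ == "__main__":
    # Hypothetical values; substitute the real container address and image.
    host_dir = "/where/I/saved/clusterdata"
    for argv in (rsync_command("dev@172.17.0.2", host_dir),
                 docker_run_command("impala-image", host_dir)):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argument list separately from running it keeps the sketch easy to inspect; the list could be passed to `subprocess.check_call` once the values are confirmed.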

        mikesbrown Michael Brown added a comment -

        I was able to run Kudu with the devicemapper storage driver in direct-lvm mode, but I had to use CentOS 7 to get devicemapper working.

        The steps were here:
        https://docs.docker.com/engine/installation/linux/centos/
        https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/

        Along the way though, I encountered this problem:
        https://github.com/docker/docker/issues/23869

        I overcame that by changing the dm.basesize to 200G (from 10G).
        https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options

        This was necessary because the Impala Docker image is huge. I know that it's around 100G, so I figured 200G would be plenty.
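
As a rough illustration of the dm.basesize change above, the sketch below generates a Docker daemon configuration fragment. The comment does not say whether the option was set in a daemon configuration file or on the dockerd command line, so the file-based form is an assumption, and the additional options that direct-lvm requires are deliberately left out.

```python
import json


def devicemapper_daemon_config(base_size="200G"):
    """Return a daemon config fragment selecting devicemapper with a larger base size.

    Only the two settings mentioned in the comment are included; a real
    direct-lvm setup needs further storage options that are not shown here.
    """
    return {
        "storage-driver": "devicemapper",
        "storage-opts": ["dm.basesize={0}".format(base_size)],
    }


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before being merged into the
    # daemon configuration by hand.
    print(json.dumps(devicemapper_daemon_config(), indent=2, sort_keys=True))
```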

        I also encountered this problem:
        https://github.com/docker/docker/issues/7459

        I overcame that by using docker run --cap-add=SYS_ADMIN --security-opt=seccomp:unconfined.
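
The flags above can be captured in a small helper, sketched here under assumptions: the image name and trailing command are hypothetical, and the `seccomp_sep` parameter exists only because newer Docker documentation spells the option `seccomp=unconfined`, which should be checked against the Docker version in use.

```python
def kudu_workaround_run_args(image, command=(), seccomp_sep=":"):
    """Build a docker run argv with the two flags quoted in the comment.

    seccomp_sep defaults to ":" to match the comment; pass "=" for the
    alternative spelling if the installed Docker expects it.
    """
    args = [
        "docker", "run",
        "--cap-add=SYS_ADMIN",
        "--security-opt=seccomp{0}unconfined".format(seccomp_sep),
        image,
    ]
    return args + list(command)


if __name__ == "__main__":
    # "impala-image" is a placeholder, not a real image name.
    print(" ".join(kudu_workaround_run_args("impala-image")))
```

Both flags relax container isolation, so this is better suited to a disposable test host than to shared infrastructure.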

        But this solution seems flaky: the first time I tried it, I hit issue 7459 again. Another time, Kudu complained that it could not contact NTP. The next morning, things worked. This is too unreliable to depend on.

        Moreover, the instructions for setting up devicemapper on this page https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/ don't work for Ubuntu. They call for LVM2 commands that don't exist there, and for options that other LVM2 commands don't support. This means anyone who needs to run the query generator on Ubuntu would have to solve the same problems I would if I continued down that path. We already have 2 workarounds; we would need even more to adapt for Ubuntu.

        Last, devicemapper isn't a preferred storage driver in the community. While it was good to do some research here, I don't think it's tenable to use devicemapper as the solution.

        mikesbrown Michael Brown added a comment -

        Update:

        1. I had a few successful Kudu qgen runs with SELECT-only, using an external volume. I think this validates that external volumes will be fine for now.
        2. I've made changes to Cloudera's internal Impala Docker container to support warming the volume. Now I need to write the test code that sets this up automatically.

        mikesbrown Michael Brown added a comment -

        https://gerrit.cloudera.org/#/c/4678/

        mikesbrown Michael Brown added a comment -
        commit db5de41a808d0e177ac0089ead2e420ab6043d1d
        Author: Michael Brown <mikeb@cloudera.com>
        Date:   Thu Sep 22 15:04:41 2016 -0700
        
            IMPALA-4188: Leopard: support external Docker volumes
        
            To be able to run the Random Query Generator with Impala and Kudu, we
            need to mount an external Docker volume as a workaround to KUDU-1419.
            This patch introduces a series of environment variables a user may tweak
            in order to help with that purpose. The patch assumes a viable,
            reasonable Docker container based on a standard Linux distribution like
            Ubuntu 14.
        
            To assist users, I've updated the Leopard README with instructions on
            the environment variables' meanings.
        
            The gist here is that the container is the source of truth, which means
            to create an external volume, we need to copy the testdata off the
            container onto the host running Docker Engine. To do that we suggest a
            strategy using rsync via passwordless SSH key.
        
            Testing:
            I used a Cloudera Docker container that has Impala in /home/dev/Impala.
            Before, Kudu would fail to start due to KUDU-1419. Now, we load testdata
            into an external volume, build Impala, run the minicluster including
            Kudu, and can access the tpch_kudu data.
        
            I made flake8 fixes as well. flake8 on this file is now clean.
        
            Change-Id: Ia7d9d9253fcd7e3905e389ddeb1438cee3e24480
            Reviewed-on: http://gerrit.cloudera.org:8080/4678
            Reviewed-by: Michael Brown <mikeb@cloudera.com>
            Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
            Tested-by: Internal Jenkins
        

          People

          • Assignee:
            mikesbrown Michael Brown
            Reporter:
            mikesbrown Michael Brown