  1. Bigtop
  2. BIGTOP-1019

Remove mysql requirement constraint from sqoop tests

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.8.0
    • Component/s: tests
    • Labels:
      None

      Description

      Existing Sqoop tests contradict the self-sufficiency of the integration testing paradigm. The tests require an auxiliary mysql server configuration. While that might be acceptable for an established infrastructure, it is clearly a disadvantage for ad-hoc testing, where one needs to make sure that a specially configured mysql is available.

      Replacing mysql with H2 for the testing would solve the problem elegantly.

        Issue Links

          Activity

          cos Konstantin Boudnik added a comment -

          To elaborate on this a little...
          H2 is a single-jar, full-scale DB server. It can be started as a separate JVM process or programmatically, e.g. from a test setup method. It can keep the DB in memory, which suits the purpose perfectly, as the test won't need to clean up after itself.

          Here's an even better part: H2 can pretend to be anything from mysql to PostgreSQL, with mssql, oracle, and others in between.
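
As a rough illustration of the in-memory and compatibility-mode points, here is a minimal sketch. It assumes the H2 jar is on the classpath; the class name H2MemoryUrl and the database name sqooptest are made up for the example. MODE and DB_CLOSE_DELAY are H2 connection-URL settings.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class H2MemoryUrl {
    /** Builds an in-memory H2 URL; MODE selects the dialect H2 should emulate. */
    static String inMemoryUrl(String dbName, String mode) {
        // DB_CLOSE_DELAY=-1 keeps the in-memory database alive until the JVM exits,
        // instead of dropping it when the last connection closes.
        return "jdbc:h2:mem:" + dbName + ";MODE=" + mode + ";DB_CLOSE_DELAY=-1";
    }

    public static void main(String[] args) {
        String url = inMemoryUrl("sqooptest", "MySQL");   // hypothetical database name
        System.out.println("URL: " + url);
        // Connecting only works when the H2 jar is on the classpath (an assumption here).
        try (Connection c = DriverManager.getConnection(url)) {
            System.out.println("connected to " + c.getMetaData().getDatabaseProductName());
        } catch (SQLException e) {
            System.out.println("H2 driver not available: " + e.getMessage());
        }
    }
}
```

Nothing is written to disk with a URL like this, so a test would have nothing to clean up afterwards.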

          jayunit100 jay vyas added a comment -

          Great idea to decouple from mysql.

          But, taking it further: since the base requirement of a sqoop ETL flow is that a jdbc driver is available for the data source, maybe a mock JDBC data source would be an even more appropriate way to implement the tests.

          cos Konstantin Boudnik added a comment -

          Jay, I am not that familiar with the JDBC world, so I can't comment on the complexity of mocking a JDBC data source.

          jayunit100 jay vyas added a comment -

          First, a test class would have to be extended from an existing JDBC mock driver, like this one: http://mockrunner.sourceforge.net/doc/api/com/mockrunner/mock/jdbc/MockDriver.html

          Then, given that this class is on the classpath for all the nodes in the test, you would specify the class name with the --driver option on the sqoop command line.

          The result would be that the mock class would respond to all JDBC queries and return data.
          Not sure if this is a concern, but parallel IO should work just fine, since there would be a different mock running in the JVM of each task.

          That is an advantage of the mock: I think parallel IO tests of sqoop, were they ever to be implemented, would require sharing the database server URLs etc. But parallel queries against a mock connection can be done without having any underlying database infrastructure.

          Does that help?
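
To make the mock idea concrete, here is a minimal sketch that uses only the JDK rather than mockrunner. The class name CannedDriver, the jdbc:canned: URL scheme, and the canned rows are all hypothetical. The driver registers itself and answers every query with the same rows, ignoring the SQL text. A mock usable by sqoop would need to cover far more of the JDBC surface (metadata, prepared statements); this only shows the mechanism.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

/** Hypothetical mock driver: answers every query with the same canned rows. */
final class CannedDriver implements Driver {
    static final String PREFIX = "jdbc:canned:";            // made-up URL scheme
    static final String[][] ROWS = {{"1", "alice"}, {"2", "bob"}};

    static {
        // Self-register on class load, as real drivers do, so loading the class is enough.
        try {
            DriverManager.registerDriver(new CannedDriver());
        } catch (SQLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Builds a proxy for a JDBC interface. Only the methods handled below are safe
    // to call; any other method with a primitive return type would fail.
    private static <T> T stub(Class<T> type, InvocationHandler handler) {
        return type.cast(Proxy.newProxyInstance(
                CannedDriver.class.getClassLoader(), new Class<?>[] {type}, handler));
    }

    @Override public Connection connect(String url, Properties info) {
        if (!acceptsURL(url)) return null;                  // JDBC contract for foreign URLs
        return stub(Connection.class, (proxy, method, args) ->
                method.getName().equals("createStatement") ? statement() : null);
    }

    private static Statement statement() {
        return stub(Statement.class, (proxy, method, args) ->
                method.getName().equals("executeQuery") ? resultSet() : null);
    }

    private static ResultSet resultSet() {
        int[] cursor = {-1};                                // positioned before the first row
        return stub(ResultSet.class, (proxy, method, args) -> {
            switch (method.getName()) {
                case "next":      return ++cursor[0] < ROWS.length;
                case "getString": return ROWS[cursor[0]][((Integer) args[0]) - 1];
                default:          return null;              // close() and other void calls
            }
        });
    }

    @Override public boolean acceptsURL(String url) { return url != null && url.startsWith(PREFIX); }
    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }
    @Override public int getMajorVersion() { return 0; }
    @Override public int getMinorVersion() { return 1; }
    @Override public boolean jdbcCompliant() { return false; }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("no logging");
    }

    /** Runs a query through plain JDBC; the SQL text is ignored by the mock. */
    static List<String> fetchNames() {
        List<String> names = new ArrayList<>();
        try (Connection c = DriverManager.getConnection(PREFIX + "demo");
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT id, name FROM users")) {
            while (rs.next()) names.add(rs.getString(2));
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fetchNames());
    }
}
```

Because each JVM that loads the class gets its own driver instance and its own rows, parallel tasks would not share any state, which matches the point about parallel IO above.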

          cos Konstantin Boudnik added a comment -

          Good description, Jay. Care to put together a patch so we can start moving with it? Thanks a million in advance!

          jayunit100 jay vyas added a comment -

          Sure! I'll start digging into the source and see. Any suggestions on where to start learning about how to build/edit source for the bigtop contributions?

          cos Konstantin Boudnik added a comment -

          I think the best way to go about it is to start right away with the Sqoop test source code under
          bigtop-tests/test-artifacts/sqoop/ and go from there.

          jayunit100 jay vyas added a comment -

          Cool, I'll take an initial look and either leave a comment here or a link directly to a gist.

          cos Konstantin Boudnik added a comment -

          Feel free to just attach the patch directly to the ticket. I guess this is the way most of the projects do this in Apache.

          jayunit100 jay vyas added a comment -

          FYI, I'm setting up a bigtop dev environment and VM for testing the existing stuff now. This may take a while: after looking at the code, I'm seeing that setting up a solid dev environment, as well as removing sql implementation specifics, might not be the most practical approach. So if anyone else wants to jump in, feel free. Otherwise, I'll incrementally progress and update along the way.

          Here is the path I'm taking:

          1) Set up dev environment with bigtop sqoop smoke tests.

          2) Set up VM with MySQL installed to confirm that I can reproduce the mysql tests passing.

          3) Change the code and rebuild the sqoop smoke test jars and confirm that I can redeploy the smoke tests (i.e. that I have a working dev environment that can be deployed to my VM).

          So far I've looked into the code, which appears to load sql requests from text files, and initially I see there is a lot of SQL code in the mysql tests. It remains to be seen whether mocking will even be practical. I might first try swapping in derby/h2 as a first pass, to see if decoupling from mysql is easy enough, then either move on to mocks or commit that patch as a first iteration.

          Given other obligations, FYI, this will be slow going. If anyone else wants to jump in, feel free.

          Here is the summary of SQL files used in the existing Import/Export tests. 604 lines in all, counting the copies under target/classes (302 lines under src/main/resources).

          [root@localhost bigtop]# wc `find ./ -name mysql*sql`
          38 226 1424 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/hbase-sqoop/mysql-load-db.sql
          38 202 1285 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/hbase-sqoop/mysql-create-db.sql
          54 385 2354 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/mysql-files/mysql-create-tables.sql
          39 175 1174 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/mysql-files/mysql-create-db.sql
          57 263 3104 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/mysql-files/mysql-insert-data.sql
          38 226 1420 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/hive-sqoop/mysql-load-db.sql
          38 202 1277 ./bigtop-tests/test-artifacts/sqoop/src/main/resources/hive-sqoop/mysql-create-db.sql
          38 226 1424 ./bigtop-tests/test-artifacts/sqoop/target/classes/hbase-sqoop/mysql-load-db.sql
          38 202 1285 ./bigtop-tests/test-artifacts/sqoop/target/classes/hbase-sqoop/mysql-create-db.sql
          54 385 2354 ./bigtop-tests/test-artifacts/sqoop/target/classes/mysql-files/mysql-create-tables.sql
          39 175 1174 ./bigtop-tests/test-artifacts/sqoop/target/classes/mysql-files/mysql-create-db.sql
          57 263 3104 ./bigtop-tests/test-artifacts/sqoop/target/classes/mysql-files/mysql-insert-data.sql
          38 226 1420 ./bigtop-tests/test-artifacts/sqoop/target/classes/hive-sqoop/mysql-load-db.sql
          38 202 1277 ./bigtop-tests/test-artifacts/sqoop/target/classes/hive-sqoop/mysql-create-db.sql
          604 3358 24076 total

          jayunit100 jay vyas added a comment -

          I've created a very simple smoke test that is totally self-contained. Any thoughts? It could be made more complex quite easily with a few thousand randomly generated records:

          https://gist.github.com/jayunit100/7254065

          jarcec Jarek Jarcec Cecho added a comment -

          Putting my Sqoop committer hat on, it seems like a valid approach to me, jay vyas.

          jayunit100 jay vyas added a comment -

          FYI, this is the latest iteration of this. It will work in bigtop IFF there is a way to write a single file to a locally available absolute path on all nodes on the cluster.

          https://github.com/jayunit100/bigtop/blob/master/bigtop-tests/test-artifacts/sqoop/src/main/resources/test2.sh

          Any possibility of that using the YARN APIs?

          mantonov Mikhail Antonov added a comment -

          Small update: working on that. Existing sqoop tests don't pass for me with the best mysql configuration I could craft; looking into it.

          jayunit100 jay vyas added a comment -

          I've got this working in BIGTOP-1222, marking as subsumed.

          The strategy I went with was to use hsql to launch an embedded db at runtime, and then launch a sqoop job in the test.

          After the Sqoop job runs, the embedded hsql server is stopped and shut down.
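
The start, run, stop sequence described here could look roughly like the sketch below. This is not the code from BIGTOP-1222: the EmbeddedDb interface, the class and method names, the table name, and the target directory are all hypothetical. A real implementation would presumably wrap HSQLDB's Server class behind the interface and launch the command with something like ProcessBuilder. The sqoop options used (--connect, --driver, --table, --target-dir, -m) are standard import arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class EmbeddedDbSqoopRun {
    /** Hypothetical wrapper around an embedded database server. */
    interface EmbeddedDb extends AutoCloseable {
        void start();
        String jdbcUrl();
        @Override void close();   // stops the server; narrowed to throw no checked exception
    }

    /** Assembles a sqoop import command line for the given database. */
    static List<String> importCommand(String jdbcUrl, String driverClass,
                                      String table, String targetDir) {
        return Arrays.asList("sqoop", "import",
                "--connect", jdbcUrl,
                "--driver", driverClass,
                "--table", table,
                "--target-dir", targetDir,
                "-m", "1");       // single mapper, so no split column is needed
    }

    /** Starts the database, runs the job, and always stops the database afterwards. */
    static int runImport(EmbeddedDb db, String driverClass, String table, String targetDir,
                         ToIntFunction<List<String>> jobRunner) {
        db.start();
        try (EmbeddedDb running = db) {   // close() runs even if the job throws
            return jobRunner.applyAsInt(
                    importCommand(running.jdbcUrl(), driverClass, table, targetDir));
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        // Fake database that only records lifecycle calls; the URL is an example value.
        EmbeddedDb fake = new EmbeddedDb() {
            @Override public void start() { events.add("start"); }
            @Override public String jdbcUrl() { return "jdbc:hsqldb:hsql://localhost/sqooptest"; }
            @Override public void close() { events.add("stop"); }
        };
        // This runner only records the command instead of launching sqoop.
        int exit = runImport(fake, "org.hsqldb.jdbc.JDBCDriver", "users", "/tmp/users",
                cmd -> { events.add(String.join(" ", cmd)); return 0; });
        System.out.println(exit + " " + events);
    }
}
```

Tying the shutdown to try-with-resources means the embedded server is stopped whether the sqoop job succeeds, fails, or throws.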


            People

            • Assignee:
              mantonov Mikhail Antonov
              Reporter:
              cos Konstantin Boudnik
            • Votes:
              0
              Watchers:
              5
