Flume / FLUME-842

NPE when using jdbc channel with 2 local nodes communicating through avro

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: NG alpha 1
    • Fix Version/s: notrack
    • Component/s: Channel
    • Labels: None
    • Environment:

      Description

      The expected communication path looks like this:
      AvroCLIClient ==(41414)==> avroSource=>jdbcChannel=>avroSink ==(41415)==> avroSource=>jdbcChannel=>loggerSink

      Shorthand:
      AvroCLIClient ==(41414)==> host1 ==(41415)==> host2

      Steps:
      1) Run host2 with main class org.apache.flume.node.Application and args:
      -f /home/will/git/apache/flume/conf/flume.properties -n host2
      2) Run host1 with same main class, and args:
      -f /home/will/git/apache/flume/conf/flume.properties -n host1
      3) Run Avro client with main class org.apache.flume.client.avro.AvroCLIClient and args:
      -H localhost -p 41414 -F /etc/passwd

      The following NullPointerException (NPE) appears on host1 after step 2 and before step 3. The problem does not occur when both hosts use the memory channel instead.

      2011-11-07 11:00:42,103 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
      java.lang.NullPointerException
      at org.apache.flume.channel.jdbc.impl.DerbySchemaHandler.schemaExists(DerbySchemaHandler.java:321)
      at org.apache.flume.channel.jdbc.impl.JdbcChannelProviderImpl.initializeSchema(JdbcChannelProviderImpl.java:108)
      at org.apache.flume.channel.jdbc.impl.JdbcChannelProviderImpl.initialize(JdbcChannelProviderImpl.java:95)
      at org.apache.flume.channel.jdbc.JdbcChannelProviderFactory.getProvider(JdbcChannelProviderFactory.java:35)
      at org.apache.flume.channel.jdbc.JdbcChannel.configure(JdbcChannel.java:81)
      at org.apache.flume.conf.Configurables.configure(Configurables.java:22)
      at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadChannels(PropertiesFileConfigurationProvider.java:223)
      at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:184)
      at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
      at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$1(AbstractFileConfigurationProvider.java:114)
      at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

        Activity

        Will McQueen created issue -
        Will McQueen added a comment -

        I narrowed this issue down to the case where more than one JDBC channel is running on the same machine (each JDBC channel contained within a separate Flume agent). DerbySchemaHandler.schemaExists() fails when attempting to get a connection. It fails with:

        java.sql.SQLException: Failed to start database '/home/will/.flume/jdbc-channel/db'. Another instance of Derby may have already booted the database /home/will/.flume/jdbc-channel/db.

        The first process launched with this connectUrl:
        jdbc:derby:/home/will/.flume/jdbc-channel/db;create=true

        For the 2nd process, what's the minimum change to the connectUrl that needs to be made? Would it be just to change the db name from "db" to some other name, "db2"?

        Will McQueen made changes -
        Field       Original Value    New Value
        Assignee                      Arvind Prabhakar [ aprabhakar ]
        Arvind Prabhakar added a comment -

        @Will, the JDBC channel is heavyweight and is designed to run as a single instance per user. That is why the database files are placed under the user's home directory, which causes the issue you are seeing. If you must run multiple agents using the JDBC channel on the same machine, you can do one of the following:

        1. For any subsequent process that you launch, pass in the system property -Duser.home pointing to a location other than the default one.
        2. Run any subsequent process as a different user (other than the user running the first process).
        3. Specify the explicit database configuration manually for subsequent processes. Please see the properties specified in the class org.apache.flume.channel.jdbc.ConfigurationConstants for details on these properties.

        Thanks,
        Arvind
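
Option 1 can be pictured with a small sketch. It assumes the default database location is <user.home>/.flume/jdbc-channel/db, as the path in the SQLException above suggests. The class DefaultDerbyUrl and the method forUserHome are hypothetical names for illustration and are not part of Flume.

```java
import java.util.Objects;

/** Hypothetical helper, not Flume code: shows how the default Derby URL follows user.home. */
final class DefaultDerbyUrl {

    // Assumed layout, taken from the error message in this issue:
    // <user.home>/.flume/jdbc-channel/db
    static String forUserHome(String userHome) {
        Objects.requireNonNull(userHome, "userHome");
        // Drop a trailing slash so the path is not doubled up.
        String home = userHome.endsWith("/")
                ? userHome.substring(0, userHome.length() - 1)
                : userHome;
        return "jdbc:derby:" + home + "/.flume/jdbc-channel/db;create=true";
    }

    public static void main(String[] args) {
        // The value this JVM was started with; -Duser.home=<dir> overrides it.
        System.out.println(forUserHome(System.getProperty("user.home")));
        // A second agent started with a different user.home resolves to a different database.
        System.out.println(forUserHome("/tmp/flume-agent2"));
    }
}
```

Under that assumption, two agents started with different -Duser.home values would each boot their own Derby database directory and would no longer compete for the same one.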

        Will McQueen added a comment -

        Apparently yes, the database dir ('db' in this case) can't be accessed by more than one process. I modified my props file as follows to get around the issue (I used 'db1' and 'db2' for the host1 and host2 agents, respectively, and I also had to explicitly specify the defaults we use).

        Arvind, unless you have any more comments to add, you can close this bug. Thanks.

        host1.channels = jdbcChannel
        host1.sources = avroSource
        host1.sinks = avroSink
        #
        host1.channels.jdbcChannel.type=jdbc
        host1.channels.jdbcChannel.org.apache.flume.channel.jdbc.driver.url=jdbc:derby:/home/will/.flume/jdbc-channel/db1;create=true
        host1.channels.jdbcChannel.org.apache.flume.channel.jdbc.driver.class=org.apache.derby.jdbc.EmbeddedDriver
        host1.channels.jdbcChannel.org.apache.flume.channel.jdbc.db.username=sa
        host1.channels.jdbcChannel.org.apache.flume.channel.jdbc.db.password=
        host1.channels.jdbcChannel.org.apache.flume.channel.jdbc.db.type=DERBY
        #
        host1.sources.avroSource.type=avro
        host1.sources.avroSource.channels=jdbcChannel
        host1.sources.avroSource.bind=0.0.0.0
        host1.sources.avroSource.port=41414
        #
        host1.sinks.avroSink.type=avro
        host1.sinks.avroSink.channel=jdbcChannel
        host1.sinks.avroSink.hostname=localhost
        host1.sinks.avroSink.port=41415
        host1.sinks.avroSink.batch-size=100
        #-----
        host2.channels=jdbcChannel
        host2.sources=avroSource
        host2.sinks=loggerSink
        #
        host2.channels.jdbcChannel.type=jdbc
        host2.channels.jdbcChannel.org.apache.flume.channel.jdbc.driver.url=jdbc:derby:/home/will/.flume/jdbc-channel/db2;create=true
        host2.channels.jdbcChannel.org.apache.flume.channel.jdbc.driver.class=org.apache.derby.jdbc.EmbeddedDriver
        host2.channels.jdbcChannel.org.apache.flume.channel.jdbc.db.username=sa
        host2.channels.jdbcChannel.org.apache.flume.channel.jdbc.db.password=
        host2.channels.jdbcChannel.org.apache.flume.channel.jdbc.db.type=DERBY

        #
        host2.sources.avroSource.type=avro
        host2.sources.avroSource.channels=jdbcChannel
        host2.sources.avroSource.bind=0.0.0.0
        host2.sources.avroSource.port=41415
        #
        host2.sinks.loggerSink.type=logger
        host2.sinks.loggerSink.channel=jdbcChannel
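
A configuration like the one above could be checked before launch with a sketch along these lines. It scans a properties file for keys ending in the driver.url suffix used above and reports any URL that appears more than once. The class JdbcUrlCheck and its methods are hypothetical and not part of Flume. The check is deliberately coarse: it treats every repeated URL as a conflict, even though the problem in this issue only arises when separate processes share a database.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-launch check, not Flume code. */
final class JdbcUrlCheck {

    // Key suffix as it appears in the properties file in this comment.
    private static final String URL_SUFFIX = ".org.apache.flume.channel.jdbc.driver.url";

    /** Returns every driver URL that is configured for more than one channel. */
    static Set<String> sharedUrls(Properties props) {
        Set<String> seen = new HashSet<>();
        Set<String> shared = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith(URL_SUFFIX)) {
                String url = props.getProperty(key);
                if (!seen.add(url)) {
                    shared.add(url); // second sighting of the same URL
                }
            }
        }
        return shared;
    }

    /** Parses properties text; reading from a String is not expected to fail. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String prefix1 = "host1.channels.jdbcChannel" + URL_SUFFIX;
        String prefix2 = "host2.channels.jdbcChannel" + URL_SUFFIX;
        String conf =
                prefix1 + "=jdbc:derby:/home/will/.flume/jdbc-channel/db1;create=true\n"
              + prefix2 + "=jdbc:derby:/home/will/.flume/jdbc-channel/db2;create=true\n";
        System.out.println(sharedUrls(parse(conf)).isEmpty()); // prints true
    }
}
```

With the 'db1' and 'db2' URLs from the configuration above, the set of shared URLs is empty. If both agents pointed at the same 'db' directory, that URL would be reported.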

        Will McQueen made changes -
        Description: the flume.properties path in steps 1 and 2 changed from /home/will/git/flume-asf/conf/flume.properties to /home/will/git/apache/flume/conf/flume.properties; the rest of the description is unchanged.
        Will McQueen made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Will McQueen added a comment -

        Implemented choice #3 from Arvind's recommendations. Works as expected now.

        Will McQueen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Arvind Prabhakar added a comment -

        Not a bug, hence setting the fix version to notrack.

        Arvind Prabhakar made changes -
        Fix Version/s notrack [ 12320245 ]

          People

          • Assignee: Arvind Prabhakar
          • Reporter: Will McQueen