Nutch / NUTCH-1477

NPE when injecting with DataFileAvroStore

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.1
    • Fix Version/s: 2.4
    • Component/s: storage
    • Labels:
      None
    • Environment:

      Java 1.6.0_35

      Description

      Fresh installation of Nutch 2.1, configured to use DataFileAvroStore. Injection job throws NullPointerException, see below. No error when I switch to MemStore.

      java.lang.NullPointerException
      at org.apache.avro.io.BinaryEncoder.writeString(BinaryEncoder.java:133)
      at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:176)
      at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:171)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
      at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
      at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
      at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
      at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
      at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
      at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
      at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

      1. webpage.avsc
        2 kB
        Julien Nioche
      2. webpage.avsc
        2 kB
        Lewis John McGibbney
      3. webpage.avsc
        2 kB
        Alfonso Nishikawa
      4. webpage.avsc
        2 kB
        Alfonso Nishikawa
      5. NUTCH-1477.patch
        6 kB
        Lewis John McGibbney
      6. gora-core-0.2.1.jar
        147 kB
        Lewis John McGibbney

          Activity

          Alfonso Nishikawa added a comment -

          Alex McLintock: the Nutch Persistent classes have been changed by hand. You can find the missing methods at http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1/src/java/org/apache/nutch/storage
          You must add the methods Host.getInt(,), Host.getLong(,) and ProtocolStatus.isSuccess() to the classes generated when you compile the schemas.
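
For orientation, here is a minimal, self-contained sketch of what helpers with those signatures might look like. It is not the Nutch source: the class names HostSketch and ProtocolStatusSketch, the plain String metadata map and the value of the SUCCESS constant are all assumptions made for illustration. The real methods should be copied from the release-2.2.1 tag linked above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generated Host class. The real class stores
// its metadata differently; this only illustrates the shape of the helpers.
class HostSketch {
    private final Map<String, String> metadata = new HashMap<>();

    void putMetadata(String key, String value) {
        metadata.put(key, value);
    }

    // Returns the metadata value parsed as an int, or defaultValue if absent.
    int getInt(String key, int defaultValue) {
        String value = metadata.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    // Returns the metadata value parsed as a long, or defaultValue if absent.
    long getLong(String key, long defaultValue) {
        String value = metadata.get(key);
        return value == null ? defaultValue : Long.parseLong(value);
    }

    public static void main(String[] args) {
        HostSketch host = new HostSketch();
        host.putMetadata("q_mt", "5");
        System.out.println(host.getInt("q_mt", 1));      // stored value
        System.out.println(host.getLong("q_cd", 1000L)); // falls back to the default
        System.out.println(new ProtocolStatusSketch(ProtocolStatusSketch.SUCCESS).isSuccess());
    }
}

// Hypothetical stand-in for ProtocolStatus.
class ProtocolStatusSketch {
    static final int SUCCESS = 1; // assumed value, not taken from Nutch
    private final int code;

    ProtocolStatusSketch(int code) {
        this.code = code;
    }

    boolean isSuccess() {
        return code == SUCCESS;
    }
}
```

The keys "q_mt" and "q_cd" are the ones that appear in the FetcherReducer calls in the compile log further down the thread.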

          Lewis John McGibbney added a comment -

          What do your schemas look like, Alex?

          Alex McLintock added a comment -

          Hi Folks, I've tried the patch using Nutch 2.2.1 released source, and Gora 0.4 SNAPSHOT (today's trunk) and it doesn't compile for me. Can someone confirm the patch still works for them?

          compile-core:
          [javac] Compiling 109 source files to /home/alex/projects/benrush/apache-nutch-2.2.1/build/classes
          [javac] /home/alex/projects/benrush/apache-nutch-2.2.1/src/java/org/apache/nutch/fetcher/FetcherReducer.java:345: cannot find symbol
          [javac] symbol : method getInt(java.lang.String,int)
          [javac] location: class org.apache.nutch.storage.Host
          [javac] host.getInt("q_mt", maxThreads),
          [javac] ^
          [javac] /home/alex/projects/benrush/apache-nutch-2.2.1/src/java/org/apache/nutch/fetcher/FetcherReducer.java:346: cannot find symbol
          [javac] symbol : method getLong(java.lang.String,long)
          [javac] location: class org.apache.nutch.storage.Host
          [javac] host.getLong("q_cd", crawlDelay),
          [javac] ^
          [javac] /home/alex/projects/benrush/apache-nutch-2.2.1/src/java/org/apache/nutch/fetcher/FetcherReducer.java:347: cannot find symbol
          [javac] symbol : method getLong(java.lang.String,long)
          [javac] location: class org.apache.nutch.storage.Host
          [javac] host.getLong("q_mcd", minCrawlDelay));
          [javac] ^
          [javac] /home/alex/projects/benrush/apache-nutch-2.2.1/src/java/org/apache/nutch/parse/ParserChecker.java:114: cannot find symbol
          [javac] symbol : method isSuccess()
          [javac] location: class org.apache.nutch.storage.ProtocolStatus
          [javac] if(!protocolOutput.getStatus().isSuccess()) {
          [javac] ^
          [javac] Note: /home/alex/projects/benrush/apache-nutch-2.2.1/src/java/org/apache/nutch/storage/Host.java uses unchecked or unsafe operations.
          [javac] Note: Recompile with -Xlint:unchecked for details.
          [javac] 4 errors

          Alex McLintock added a comment -

          By the way, the original patch file no longer applies cleanly; I had to apply some of the changes manually. If this issue could be re-examined, that would be helpful. I am available to test it and help if I can.

          Lewis John McGibbney added a comment -

          If you care to pick up Gora trunk, package it and replace the 0.3 version which is pulled in with Nutch 2.2.1, then it will all work well... hopefully. You will also need to compile the new databeans using the new schema prior to packaging. HTH, please let us know what happens.
          Thanks

          Alex McLintock added a comment -

          Any news on this or recommendations of a workaround? (I am not sure what my options are for avoiding Avro - or at least avoiding this error) (I am using Amazon EMR, Nutch 2.2.1 and S3 storage in case that helps)

          Alfonso Nishikawa added a comment - - edited

          Thanks, Lewis

          I must warn that, if you use schemas with unions, this patch will make the HBase, Accumulo and Cassandra backends fail, since Gora still does not handle unions on them (SQL is not implemented, as far as I know). A dirty workaround is to avoid unions and hardcode a default value... but it is far too dirty.
          I am working on this; I expect to have everything fixed in a couple of weeks. HBase is fixed on my local copy and I am working on the Cassandra backend.

          Lewis John McGibbney added a comment -

          Steps to test and resolve this issue.

          patch -p0 -i NUTCH-1477.patch
          

          Either build the project or, if it is already built, replace the gora-core-0.2.1 dependency in ./build/lib/ with the patched one attached. This includes GORA-174 v3.patch.

          ant generate-gora-src
          
          ant job
          

          You should then be able to successfully inject into DataFileAvroStore (once the correct Gora configuration has been specified, of course).

          Great work Alfonso.

          I am currently doing a test crawl on this basis and will put the results on list.

          One final question. I take it we are going to commit the more recent webpage.avsc to solve this issue? Unless folks use the patched jar attached in this issue, this issue cannot be properly resolved until a new version of Gora is released... I don't know when we will do this.

          Alfonso Nishikawa added a comment - - edited

          Hi, Lewis.
          So sorry again. Thank you very much for testing it.

          Reading the stack trace, my guess is that the problem is in writing the protocolStatus field, which is null. It never happened to me because the HBase backend behaves differently from DataFileAvroStore, and I have now seen that DataFileAvroStore writes all fields.

          Uploaded another (* sigh *) version of webpage.avsc with the record fields protocolStatus and parseStatus made optional (all fields are now optional).

          I checked it twice this time. I hope it succeeds.
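
The guess above (a null field reaching a store that writes every field) can be checked before calling put. What follows is a rough sketch in plain Java, not Gora or Avro API: the class NullFieldCheck, the nullable map and the sample field values are hypothetical stand-ins for the schema and the record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NullFieldCheck {
    // Returns the names of fields whose value is null (or missing) although
    // the hypothetical schema description marks them as not nullable.
    static List<String> findOffendingFields(Map<String, Boolean> nullable,
                                            Map<String, Object> record) {
        List<String> offending = new ArrayList<>();
        for (Map.Entry<String, Boolean> field : nullable.entrySet()) {
            boolean isNull = record.get(field.getKey()) == null;
            if (isNull && !field.getValue()) {
                offending.add(field.getKey());
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        // Field name -> "may be null?" (illustrative values, not the real schema)
        Map<String, Boolean> nullable = new LinkedHashMap<>();
        nullable.put("baseUrl", true);
        nullable.put("protocolStatus", false); // plain record type, no "null" branch
        nullable.put("parseStatus", false);

        // A record as it might look right after injection: statuses not set yet.
        Map<String, Object> injected = new LinkedHashMap<>();
        injected.put("baseUrl", "http://example.org/");

        System.out.println(findOffendingFields(nullable, injected));
    }
}
```

With the sample data this prints [protocolStatus, parseStatus], the two fields that a store writing every field would trip over.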

          Lewis John McGibbney added a comment -

          The most recent avsc, with the most recent patch in GORA-174, compiles the Java classes properly with the correct getters and setters for WebPage, ParseStatus and ProtocolStatus. I am happy with that part.
          The next problem is that now, when I attempt to inject a URL list into DataFileAvroStore, I get the following:

          java.lang.NullPointerException
          	at org.apache.avro.specific.SpecificDatumWriter.getField(SpecificDatumWriter.java:48)
          	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
          	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:89)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
          	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
          	at org.apache.gora.avro.store.DataFileAvroStore.put(DataFileAvroStore.java:54)
          	at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:60)
          	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
          	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
          	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:185)
          	at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:85)
          	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
          	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
          	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:396)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
          	at org.apache.hadoop.mapred.Child.main(Child.java:249)
          

          I think I might take this one over to user@avro, as I don't know enough about the 1.3.3 codebase and I have no immediate thoughts on this one other than the few optimistic attempts I made to get things working. Even if someone could just confirm that the above stack trace appears when injecting, that would tell us that this behavior is consistent.

          Alfonso Nishikawa added a comment -

          Hi! Checked and uploaded a new webpage.avsc.
          My fault: I have now noticed that "signature" and "prevSignature" are of type "bytes", so they must be nullable. Fixed.

          In unions I wrote "null" on the left-hand side, since the default value of a union corresponds to the first schema in the union. However, I think Gora does not care about this at the moment...

          Hope this works! (I'm expectant =)
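
The rule mentioned above, that a union field's default must match the first branch, is why "null" goes first when the default is null. A small sketch of that check in plain Java follows. UnionDefaultCheck and its method are hypothetical helpers, not part of Avro or Gora, and only the null-default case is modelled.

```java
import java.util.Arrays;
import java.util.List;

class UnionDefaultCheck {
    // True if a null default is acceptable for a union with these branches,
    // i.e. the first branch of the union is "null".
    static boolean nullDefaultAllowed(List<String> branches) {
        return !branches.isEmpty() && "null".equals(branches.get(0));
    }

    public static void main(String[] args) {
        // "null" listed first: a null default matches the first branch.
        System.out.println(nullDefaultAllowed(Arrays.asList("null", "bytes")));
        // "null" listed second: a null default would not match the first branch.
        System.out.println(nullDefaultAllowed(Arrays.asList("string", "null")));
    }
}
```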

          Lewis John McGibbney added a comment -

          New avsc. If we can confirm this specifies the correct fields and types, then I have everything set up to test this one. This will hopefully move us closer to solving all three linked issues.

          Julien Nioche added a comment -

          Hi Alfonso.
          That's right. I must have missed it when writing the modified schema.

          Alfonso Nishikawa added a comment -

          Hi. I have checked the uploaded webpage.avsc. Is "Content" of type "bytes" optional? I think that field gets filled in the Fetch phase, so it would throw an NPE in Inject. The "Content" type should be ["null","bytes"]. Am I right?

          Julien Nioche added a comment -

          Hi Lewis

          "Do you suggest we update the patch in NUTCH-842 with the correct package name for Gora in the Nutch build.xml file and remove the ant compile-avro-schema target?"

          Yes, until someone can explain what that target is useful for.

          "If no accessors are generated then is this not a problem with the Gora compiler? If so we should open a ticket over there and link the issues."

          It is indeed. It looks like the Gora compiler can't deal with the ["string", "null"] union. I will create an issue in GORA land.

          Lewis John McGibbney added a comment -

          Hi Julien, can you confirm a few things for me please...

          • Do you suggest we update the patch in NUTCH-842 with the correct package name for Gora in the Nutch build.xml file and remove the ant compile-avro-schema target?

          • If no accessors are generated then is this not a problem with the Gora compiler? If so we should open a ticket over there and link the issues.
          Julien Nioche added a comment -

          required to generate the java code using GORA

          Julien Nioche added a comment -

          Found a clue in https://issues.apache.org/jira/browse/NUTCH-842. Not sure what the point of compile-avro-schema is but we need to compile the schemas with gora and not just avro. The generated classes now compile fine.

          Using the modified schema fails at compilation as the generated objects don't have accessors e.g. getContentType()

          Julien Nioche added a comment -

          I found in http://mail-archives.apache.org/mod_mbox/avro-user/200910.mbox/%3C4AE78503.50307@apache.org%3E that we probably need to explicitly allow for null values in the schema (see attachment).

          I tried recompiling the schemas with ant compile-avro-schema but the classes generated do not compile and are nowhere near as complete as the original ones. More worryingly the same is true with the original schema. I assumed that the code in org.apache.nutch.storage could be generated from the schemas.

          Any idea?
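
To see why the schema has to allow null explicitly, it helps to look at how Avro's binary encoding treats the two cases: a plain string is a length followed by UTF-8 bytes, so there is nothing sensible to write for null, while a union first writes the index of the chosen branch. The sketch below imitates that in plain Java. It is not the Avro API, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class NullableStringEncoding {
    // Zigzag plus variable-length encoding, as Avro's binary format uses for longs.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Plain "string": length, then bytes. A null value has no length, so this
    // throws NullPointerException when s is null.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Union ["null","string"]: branch index first, then the value
    // (the "null" branch has no payload).
    static void writeNullableString(ByteArrayOutputStream out, String s) {
        if (s == null) {
            writeLong(out, 0); // branch 0 = "null"
        } else {
            writeLong(out, 1); // branch 1 = "string"
            writeString(out, s);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeNullableString(out, null);
        System.out.println("bytes written for null in a union: " + out.size());
        try {
            writeString(out, null);
        } catch (NullPointerException e) {
            System.out.println("a plain string field cannot hold null");
        }
    }
}
```

The failing frame in the reported stack trace, BinaryEncoder.writeString, is consistent with the plain-string case being handed a null value.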

          Julien Nioche added a comment -

          Modified avro schema which allows fields to be null

          Julien Nioche added a comment -

          Thanks Mike. I confirm the issue.
          Did you recompile the Webpage class from the AVRO defs when using the latest version of AVRO? Could be an incompatibility between the versions.
          Going back to the original problem, I don't think it comes from AVRO, as we would then have it with the other backends as well. As for the MemStore, I don't think it is used for anything other than tests.

          Mike Baranczak added a comment -

          I tried upgrading the Avro library to the latest (1.7.2), but I just get another error:

          org.apache.gora.util.GoraException: org.apache.avro.AvroRuntimeException: Not a Specific class: class org.apache.nutch.storage.WebPage
          at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
          at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
          at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
          at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
          at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:228)
          at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:248)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:258)
          Caused by: org.apache.avro.AvroRuntimeException: Not a Specific class: class org.apache.nutch.storage.WebPage
          at org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:213)
          at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:154)
          at org.apache.avro.specific.SpecificDatumReader.setSchema(SpecificDatumReader.java:62)
          at org.apache.gora.avro.PersistentDatumReader.setSchema(PersistentDatumReader.java:69)
          at org.apache.gora.avro.PersistentDatumReader.<init>(PersistentDatumReader.java:63)
          at org.apache.gora.store.impl.DataStoreBase.initialize(DataStoreBase.java:87)
          at org.apache.gora.store.impl.FileBackedDataStoreBase.initialize(FileBackedDataStoreBase.java:63)
          at org.apache.gora.avro.store.AvroStore.initialize(AvroStore.java:80)
          at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
          at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)


            People

            • Assignee:
              Julien Nioche
              Reporter:
              Mike Baranczak
            • Votes:
              1
              Watchers:
              6
