Nutch / NUTCH-1473

Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: None
    • Labels:
      None

      Description

      Exception in thread "main" org.apache.gora.util.GoraException: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead
      at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
      at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
      at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
      at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:62)
      at org.apache.nutch.crawl.Crawler.run(Crawler.java:133)
      at org.apache.nutch.crawl.Crawler.run(Crawler.java:246)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.nutch.crawl.Crawler.main(Crawler.java:253)
      Caused by: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead
      at org.apache.gora.sql.store.SqlStore.createSchema(SqlStore.java:226)
      at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:172)
      at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
      at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
      ... 8 more
      Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
      at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
      at com.mysql.jdbc.Util.getInstance(Util.java:386)
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
      at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
      at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
      at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
      at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
      at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
      at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
      at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2415)
      at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2333)
      at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2318)
      at org.apache.gora.sql.store.SqlStore.createSchema(SqlStore.java:224)
      ... 11 more

          Activity

          Lewis John McGibbney added a comment -

          This won't be touched and should be closed as such.

          Brian added a comment -

          The only way I could get rid of this error and the subsequent errors was to add jdbc-type="text" to the text field (as in the patch linked below) and to change all varchars to a length of 191 in gora-sql-mapping.xml. The value 767 wouldn't work for me, even after setting the innodb_large_prefix options in my.cnf.

          This is the patch that pointed me to the "jdbc-type" attribute:
          https://issues.apache.org/jira/browse/NUTCH-1497
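Brian's value of 191 is consistent with simple arithmetic, assuming InnoDB's default 767-byte limit on index key prefixes and the 4-byte utf8mb4 character set; under MySQL's 3-byte utf8 the same limit allows 255 characters. With innodb_large_prefix enabled the limit rises to 3072 bytes, but only for tables using the DYNAMIC or COMPRESSED row format, so a my.cnf setting alone may not help for tables created with another row format. The class below is a hypothetical illustration of the calculation, not part of Nutch or Gora.

```java
/**
 * Hypothetical helper: how many characters of a VARCHAR column fit under
 * an InnoDB index key prefix limit. Assumes MySQL reserves the charset's
 * maximum bytes per character for every character position.
 */
public class IndexPrefixLimit {

    // Default InnoDB index key prefix limit in bytes (assumption: COMPACT or
    // REDUNDANT row format, innodb_large_prefix not in effect).
    static final int DEFAULT_PREFIX_LIMIT_BYTES = 767;

    // Largest declared VARCHAR length whose worst-case size fits the limit.
    static int maxIndexableChars(int limitBytes, int maxBytesPerChar) {
        return limitBytes / maxBytesPerChar; // integer division rounds down
    }

    public static void main(String[] args) {
        // MySQL's "utf8" (utf8mb3) uses up to 3 bytes per character.
        System.out.println("utf8:    " + maxIndexableChars(DEFAULT_PREFIX_LIMIT_BYTES, 3));
        // utf8mb4 uses up to 4 bytes per character.
        System.out.println("utf8mb4: " + maxIndexableChars(DEFAULT_PREFIX_LIMIT_BYTES, 4));
    }
}
```

Running it prints 255 for utf8 and 191 for utf8mb4, so a length of 191 is safe for indexed columns under either character set.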

          Nathan Gass added a comment -

          I opened a new issue, NUTCH-1490.

          TEXT seems to be the correct jdbc-type (not length).

          I was wrong that this has anything to do with actual inserts of large pages, as this error already happens during table creation (I confused this issue with the issues in NUTCH-1490). It may well be that the issue only occurs when using utf8, since the given length is in characters but the maximum length of column types in MySQL is in bytes (AFAIK).
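Nathan's characters-versus-bytes explanation is consistent with the number in the error message. A minimal sketch, assuming MySQL checks a VARCHAR's declared character length against its 65,535-byte cap using the character set's worst-case width: 65,535 / 3 = 21,845, exactly the max reported above for a utf8 schema. In practice the usable length is somewhat lower, because the length prefix and all other columns share the same 65,535-byte row limit. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper reproducing the "max = ..." figure in MySQL's
 * "Column length too big" error: the 65,535-byte VARCHAR limit divided
 * by the charset's maximum bytes per character.
 */
public class VarcharCharLimit {

    // MySQL's maximum VARCHAR size in bytes (also the row size limit).
    static final int MAX_VARCHAR_BYTES = 65_535;

    static int maxVarcharChars(int maxBytesPerChar) {
        return MAX_VARCHAR_BYTES / maxBytesPerChar; // integer division rounds down
    }

    public static void main(String[] args) {
        System.out.println("utf8:    max = " + maxVarcharChars(3));
        System.out.println("utf8mb4: max = " + maxVarcharChars(4));

        // A Chinese character such as U+4E2D really needs 3 bytes in UTF-8.
        int bytes = "\u4e2d".getBytes(StandardCharsets.UTF_8).length;
        System.out.println("U+4E2D in UTF-8: " + bytes + " bytes");
    }
}
```

This prints max = 21845 for utf8 (matching the stack trace), max = 16383 for utf8mb4, and 3 bytes for the sample Chinese character.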

          Lewis John McGibbney added a comment -

          Hi Nathan, this is a really helpful insight, and any progress towards a more application-ready gora-sql-mapping.xml file is very welcome. If possible, please open another issue covering all the misconfigured or incorrectly configured column definitions you encountered, so that we can patch them all at once. It would also help if you could share the exception traces you were getting before you reconfigured the column definitions.

          On this issue, it appears that "TEXT" is the most appropriate column length here?
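Whether plain TEXT is enough depends on how much page text has to be stored: MySQL caps its TEXT types in bytes (TEXT at 65,535), so under a 3-byte utf8 charset a TEXT column is only guaranteed to hold about 21,845 characters, the same figure as in the error. The sketch below assumes worst-case character width; the helper is hypothetical and says nothing about which jdbc-type values gora-sql-mapping.xml accepts.

```java
/**
 * Hypothetical helper: choose the smallest MySQL TEXT type whose byte
 * capacity covers a given number of characters at the charset's
 * worst-case width.
 */
public class TextTypeChooser {

    static String smallestTextType(long chars, int maxBytesPerChar) {
        long bytes = chars * maxBytesPerChar;            // worst-case size in bytes
        if (bytes <= 255L) return "TINYTEXT";            // 2^8 - 1 bytes
        if (bytes <= 65_535L) return "TEXT";             // 2^16 - 1 bytes
        if (bytes <= 16_777_215L) return "MEDIUMTEXT";   // 2^24 - 1 bytes
        if (bytes <= 4_294_967_295L) return "LONGTEXT";  // 2^32 - 1 bytes
        throw new IllegalArgumentException(bytes + " bytes exceeds LONGTEXT");
    }

    public static void main(String[] args) {
        System.out.println(smallestTextType(21_845, 3));   // TEXT
        System.out.println(smallestTextType(100_000, 3));  // MEDIUMTEXT
    }
}
```

Under this assumption, a page of 100,000 characters in utf8 would already need MEDIUMTEXT rather than TEXT.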

          Nathan Gass added a comment -

          Yes, this happens because of large pages. The appropriate type for me was TEXT, as I got UTF8 issues after indexing to Solr with the BLOB type. Our MySQL server uses the utf8 character set.

          There are other columns where Nutch does not ensure the data is small enough (or does not reserve enough space in gora-sql-mapping.xml), which is always a problem, at least when using MySQL. Should I mention them here or open separate issues?
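One plausible mechanism for the UTF8 problems reported with BLOB (not confirmed in this thread): unlike TEXT, a BLOB column has no character set, so MySQL returns exactly the bytes that were written and the charset decision is left to the client. The sketch below simulates that round trip in plain Java without a database; the class is hypothetical, and ISO-8859-1 merely stands in for whatever wrong charset a reader might assume.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Illustration: a BLOB column stores raw bytes with no character set, so the
 * text survives only if the reader decodes with the charset the writer used.
 */
public class BlobCharsetRoundTrip {

    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] stored = text.getBytes(writeAs);   // bytes as they would sit in a BLOB
        return new String(stored, readAs);        // text as a reader reconstructs it
    }

    public static void main(String[] args) {
        String page = "\u4e2d\u6587"; // two Chinese characters, U+4E2D U+6587
        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(page, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(page));      // true
        // Mismatched charset: the 6 UTF-8 bytes become 6 unrelated characters.
        System.out.println(roundTrip(page, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(page)); // false
    }
}
```

A TEXT column avoids this class of problem because the server knows the column's character set and can convert to the connection's charset.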

          ZhaiXuepan added a comment -

          Hi, Lewis. This error occurs when the table is created. The log line is "2012-09-28 21:44:24,512 INFO store.SqlStore - creating schema: webpage". My database encoding is UTF-8, and I am crawling Chinese HTML pages.

          Lewis John McGibbney added a comment -

          Hi zhaixuepan. Yes, I understand that the change needs to be applied directly to the table structure, e.g. in gora-sql-mapping.xml. However, what are your thoughts on TEXT versus BLOB for the column length? Also, can you explain exactly how you encountered the above stack trace? Was this a large HTML page?

          ZhaiXuepan added a comment -

          Hi, Lewis. Should the change be made directly in the table structure?

          Lewis John McGibbney added a comment -

          Hi zhaixuepan. Do you have any suggestion as to whether BLOB or TEXT would be more appropriate?


            People

            • Assignee: Unassigned
            • Reporter: ZhaiXuepan
            • Votes: 0
            • Watchers: 5
