Apache Gora / GORA-24

Throwing EOFException with MEDIUMBLOB type for inlinks column

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.4
    • Component/s: gora-sql
    • Labels:
      None
    • Environment:
      MySQL

      Description

      I had an exception with DbUpdaterJob complaining that inlinks column of type BLOB in webpage table was not big enough to store all the incoming links. So I changed the column definition in gora-sql-mapping.xml from BLOB to MEDIUMBLOB:

      <field name="inlinks" column="inlinks" jdbc-type="MEDIUMBLOB"/>

      Now I systematically get an exception in the update step:

      java.io.IOException: java.sql.BatchUpdateException: Error reading from InputStream java.io.EOFException
      at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:341)
      at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
      at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
      Caused by: java.sql.BatchUpdateException: Error reading from InputStream java.io.EOFException
      at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2020)
      at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1451)
      at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:329)
      ... 5 more

        Activity

        Alexis added a comment -

        The exception that occurred during the update step, and that motivated the change to the MEDIUMBLOB type, was:

        java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
        at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
        at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
        at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
        Caused by: java.sql.BatchUpdateException: Data truncation: Data too long for column 'inlinks' at row 1
        at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2018)
        at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1449)
        at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
        ... 5 more

        Chris A. Mattmann added a comment -
        • push to 0.3
        Nathan Gass added a comment -

        What worked for me is to set a large enough length on the field. The created table then used mediumblob or longblob but there was no exception. (I had the same problem as you with jdbc-type=MEDIUMBLOB).

        Lewis John McGibbney added a comment -

        So this is not a bug within Gora so to speak but related to an incorrect/insufficient mapping configuration for a specific use case... is this a fair comment?

        Nathan Gass added a comment -

        Well, like the original submitter, I expected jdbc-type="MEDIUMBLOB" to work. But either the documentation of gora-sql is too sparse or I did not find the relevant part, so I cannot say whether this expectation is correct.

        So I'd still consider this a valid bug, and setting the length a workaround. On the other hand, setting the length could well be the better option, as it is more general than MEDIUMBLOB and gora-sql can figure out the correct schema for other SQL databases as well?
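
        The idea of deriving the column type from a declared length can be sketched as follows. This is only an illustration under stated assumptions: the class `MySqlBlobTypes` and its method `smallestTypeFor` are hypothetical names and are not part of gora-sql, whose actual length-to-type logic is not shown in this thread. The size limits are the documented maxima of MySQL's BLOB family.

```java
/** Hypothetical helper (not gora-sql code): picks the smallest MySQL BLOB type for a byte length. */
final class MySqlBlobTypes {
    // Maximum payload sizes of MySQL's BLOB family, in bytes.
    static final long TINYBLOB_MAX = 255L;            // 2^8 - 1
    static final long BLOB_MAX = 65_535L;             // 2^16 - 1
    static final long MEDIUMBLOB_MAX = 16_777_215L;   // 2^24 - 1
    static final long LONGBLOB_MAX = 4_294_967_295L;  // 2^32 - 1

    private MySqlBlobTypes() {}

    /** Returns the name of the smallest BLOB type able to hold lengthInBytes bytes. */
    static String smallestTypeFor(long lengthInBytes) {
        if (lengthInBytes < 0) {
            throw new IllegalArgumentException("length must not be negative: " + lengthInBytes);
        }
        if (lengthInBytes <= TINYBLOB_MAX) return "TINYBLOB";
        if (lengthInBytes <= BLOB_MAX) return "BLOB";
        if (lengthInBytes <= MEDIUMBLOB_MAX) return "MEDIUMBLOB";
        if (lengthInBytes <= LONGBLOB_MAX) return "LONGBLOB";
        throw new IllegalArgumentException("no MySQL BLOB type holds " + lengthInBytes + " bytes");
    }

    public static void main(String[] args) {
        // A length just above the BLOB limit needs the next larger type.
        System.out.println(smallestTypeFor(65_535L)); // BLOB
        System.out.println(smallestTypeFor(65_536L)); // MEDIUMBLOB
    }
}
```

        Under this reading, a mapping that declares only a length stays portable, because each SQL backend could apply its own table of limits instead of relying on a MySQL-specific type name.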

        Renato Javier Marroquín Mogrovejo added a comment -

        Hi Nathan,
        I wanted to fix this, but after reading a little about it and reviewing the code, I think this is not a bug within Gora. Maybe we can provide better error messages, but the problem here is that Nutch's MySQL mapping file is not right for most cases.

        
        

        <class name="org.apache.gora.examples.generated.WebPage" keyClass="java.lang.String" table="WebPage">
        <primarykey column="id" length="128"/>
        <field name="url" column="url" length="128" primarykey="true"/>
        <field name="content" column="content"/>
        <field name="parsedContent" column="parsedContent"/>
        <field name="outlinks" column="outlinks"/>
        <field name="metadata" column="metadata"/>
        </class>

        This is what it looks like within Gora. Should we change it?

        Nathan Gass added a comment -

        The reported bug is about using jdbc-type="MEDIUMBLOB", which is not in the default sql mapping file of Nutch.
        Whether jdbc-type="MEDIUMBLOB" ought to work or not is not my decision, but a better error message and some documentation about supported jdbc-type values would indeed be nice from a user perspective.

        The Data truncation exception in the comment is indeed a problem with Nutch, assuming that gora does not promise to support arbitrary length data when no length is given. There are two issues about this: NUTCH-1490 and NUTCH-1497.
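
        One way to picture the "better error message" asked for above is a size check performed before the batch is sent to the database. The sketch below is an assumption-laden illustration: `BlobSizeGuard` and `requireFits` are hypothetical names, not gora-sql or Nutch APIs, and the check addresses only the "Data too long" case, not the EOFException seen with jdbc-type="MEDIUMBLOB", whose cause this thread does not establish.

```java
/** Hypothetical pre-write check (not gora-sql code) that fails early with a descriptive message. */
final class BlobSizeGuard {
    // Maximum payload of a MySQL BLOB column, in bytes (2^16 - 1).
    static final long MYSQL_BLOB_MAX_BYTES = 65_535L;

    private BlobSizeGuard() {}

    /** Throws if value is larger than the column capacity maxBytes; otherwise returns quietly. */
    static void requireFits(String column, byte[] value, long maxBytes) {
        if (value.length > maxBytes) {
            throw new IllegalArgumentException(String.format(
                "Value for column '%s' is %d bytes, but the column holds at most %d bytes; "
                    + "consider declaring a larger length for this field in the mapping file.",
                column, value.length, maxBytes));
        }
    }

    public static void main(String[] args) {
        // A small value passes the check.
        requireFits("inlinks", new byte[1_000], MYSQL_BLOB_MAX_BYTES);
        try {
            // A value above the BLOB limit is rejected before any SQL is executed.
            requireFits("inlinks", new byte[70_000], MYSQL_BLOB_MAX_BYTES);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

        Failing at this point names the column and both sizes, which a driver-level BatchUpdateException raised at flush time does not make as obvious.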

        Lewis John McGibbney added a comment -

        The gora-sql module is now deprecated (due to licensing issues).
        Please correct me if I am wrong, but my outlook on this one is as follows:

        • write support for MEDIUMBLOB into the new gora-sql module
        • accompany this with better error handling/message logging and some additional guidance in the gora-sql-mapping.xml file

        There is little we can do about this in Gora until the new gora-sql module is written. Therefore, any problems experienced using gora-sql with Nutch 2.x (or any other client application, for that matter) will need to be addressed at that level, not within Gora.

        Henry Saputra added a comment -

        Totally forgot about the licensing issue with gora-sql.
        Were we trying to deprecate it in order to rewrite it using JOOQ? I tried to look through the history but could not find the exact discussion about it.

        Lewis John McGibbney added a comment -

        Hi Henry, yes, the idea is to use JOOQ, as it provides support for a wide variety of SQL stores out of the box... something which would be very appealing to users for obvious reasons.
        There is a separate Jira issue on this topic: GORA-86.

        Henry Saputra added a comment -

        Ah cool, thanks for the info Lewis

        Henry Saputra added a comment -

        Marking it as Won't Fix for now, until we enable Gora-SQL.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Alexis
          • Votes:
            0
          • Watchers:
            4
