Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-17005

[Java] Incorrect results from JDBC Adapter from Postgres of non-nullable column through left join

    XMLWordPrintableJSON

Details

    • Wish
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 9.0.0
    • Java

    Description

      Unsure to consider this a bug or wish, but the JDBC to Arrow Adapter produces incorrect results when wrapping the postgres driver in certain cases. 

      If you left join a non-nullable column, the column becomes nullable (if the join condition does not match any columns). However the postgres ResultSetMetaData lies to you and still indicates that the column is still non-nullable. 

      When iterating through the data, results come back as null (isNull will return true). 

      However, because of the way that the JDBCConsumer is created, it creates a non-nullable consumer and will not check the nullability of these results. 

      Unfortunately, this results in incorrect data or errors depending on the data types returned. 

      The postgres JDBC team has closed a ticket about this indicating that it would be impossible for them to return the correct data nullability data to the JDBC driver. see: https://github.com/pgjdbc/pgjdbc/issues/2079

      An example: 

      Table: 

      t1.id
      2
      3
      CREATE TABLE t1 (id integer NOT NULL);
      INSERT INTO t1 VALUES (2), (3);
      

      Query

      WITH t2 AS (SELECT 1 AS id UNION SELECT 2)
      SELECT 
        t1.id 
      FROM t2 
      LEFT JOIN t1 on t1.id = t2.id;

      This returns the result set:

      id
      2
      null

      The ResultSetMetaData indicates that the column is non-nullable (as t1.id is non-nullable) but there is null data in the result. 

      The Arrow Vector that is present after the result set is consumed, looks like this: 

      id
      2
      0

      ResultSet.getInt(1) will return 0 when the source data is null, with an expectation that you check isNull. 

      The data is incorrect and silently fails potentially leading to clients / consumers getting bad data. 

       

      In other cases, such as UUID (mapped to UTF-8 vectors) the value will fail to load into arrow due to expecting null data and throwing a NPE when deserializing / converting to bytearrays. 

       

      I was able to work around this problem by wrapping the postgres JDBC ResultSetMetadata and always forcing the nullability to nullable (or nullability unknown). 

      Unfortunately I don't think there is a great way to solve this, but perhaps some way to configure / override the JDBCConsumer creation would allow for users of this library to override this behavior, however the silent failure and incorrect data might lead to users not noticing. 

      Attachments

        Issue Links

          Activity

            People

              lidavidm David Li
              jswenson Jonathan Swenson
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 40m
                  40m