IMPALA / IMPALA-4335

fetch calls may now produce empty row batches

    Details

      Description

      After recent changes (likely IMPALA-2905), Impala may return empty row batches on fetch() calls. This shouldn't be a problem for well-written clients, but misbehaving ones may break (e.g. Impyla doesn't handle this correctly; I have a patch for it).

      This also results in extra, unnecessary RPCs. We should make sure the server skips empty row batches.

      I'm marking this as 'distributed exec' rather than 'client' because I think it's related to the recent coordinator fragment changes.

      This is easy to reproduce with a simple query. Running the following script through Impyla returns 0 rows because the first row batch is empty.

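      #!/usr/bin/env impala-python

      import impala.dbapi

      # 21050 is the impalad HiveServer2 port used by Impyla.
      with impala.dbapi.connect(host='127.0.0.1', port=21050) as conn:
        with conn.cursor() as cursor:
          cursor.execute('select * from tpch_kudu.region')
          # With the affected Impyla version this returns no rows when the
          # first fetched batch is empty.
          kudu_results = cursor.fetchall()
          for row in kudu_results:
            print(row)
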

        Activity

        henryr Henry Robinson added a comment -

        Hm, the first batch shouldn't be empty. I guess the Impyla bug is that it doesn't check hasMoreRows properly? Even though the contract isn't broken, this is still unexpected behavior.

        Do you happen to know if Kudu returns an empty row batch (i.e. does this problem start in the scanner)?
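A minimal sketch of the client-side contract discussed here: only hasMoreRows signals end-of-stream, so an empty batch must be skipped rather than treated as the end. `make_fake_server`, `fetchall_buggy`, `fetchall_correct` and the `(rows, has_more_rows)` tuple shape are hypothetical stand-ins for one fetch RPC, not Impyla's or HiveServer2's actual API, and the rows are made up.

```python
def make_fake_server(batches):
    """Return a hypothetical fetch_batch() that replays canned
    (rows, has_more_rows) pairs, one per simulated fetch RPC."""
    it = iter(batches)
    return lambda: next(it)


def fetchall_buggy(fetch_batch):
    # Misbehaving client: treats an empty batch as end-of-stream.
    rows = []
    while True:
        batch, has_more = fetch_batch()
        if not batch:
            break
        rows.extend(batch)
        if not has_more:
            break
    return rows


def fetchall_correct(fetch_batch):
    # Well-behaved client: keeps fetching while has_more_rows is true;
    # an empty batch simply contributes no rows.
    rows = []
    has_more = True
    while has_more:
        batch, has_more = fetch_batch()
        rows.extend(batch)
    return rows


# First batch empty, as in the reported repro.
batches = [([], True), ([(1,), (2,)], True), ([(3,)], False)]
print(len(fetchall_buggy(make_fake_server(batches))))    # 0
print(len(fetchall_correct(make_fake_server(batches))))  # 3
```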

        mjacobs Matthew Jacobs added a comment -

        Yeah, I'm going to patch Impyla to handle this better, but my concern is other clients could misbehave as well.

        I do think the empty row batches are coming from the Kudu scans, specifically scans of tablets that contain no rows. Since a scanner is still scheduled for those ranges, it produces an empty batch. We could be smarter and avoid returning empty row batches from the KuduScanNode, but in general, row batches with no rows are valid during internal execution, since they may be transferring resources attached to previous rows. That, of course, does not apply when returning batches to external clients.
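One way to picture that distinction, as a sketch only: zero-row batches are allowed to flow internally, and filtering happens at the boundary where batches are handed to external clients, which also removes the extra fetch RPCs. `client_batches` and the list-of-rows batch model are hypothetical illustrations, not Impala's actual coordinator code (which is C++).

```python
def client_batches(internal_batches):
    """Yield (rows, has_more_rows) pairs for an external client, never
    yielding an empty batch unless the whole result set is empty.

    internal_batches: hypothetical iterable of row lists from the
    execution tree; zero-row batches are legal there because internally
    they may only carry attached resources.
    """
    # Drop zero-row batches at the client boundary.
    non_empty = (rows for rows in internal_batches if rows)
    pending = next(non_empty, None)
    if pending is None:
        # Empty result set: a single empty batch that also ends the stream.
        yield [], False
        return
    for rows in non_empty:
        # Look ahead one batch so has_more_rows is accurate.
        yield pending, True
        pending = rows
    yield pending, False


# Internal stream resembling the Kudu case: empty tablets yield empty batches.
internal = [[], [(1,)], [], [], [(2,), (3,)], []]
fetches = list(client_batches(internal))
print(fetches)       # [([(1,)], True), ([(2,), (3,)], False)]
print(len(fetches))  # 2 client fetches instead of one per internal batch (6)
```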

        henryr Henry Robinson added a comment -

        The test query doesn't reproduce 0-row batches (at least in my test environment), but the fix is very straightforward.

        henryr Henry Robinson added a comment -

        Fixed in https://github.com/apache/incubator-impala/commit/48085274fa8ae57453477db21dae0e53eae6b766

          People

          • Assignee:
            henryr Henry Robinson
            Reporter:
            mjacobs Matthew Jacobs
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
