Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-809

TableScan does not support large/infinite scans

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: core
    • Labels:
      None

      Description

      When running a simple query (e.g. select stream * from orders) and the StreamableTable returning a Enumerable<Object[]> backed by an infinite stream, you end up quickly trying to store all the values of the stream in a ListSink in a TableScanNode.

      From Julian:

      You're hitting the interpreter "cheap and dirty"
      implementation of TableScan. I made the interpreter the simplest thing
      that could possibly work, so I made every operator build a list. (I
      know, I know. Enumerable uses iterators, and other implementations do
      even better. But I wanted to fit it into one page of code.)

      ...

      The solution will be either to fix the interpreter to use iterators
      (or similar) rather than lists, or to recognize that a query is
      infinite and not use the interpreter.

      1. calcite-803-v1.patch
        24 kB
        Jesse Yates
      2. calcite-803-v0.patch
        20 kB
        Jesse Yates

        Issue Links

          Activity

          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.5.0 (2015-11-10)

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.5.0 (2015-11-10)
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/30d618d5 and http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/a3da6915 and http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/5e7d457a . Thanks for the patch, Jesse Yates !
          Hide
          julianhyde Julian Hyde added a comment -
          Show
          julianhyde Julian Hyde added a comment - Jesse Yates , Please review https://github.com/julianhyde/incubator-calcite/commit/b376d24ccdbcfdefe73e3c1cf6ca0d2011fba8c7 . I think it is a further improvement.
          Hide
          julianhyde Julian Hyde added a comment -

          I think your patch is a big improvement, so I have pushed it to my branch https://github.com/julianhyde/incubator-calcite/tree/new-master and I will merge to apache/master as soon as we re-open after release 1.4.

          However, the method Sink.setSourceEnumerable doesn't seem right. I want to remove it ASAP.

          I wonder whether, rather than creating EnumerableProxySink, we should create a different kind of Source when we are reading from a table (especially a table that happens to be infinite, i.e. a stream).

          Also, the Interpreter should have an execution mode that runs each operator in a thread, rather than assuming that each Node.run() method will run to completion in a reasonable (or even finite) time.

          Show
          julianhyde Julian Hyde added a comment - I think your patch is a big improvement, so I have pushed it to my branch https://github.com/julianhyde/incubator-calcite/tree/new-master and I will merge to apache/master as soon as we re-open after release 1.4. However, the method Sink.setSourceEnumerable doesn't seem right. I want to remove it ASAP. I wonder whether, rather than creating EnumerableProxySink, we should create a different kind of Source when we are reading from a table (especially a table that happens to be infinite, i.e. a stream). Also, the Interpreter should have an execution mode that runs each operator in a thread, rather than assuming that each Node.run() method will run to completion in a reasonable (or even finite) time.
          Hide
          jesse_yates Jesse Yates added a comment -

          Cool, was just going for feedback and a basic test pass on a stream.

          Attaching patch rebased on latest master that fixes tests for me, locally.. well aside from:

          • getting an unexpected error in JdbcFrontJdbcBackTest that goes away when run independently... can't really explain that one.
          • having to change StreamTest plans to expect a LogicalTableScan, rather than an EnumerableTableScan, which wasn't necessary until rebasing

          thoughts? thanks for taking a look Julian Hyde

          Show
          jesse_yates Jesse Yates added a comment - Cool, was just going for feedback and a basic test pass on a stream. Attaching patch rebased on latest master that fixes tests for me, locally.. well aside from: getting an unexpected error in JdbcFrontJdbcBackTest that goes away when run independently... can't really explain that one. having to change StreamTest plans to expect a LogicalTableScan, rather than an EnumerableTableScan, which wasn't necessary until rebasing thoughts? thanks for taking a look Julian Hyde
          Hide
          julianhyde Julian Hyde added a comment -

          I like the general approach – however, your patch gives NPEs when I run InterpreterTest. I haven't tried any other tests.

          Show
          julianhyde Julian Hyde added a comment - I like the general approach – however, your patch gives NPEs when I run InterpreterTest. I haven't tried any other tests.
          Hide
          jesse_yates Jesse Yates added a comment -

          Attaching simple, quick and dirty patch. However, does fix the added test for an infinitely backed stream.

          Uses an enumerable backed source/sink from the Interpreter only when supporting a TableNode. Otherwise, uses a List backed implementation. Modified the list implementation to leverage some of the enumeration code since an enumerable hook had to be exposed in the source anyways.

          This is just a simple version - it doesn't mess around with other RelNode implementations. As such, if we wanted to improve the other RelNodes, we could probably do away with the ListSource/Sink entirely, and replace the logic with enumerables (at least from glancing at things like JoinNode)

          Show
          jesse_yates Jesse Yates added a comment - Attaching simple, quick and dirty patch. However, does fix the added test for an infinitely backed stream. Uses an enumerable backed source/sink from the Interpreter only when supporting a TableNode. Otherwise, uses a List backed implementation. Modified the list implementation to leverage some of the enumeration code since an enumerable hook had to be exposed in the source anyways. This is just a simple version - it doesn't mess around with other RelNode implementations. As such, if we wanted to improve the other RelNodes, we could probably do away with the ListSource/Sink entirely, and replace the logic with enumerables (at least from glancing at things like JoinNode)

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              jesse_yates Jesse Yates
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development