Calcite / CALCITE-70

Joins seem to be very expensive in memory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      I guess it's a feature that joins are currently done only in memory?

      Either way, I couldn't get Optiq to join my two 10M-row files.

      Curiously, the error is raised in AbstractStringBuilder, even though I'm joining two int columns and returning a count, so I'm not sure which strings are being handled here.

      At first I ran sqlline as-is, and it failed as shown below. I'm not sure what Java does when you don't specify -Xmx at all. I then tried again with -Xmx4096M, and it nearly brought my laptop down before I was able to kill the process.
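
On the -Xmx question: when no maximum heap is given, the JVM chooses a default based on the machine, and the exact rule depends on the JVM version and vendor. A minimal sketch for checking the limit that is actually in effect (the class name `HeapCheck` is made up for illustration):

```java
public class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use for the heap, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long megabytes = maxHeapBytes() / (1024 * 1024);
        System.out.println("Max heap: " + megabytes + " MB");
    }
}
```

Running it once without flags and once with `-Xmx4096M` shows the difference between the default and the explicit setting.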

      ```
      sqlline> !connect jdbc:optiq:model=target/test-classes/model.json admin admin
      0: jdbc:optiq:model=target/test-classes/model> select count from tdata
      . . . . . . . . . . . . . . . . . . . . . . .> join tdata2 on tdata2."id" = tdata."id"
      . . . . . . . . . . . . . . . . . . . . . . .> ;
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:62)
      at java.lang.StringBuilder.<init>(StringBuilder.java:97)
      at au.com.bytecode.opencsv.CSVParser.parseLine(CSVParser.java:206)
      at au.com.bytecode.opencsv.CSVParser.parseLineMulti(CSVParser.java:174)
      at au.com.bytecode.opencsv.CSVReader.readNext(CSVReader.java:237)
      at net.hydromatic.optiq.impl.csv.CsvEnumerator.moveNext(CsvEnumerator.java:54)
      at net.hydromatic.linq4j.EnumerableDefaults.toLookup_(EnumerableDefaults.java:1826)
      at net.hydromatic.linq4j.EnumerableDefaults.toLookup(EnumerableDefaults.java:1817)
      at net.hydromatic.linq4j.EnumerableDefaults.toLookup(EnumerableDefaults.java:1793)
      at net.hydromatic.linq4j.DefaultEnumerable.toLookup(DefaultEnumerable.java:663)
      at net.hydromatic.linq4j.EnumerableDefaults$4.enumerator(EnumerableDefaults.java:797)
      at Baz$4$1.<init>(Unknown Source)
      at Baz$4.enumerator(Unknown Source)
      at net.hydromatic.linq4j.EnumerableDefaults.aggregate(EnumerableDefaults.java:82)
      at net.hydromatic.linq4j.DefaultEnumerable.aggregate(DefaultEnumerable.java:88)
      at Baz.bind(Unknown Source)
      at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.getEnumerable(OptiqPrepare.java:193)
      at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.enumerator(OptiqPrepare.java:203)
      at net.hydromatic.optiq.jdbc.OptiqStatement$1.apply(OptiqStatement.java:378)
      at net.hydromatic.optiq.jdbc.OptiqStatement$1.apply(OptiqStatement.java:376)
      at net.hydromatic.optiq.jdbc.OptiqResultSet.execute(OptiqResultSet.java:157)
      at net.hydromatic.optiq.jdbc.OptiqStatement.executeQueryInternal(OptiqStatement.java:366)
      at net.hydromatic.optiq.jdbc.OptiqStatement.executeInternal(OptiqStatement.java:333)
      at net.hydromatic.optiq.jdbc.OptiqStatement.execute(OptiqStatement.java:195)
      at sqlline.SqlLine$Commands.execute(SqlLine.java:3700)
      at sqlline.SqlLine$Commands.sql(SqlLine.java:3608)
      at sqlline.SqlLine.dispatch(SqlLine.java:845)
      at sqlline.SqlLine.begin(SqlLine.java:721)
      at sqlline.SqlLine.start(SqlLine.java:462)
      at sqlline.SqlLine.main(SqlLine.java:430)
      0: jdbc:optiq:model=target/test-classes/model> !quit
      Closing: net.hydromatic.optiq.jdbc.FactoryJdbc41$OptiqConnectionJdbc41

      ```
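
The trace shows the error surfacing while `toLookup` pulls rows from `CsvEnumerator`, which suggests that one side of the join is read completely into an in-memory lookup before any result is produced. It would also explain the strings: the CSV parser produces each field as text before it is converted to an int. The following is a rough sketch of that pattern only, not the linq4j implementation; `LookupSketch` and its `toLookup` method are hypothetical names, and the join key is assumed to be the first field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupSketch {
    // Groups every row under its join key. All rows stay referenced by the map,
    // so memory use grows with the number of rows read.
    static Map<Integer, List<String[]>> toLookup(Iterable<String> csvLines) {
        Map<Integer, List<String[]>> lookup = new HashMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",");            // every field starts out as a String
            int id = Integer.parseInt(fields[0].trim());  // assumption: key is the first field
            lookup.computeIfAbsent(id, k -> new ArrayList<>()).add(fields);
        }
        return lookup;
    }
}
```

With 10M rows, a structure like this holds 10M row arrays plus their field strings at once, which is consistent with the GC overhead error above.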

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/julianhyde/optiq/issues/70
      Created by: codek
      Labels:
      Created at: Mon Oct 28 17:48:59 CET 2013
      State: closed

        Activity

        github-import GitHub Import added a comment -

        [Date: Sat Nov 02 00:16:11 CET 2013, Author: julianhyde]

        The code generated by Optiq for in-memory queries can be improved.

        There might be something like a cartesian product going on here. The aggregate function (used to implement GROUP BY) is probably forming a collection, even though in this case it could just iterate over the rows as they come out.
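
To illustrate the idea of iterating over rows as they come out, here is a minimal sketch of counting the result of an equi-join on int keys without collecting the joined rows. It is neither Optiq's generated code nor the eventual fix; `StreamingJoinCount` and `countJoin` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingJoinCount {
    // Counts matching pairs for an equi-join without storing any joined row.
    static long countJoin(Iterable<Integer> buildKeys, Iterable<Integer> probeKeys) {
        // Build side: remember only how often each key occurs.
        Map<Integer, Long> keyCounts = new HashMap<>();
        for (Integer key : buildKeys) {
            keyCounts.merge(key, 1L, Long::sum);
        }
        // Probe side: stream the keys and add up the matches.
        long total = 0;
        for (Integer key : probeKeys) {
            total += keyCounts.getOrDefault(key, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, 2, 2, 3);
        List<Integer> right = Arrays.asList(2, 3, 3, 4);
        System.out.println(countJoin(left, right)); // prints 4
    }
}
```

Memory here is bounded by the number of distinct keys on the build side rather than by the size of the join result.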

        github-import GitHub Import added a comment -

        [Date: Tue Dec 03 22:08:30 CET 2013, Author: julianhyde]

        Fixed in https://github.com/julianhyde/optiq/commit/c20b25bf8a7cfb309d0d7f161a91683c86fc8099 .

          People

          • Assignee:
            Unassigned
            Reporter:
            github-import GitHub Import