Uploaded image for project: 'Jackrabbit Content Repository'
  1. Jackrabbit Content Repository
  2. JCR-2892

Large fetch sizes have potentially deleterious effects on VM memory requirements when using Oracle

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.2
    • Fix Version/s: 2.2.10, 2.3.2
    • Component/s: jackrabbit-core, sql
    • Labels:
      None
    • Environment:
      Oracle 10g+

      Description

      Since Release 10g, Oracle JDBC drivers use the fetch size to allocate buffers for caching row data.
      cf. http://www.oracle.com/technetwork/database/enterprise-edition/memory.pdf

      r1060431 hard-codes the fetch size for all ResultSet-returning statements to 10,000. This value has significant, potentially deleterious, effects on the heap space required for even moderately-sized repositories. For example, the BUNDLE table (from 'oracle.ddl') has two columns – NODE_ID raw(16) and BUNDLE_DATA blob – which require 16 b and 4 kb of buffer space, respectively. This requires a buffer of more than 40 mb [(16+4096) * 10000 = 41120000].

      If the issue described in JCR-2832 is truly specific to PostgreSQL, I think its resolution should be moved to a PostgreSQL-specific ConnectionHelper subclass. Failing that, there should be a way to override this hard-coded value in OracleConnectionHelper.

      1. JCR-2892.patch
        6 kB
        Claus Köll
      2. oracleFetchSize.patch
        2 kB
        Danilo Ghirardelli

        Issue Links

          Activity

          Hide
          tmueller Thomas Mueller added a comment -

          Do you have a reproducible test case where it uses a lot of memory or even runs out of memory?

          As far as I see, the persistence manager only ever selects one row at a time (using a where condition), except for the consistency check, where only the node id is selected. Therefore, I don't see where it could use a lot of memory.

          Please re-open if you have a test case where it's actually a problem, or if you can point to the code where many complete rows are selected.

          Show
          tmueller Thomas Mueller added a comment - Do you have a reproducible test case where it uses a lot of memory or even runs out of memory? As far as I see, the persistence manager only ever selects one row at a time (using a where condition), except for the consistency check, where only the node id is selected. Therefore, I don't see where it could use a lot of memory. Please re-open if you have a test case where it's actually a problem, or if you can point to the code where many complete rows are selected.
          Hide
          celkins Christopher Elkins added a comment -

          I do have a reproducible case in my application, but it is not something I can easily reduce to a test case suitable for attachment here.

          The actual number of rows returned is irrelevant. From the aforelinked PDF,

          "Since the buffers are allocated when the SQL is parsed, the size of the buffers depends not on the actual size of the row data returned by the query, but on the maximum size possible for the row data. After the SQL is parsed, the type of every column is known and from that information the driver can compute the maximum amount of memory required to store each column. The driver also has the fetchSize, the number of rows to retrieve on each fetch. With the size of each column and the number of rows, the driver can compute the absolute maximum size of the data returned in a single fetch. That is the size of the buffers."

          The specific context is the same as that of JCR-2832, adding a new node to a cluster. However, any statement that returns a BLOB column given the current fetch size of 10,000 is going to create large buffers.

          Show
          celkins Christopher Elkins added a comment - I do have a reproducible case in my application, but it is not something I can easily reduce to a test case suitable for attachment here. The actual number of rows returned is irrelevant. From the aforelinked PDF, "Since the buffers are allocated when the SQL is parsed, the size of the buffers depends not on the actual size of the row data returned by the query, but on the maximum size possible for the row data. After the SQL is parsed, the type of every column is known and from that information the driver can compute the maximum amount of memory required to store each column. The driver also has the fetchSize, the number of rows to retrieve on each fetch. With the size of each column and the number of rows, the driver can compute the absolute maximum size of the data returned in a single fetch. That is the size of the buffers." The specific context is the same as that of JCR-2832 , adding a new node to a cluster. However, any statement that returns a BLOB column given the current fetch size of 10,000 is going to create large buffers.
          Hide
          tmueller Thomas Mueller added a comment -

          Wow, I would never have thought the Oracle driver actually allocates the memory... sounds like a really bad idea to do nowadays, but well this is Oracle. Anyway, if you do have a failing test case, then of course the fetch size needs to be changed.

          Show
          tmueller Thomas Mueller added a comment - Wow, I would never have thought the Oracle driver actually allocates the memory... sounds like a really bad idea to do nowadays, but well this is Oracle. Anyway, if you do have a failing test case, then of course the fetch size needs to be changed.
          Hide
          dfghi Danilo Ghirardelli added a comment -

          I have the same problem, 10000 in fetch size is causing out of memory, even jackrabbit version 2.2.8.
          In my case it seems to be triggered by clustering configuration, because probably that will return more than a single row, and in this case oracle will allocate all the necessary ram for about 10000 rows.

          You can try with Oracle 10 (XE edition also causes the problem), driver 10.2.0.4 (or 10.2.0.5), default java xms size (32 bit version, not 64, on Windows platform) and clustering configured. The simple existence of clustering causes the crash in my case, you may want to save a few nodes to force something in the clustering tables.

          Anyway, that size of fetch size is dangerous. If it was set only to limit postgresql allocation and you are sure that the average query will return a single record, you should hardcode it to 10... I tried that the application will start correctly with a value of 10 and 100 but not with 1000 or above.

          Show
          dfghi Danilo Ghirardelli added a comment - I have the same problem, 10000 in fetch size is causing out of memory, even jackrabbit version 2.2.8. In my case it seems to be triggered by clustering configuration, because probably that will return more than a single row, and in this case oracle will allocate all the necessary ram for about 10000 rows. You can try with Oracle 10 (XE edition also causes the problem), driver 10.2.0.4 (or 10.2.0.5), default java xms size (32 bit version, not 64, on Windows platform) and clustering configured. The simple existence of clustering causes the crash in my case, you may want to save a few nodes to force something in the clustering tables. Anyway, that size of fetch size is dangerous. If it was set only to limit postgresql allocation and you are sure that the average query will return a single record, you should hardcode it to 10... I tried that the application will start correctly with a value of 10 and 100 but not with 1000 or above.
          Hide
          chesterm8 Stephen Chester added a comment -

          I am also experiencing this issue, and am working on a patch that will add the ability to specify the fetch size to the configuration options of the relevant classes, with the default being 10,000.

          Show
          chesterm8 Stephen Chester added a comment - I am also experiencing this issue, and am working on a patch that will add the ability to specify the fetch size to the configuration options of the relevant classes, with the default being 10,000.
          Hide
          manuel.molaschi Manuel Molaschi added a comment -

          We have a different problem with the same cause... We experience a big performance loss on an app built on Magnolia CMS and starting on Tomcat (configured with xmx at 1024m)
          After a long investigation we assumed that this behaviour was caused by jvm spending a lot of time in GC (cpu usage was always very high)

          These are the startup time on different jr versions / fetch size

          with jr 2.2.1: 129 sec
          with jr 2.2.8 (fetch size 10000): > 1h
          with jr 2.2.8 (patch applied on ConnectionHelper to comment out stmt.setFetchSize): 135 sec

          Show
          manuel.molaschi Manuel Molaschi added a comment - We have a different problem with the same cause... We experience a big performance loss on an app built on Magnolia CMS and starting on Tomcat (configured with xmx at 1024m) After a long investigation we assumed that this behaviour was caused by jvm spending a lot of time in GC (cpu usage was always very high) These are the startup time on different jr versions / fetch size with jr 2.2.1: 129 sec with jr 2.2.8 (fetch size 10000): > 1h with jr 2.2.8 (patch applied on ConnectionHelper to comment out stmt.setFetchSize): 135 sec
          Hide
          dfghi Danilo Ghirardelli added a comment -

          I attached a very minimalistic patch that will set the fetchSize to 10 in the OracleConnectionHelper, that is used for all oracle connections (bundles, journal and datastore).

          But still there are a few points that may be considered:

          • as stated by the original poster, if the fetchSize problem was Postgresql specific, it would be better to create a PostgresConnectionHelper and isolate the different fetchSize there.
          • generally speaking, if you are sure that the ResultSet is not going to be very big, you should really trim down that fetchSize (or don't set it so jdbc drivers would use their defaults), otherwise a simple change in some other driver behaviour might cause out of memory or slowness.
          • this issue is currently unassigned, so nobody would apply any patch...
          Show
          dfghi Danilo Ghirardelli added a comment - I attached a very minimalistic patch that will set the fetchSize to 10 in the OracleConnectionHelper, that is used for all oracle connections (bundles, journal and datastore). But still there are a few points that may be considered: as stated by the original poster, if the fetchSize problem was Postgresql specific, it would be better to create a PostgresConnectionHelper and isolate the different fetchSize there. generally speaking, if you are sure that the ResultSet is not going to be very big, you should really trim down that fetchSize (or don't set it so jdbc drivers would use their defaults), otherwise a simple change in some other driver behaviour might cause out of memory or slowness. this issue is currently unassigned, so nobody would apply any patch...
          Hide
          c_koell Claus Köll added a comment -

          Soon I'll take a look at this issue

          Show
          c_koell Claus Köll added a comment - Soon I'll take a look at this issue
          Hide
          dfghi Danilo Ghirardelli added a comment -

          The linked issue JCR-3090 addressed the same issue in a different way: every call to the "reallyExec" method (where the setFetchSize is set) has the maxRows value set to 0, so the patch for that issue is simply indirectly disabling the fetch size hint. This would probably solve the problem of this issue but will probably re-open the problem stated in JCR-2832.

          By the way, just like Manuel, I also used Magnolia CMS (with clustering) on Tomcat to reproduce the problem.

          Show
          dfghi Danilo Ghirardelli added a comment - The linked issue JCR-3090 addressed the same issue in a different way: every call to the "reallyExec" method (where the setFetchSize is set) has the maxRows value set to 0, so the patch for that issue is simply indirectly disabling the fetch size hint. This would probably solve the problem of this issue but will probably re-open the problem stated in JCR-2832 . By the way, just like Manuel, I also used Magnolia CMS (with clustering) on Tomcat to reproduce the problem.
          Hide
          c_koell Claus Köll added a comment -

          Created a patch that creates for a PostgreSQLDB a own ConnectionHelper that sets the fetchSize to 10000 to not break any code. On other DB's the fetchSize will be 100.

          Show
          c_koell Claus Köll added a comment - Created a patch that creates for a PostgreSQLDB a own ConnectionHelper that sets the fetchSize to 10000 to not break any code. On other DB's the fetchSize will be 100.
          Hide
          dfghi Danilo Ghirardelli added a comment -

          I'm sorry if I wasn't clear, but the value of 100 just worked for my application (clustered Magnolia CMS), and it was the maximum fetchSize value that allowed the application to start. Anyway, at 100 the memory occupation was quite high, so maybe that value was good in my case but may crash in other cases. Would it be possible to avoid setting a fetchSize at all (except for postgresql)? Performance were always good and there was no problem until it was set for the first time in JCR-2832...

          Show
          dfghi Danilo Ghirardelli added a comment - I'm sorry if I wasn't clear, but the value of 100 just worked for my application (clustered Magnolia CMS), and it was the maximum fetchSize value that allowed the application to start. Anyway, at 100 the memory occupation was quite high, so maybe that value was good in my case but may crash in other cases. Would it be possible to avoid setting a fetchSize at all (except for postgresql)? Performance were always good and there was no problem until it was set for the first time in JCR-2832 ...
          Hide
          c_koell Claus Köll added a comment -

          Hi Danilo,

          Ok i see. I think we can change the fetchSize to 0 per default. This should work as it means the hint should be ignored.

          Show
          c_koell Claus Köll added a comment - Hi Danilo, Ok i see. I think we can change the fetchSize to 0 per default. This should work as it means the hint should be ignored.
          Hide
          c_koell Claus Köll added a comment -

          Modified (fetch Size 0 per default) patch applied in rev 1182667.
          PostreSQL has a fetchSize of 10000 as before. If it makes no problems we can leave it so high.

          Show
          c_koell Claus Köll added a comment - Modified (fetch Size 0 per default) patch applied in rev 1182667. PostreSQL has a fetchSize of 10000 as before. If it makes no problems we can leave it so high.
          Hide
          jukkaz Jukka Zitting added a comment -

          Merged to the 2.2 branch in revision 1202191.

          Show
          jukkaz Jukka Zitting added a comment - Merged to the 2.2 branch in revision 1202191.

            People

            • Assignee:
              c_koell Claus Köll
              Reporter:
              celkins Christopher Elkins
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development