Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1962

Add description for session variables

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0, 0.11.1
    • Component/s: Documentation
    • Labels:
      None

      Description

      Our document (http://tajo.apache.org/docs/devel/tsql/variables.html) only shows the list of session variables. It would be much helpful if we add some description.

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user jihoonson opened a pull request:

        https://github.com/apache/tajo/pull/848

        TAJO-1962: Add description for session variables

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/jihoonson/tajo-2 TAJO-1962

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/848.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #848


        commit 204181cba402c6b62fce2e59b9a74e7b2d866af5
        Author: Jihoon Son <jihoonson@apache.org>
        Date: 2015-11-05T08:18:02Z

        Add descriptions of session variables


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user jihoonson opened a pull request: https://github.com/apache/tajo/pull/848 TAJO-1962 : Add description for session variables You can merge this pull request into a Git repository by running: $ git pull https://github.com/jihoonson/tajo-2 TAJO-1962 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/848.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #848 commit 204181cba402c6b62fce2e59b9a74e7b2d866af5 Author: Jihoon Son <jihoonson@apache.org> Date: 2015-11-05T08:18:02Z Add descriptions of session variables
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44100970

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ON_ERROR_STOP false
            +
            +.. describe:: NULL_CHAR
            +
            +Null char of text file output.
            +
            + * Property value: String
            + * Default value: '
            N'
            + * Example
            +
            +.. code-block:: sh
            +
            + \set NULL_CHAR '
            N'
            +
            +.. describe:: DEBUG_ENABLED
            +
            +Debug mode enabled.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DEBUG_ENABLED false
            +
            +.. describe:: FETCH_ROWNUM
            +
            +Sets the number of rows at a time from Master.
            • End diff –

        How about them?

        • the number of rows in a ResultSet for each fetch
        • the number of rows to be fetched from Master
        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44100970 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ON_ERROR_STOP false + +.. describe:: NULL_CHAR + +Null char of text file output. + + * Property value: String + * Default value: ' N' + * Example + +.. code-block:: sh + + \set NULL_CHAR ' N' + +.. describe:: DEBUG_ENABLED + +Debug mode enabled. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set DEBUG_ENABLED false + +.. describe:: FETCH_ROWNUM + +Sets the number of rows at a time from Master. End diff – How about them? the number of rows in a ResultSet for each fetch the number of rows to be fetched from Master
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44101000

        — Diff: tajo-common/src/main/java/org/apache/tajo/SessionVars.java —
        @@ -131,7 +131,7 @@
        NULL_CHAR(ConfVars.$TEXT_NULL, "null char of text file output", DEFAULT),
        CODEGEN(ConfVars.$CODEGEN, "Runtime code generation enabled (experiment)", DEFAULT),
        AGG_HASH_TABLE_SIZE(ConfVars.$AGG_HASH_TABLE_SIZE, "Aggregation hash table size", DEFAULT),

        • SORT_HASH_TABLE_SIZE(ConfVars.$SORT_HASH_TABLE_SIZE, "Sort hash table size", DEFAULT),
          + SORT_LIST_SIZE(ConfVars.$SORT_LIST_SIZE, "Sort hash table size", DEFAULT),
            • End diff –

        Does the description need to be changed?

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44101000 — Diff: tajo-common/src/main/java/org/apache/tajo/SessionVars.java — @@ -131,7 +131,7 @@ NULL_CHAR(ConfVars.$TEXT_NULL, "null char of text file output", DEFAULT), CODEGEN(ConfVars.$CODEGEN, "Runtime code generation enabled (experiment)", DEFAULT), AGG_HASH_TABLE_SIZE(ConfVars.$AGG_HASH_TABLE_SIZE, "Aggregation hash table size", DEFAULT), SORT_HASH_TABLE_SIZE(ConfVars.$SORT_HASH_TABLE_SIZE, "Sort hash table size", DEFAULT), + SORT_LIST_SIZE(ConfVars.$SORT_LIST_SIZE, "Sort hash table size", DEFAULT), End diff – Does the description need to be changed?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44101567

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            • End diff –

        exist -> exit

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44101567 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. End diff – exist -> exit
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44101822

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ON_ERROR_STOP false
            +
            +.. describe:: NULL_CHAR
            +
            +Null char of text file output.
            • End diff –

        It would be better to add 'used when the table property ``text.null`` is not specified' after the this description.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44101822 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ON_ERROR_STOP false + +.. describe:: NULL_CHAR + +Null char of text file output. End diff – It would be better to add 'used when the table property ``text.null`` is not specified' after the this description.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44101925

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            • End diff –

        I think this name looks weird. It lets me think it is like 'number of join task for each shuffle (size)'.
        But I have no idea for good alternative.

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44101925 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE End diff – I think this name looks weird. It lets me think it is like 'number of join task for each shuffle (size)'. But I have no idea for good alternative.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44101976

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            • End diff –

        Corresponding config property name uses 'threshold', but this one is 'limit'.
        I think it needs some kind of consistency.

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44101976 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT End diff – Corresponding config property name uses 'threshold', but this one is 'limit'. I think it needs some kind of consistency.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44102039

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            • End diff –

        Description is for 'list', but name is 'hash table'.

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44102039 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. End diff – Description is for 'list', but name is 'hash table'.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44102768

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            • End diff –

        Actually, join is processed in the second stage. The first stage is for shuffle. This parameters only determines the input size of the second stages. So, this name is proper in my opinion.

        In addition, this parameter has a similar context to map (reduce) task size. So, it would be familiar to those who have Hadoop experiences.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44102768 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE End diff – Actually, join is processed in the second stage. The first stage is for shuffle. This parameters only determines the input size of the second stages. So, this name is proper in my opinion. In addition, this parameter has a similar context to map (reduce) task size. So, it would be familiar to those who have Hadoop experiences.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44103209

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            • End diff –

        I agreed. ``Threshold`` may be too long for human to put on cli. Actually, there mainly are two kinds of suffixes: SIZE and LIMIT. SIZE is literally used to represent size matter. ``LIMIT`` has been used to represent some number to determine whether doing or not doing according to the number. It would be better to make them consistent.

        But, it needs to be addressed in an another jira.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44103209 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT End diff – I agreed. ``Threshold`` may be too long for human to put on cli. Actually, there mainly are two kinds of suffixes: SIZE and LIMIT. SIZE is literally used to represent size matter. ``LIMIT`` has been used to represent some number to determine whether doing or not doing according to the number. It would be better to make them consistent. But, it needs to be addressed in an another jira.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104195

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ON_ERROR_STOP false
            +
            +.. describe:: NULL_CHAR
            +
            +Null char of text file output.
            +
            + * Property value: String
            + * Default value: '
            N'
            + * Example
            +
            +.. code-block:: sh
            +
            + \set NULL_CHAR '
            N'
            +
            +.. describe:: DEBUG_ENABLED
            +
            +Debug mode enabled.
            • End diff –

        'To enable debug mode' might be more appropriate.

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104195 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ON_ERROR_STOP false + +.. describe:: NULL_CHAR + +Null char of text file output. + + * Property value: String + * Default value: ' N' + * Example + +.. code-block:: sh + + \set NULL_CHAR ' N' + +.. describe:: DEBUG_ENABLED + +Debug mode enabled. End diff – 'To enable debug mode' might be more appropriate.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-154285736

        I leave some comments.
        Actually many items are related and duplicated with 'tajo-site.xml' doc.

        I think you'd better consider to omit explanations for those items and to put corresponding references to tajo-site.xml.

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-154285736 I leave some comments. Actually many items are related and duplicated with 'tajo-site.xml' doc. I think you'd better consider to omit explanations for those items and to put corresponding references to tajo-site.xml.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-154286149

        Nice idea. I'll add references to corresponding configurations in tajo-site.xml after #844 is committed.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-154286149 Nice idea. I'll add references to corresponding configurations in tajo-site.xml after #844 is committed.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104526

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ON_ERROR_STOP false
            +
            +.. describe:: NULL_CHAR
            +
            +Null char of text file output.
            +
            + * Property value: String
            + * Default value: '
            N'
            + * Example
            +
            +.. code-block:: sh
            +
            + \set NULL_CHAR '
            N'
            +
            +.. describe:: DEBUG_ENABLED
            +
            +Debug mode enabled.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DEBUG_ENABLED false
            +
            +.. describe:: FETCH_ROWNUM
            +
            +Sets the number of rows at a time from Master.
            • End diff –

        How about ```The number of rows to be fetched from Master at once```?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104526 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ON_ERROR_STOP false + +.. describe:: NULL_CHAR + +Null char of text file output. + + * Property value: String + * Default value: ' N' + * Example + +.. code-block:: sh + + \set NULL_CHAR ' N' + +.. describe:: DEBUG_ENABLED + +Debug mode enabled. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set DEBUG_ENABLED false + +.. describe:: FETCH_ROWNUM + +Sets the number of rows at a time from Master. End diff – How about ```The number of rows to be fetched from Master at once```?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104531

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            • End diff –

        My mistake. Thanks.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104531 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. End diff – My mistake. Thanks.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104536

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ON_ERROR_STOP false
            +
            +.. describe:: NULL_CHAR
            +
            +Null char of text file output.
            • End diff –

        Thanks for the good comment.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104536 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ON_ERROR_STOP false + +.. describe:: NULL_CHAR + +Null char of text file output. End diff – Thanks for the good comment.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104742

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            • End diff –

        I opened a new jira for this issue. https://issues.apache.org/jira/browse/TAJO-1968

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104742 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT End diff – I opened a new jira for this issue. https://issues.apache.org/jira/browse/TAJO-1968
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104757

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            • End diff –

        My mistake. Thanks.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104757 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. End diff – My mistake. Thanks.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104763

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 128
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_PER_SHUFFLE_SIZE 128
            +
            +.. describe:: HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform a join in a task.
            +If the input data is smaller than this value, join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an inner join in a task.
            +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set INNER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an outer join in a task.
            +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
            +Otherwise, the sort-merge join is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set OUTER_HASH_JOIN_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: JOIN_HASH_TABLE_SIZE
            +
            +The initial size of hash table for in-memory hash join.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_HASH_TABLE_SIZE 100000
            +
            +.. describe:: SORT_TASK_INPUT_SIZE
            +
            +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the sort query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_TASK_INPUT_SIZE 64
            +
            +.. describe:: EXTSORT_BUFFER_SIZE
            +
            +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 200
            + * Example
            +
            +.. code-block:: sh
            +
            + \set EXTSORT_BUFFER_SIZE 200
            +
            +.. describe:: SORT_LIST_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 100000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SORT_LIST_SIZE 100000
            +
            +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
            +
            +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
            +Otherwise, 2-phase aggregation algorithm is used.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_MULTI_LEVEL_ENABLED true
            +
            +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
            +
            +The aggregation is executed in two stages. When an aggregation query is executed,
            +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: GROUPBY_TASK_INPUT_SIZE
            +
            +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the aggregation query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set GROUPBY_TASK_INPUT_SIZE 64
            +
            +.. describe:: HASH_GROUPBY_SIZE_LIMIT
            +
            +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
            +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
            +Otherwise, the sort-based aggregation is used.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set HASH_GROUPBY_SIZE_LIMIT 64
            +
            +.. warning::
            + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
            + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
            + This value should be tuned carefully.
            +
            +.. describe:: AGG_HASH_TABLE_SIZE
            +
            +The initial size of list for in-memory sort.
            +
            + * Property value: Integer
            + * Default value: 10000
            + * Example
            +
            +.. code-block:: sh
            +
            + \set AGG_HASH_TABLE_SIZE 10000
            +
            +.. describe:: TIMEZONE
            +
            +Refer to :doc:`/time_zone`.
            +
            + * Property value: Time zone id
            + * Default value: Default time zone of JVM
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TIMEZONE GMT+9
            +
            +.. describe:: DATE_ORDER
            +
            +Date order specification.
            +
            + * Property value: One of YMD, DMY, MDY.
            + * Default value: YMD
            + * Example
            +
            +.. code-block:: sh
            +
            + \set DATE_ORDER YMD
            +
            +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
            +
            +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
            +
            +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
            +
            +In Tajo, storing a partition table is executed in two stages.
            +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 256
            + * Example
            +
            +.. code-block:: sh
            +
            + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
            +
            +.. describe:: ARITHABORT
            +
            +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ARITHABORT false
            +
            +.. describe:: MAX_OUTPUT_FILE_SIZE
            +
            +Maximum per-output file size. 0 means infinite.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 0
            + * Example
            +
            +.. code-block:: sh
            +
            + \set MAX_OUTPUT_FILE_SIZE 0
            +
            +.. describe:: SESSION_EXPIRY_TIME
            +
            +Session expiry time.
            +
            + * Property value: Integer
            + * Unit: seconds
            + * Default value: 3600
            + * Example
            +
            +.. code-block:: sh
            +
            + \set SESSION_EXPIRY_TIME 3600
            +
            +.. describe:: CLI_COLUMNS
            +
            +Sets the width for the wrapped format.
            +
            + * Property value: Integer
            + * Default value: 120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_COLUMNS 120
            +
            +.. describe:: CLI_NULL_CHAR
            +
            +Sets the string to be printed in place of a null value.
            +
            + * Property value: String
            + * Default value: ''
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_NULL_CHAR ''
            +
            +.. describe:: CLI_PAGE_ROWS
            +
            +Sets the number of rows for paging.
            +
            + * Property value: Integer
            + * Default value: 100
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGE_ROWS 100
            +
            +.. describe:: CLI_PAGING_ENABLED
            +
            +Enable paging of result display.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_PAGING_ENABLED true
            +
            +.. describe:: CLI_DISPLAY_ERROR_TRACE
            +
            +Enable display of error trace.
            +
            + * Property value: Boolean
            + * Default value: true
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_DISPLAY_ERROR_TRACE true
            +
            +.. describe:: CLI_FORMATTER_CLASS
            +
            +Sets the output format class to display results.
            +
            + * Property value: Class name
            + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            + * Example
            +
            +.. code-block:: sh
            +
            + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
            +
            +.. describe:: ON_ERROR_STOP
            +
            +tsql will exist if an error occurs.
            +
            + * Property value: Boolean
            + * Default value: false
            + * Example
            +
            +.. code-block:: sh
            +
            + \set ON_ERROR_STOP false
            +
            +.. describe:: NULL_CHAR
            +
            +Null char of text file output.
            +
            + * Property value: String
            + * Default value: '
            N'
            + * Example
            +
            +.. code-block:: sh
            +
            + \set NULL_CHAR '
            N'
            +
            +.. describe:: DEBUG_ENABLED
            +
            +Debug mode enabled.
            • End diff –

        Thanks for comment. I fixed.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104763 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 128 + * Example + +.. code-block:: sh + + \set JOIN_PER_SHUFFLE_SIZE 128 + +.. describe:: HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform a join in a task. +If the input data is smaller than this value, join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an inner join in a task. +If the input data is smaller than this value, the inner join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set INNER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an outer join in a task. +If the input data is smaller than this value, the outer join is performed with the in-memory hash join. +Otherwise, the sort-merge join is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set OUTER_HASH_JOIN_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: JOIN_HASH_TABLE_SIZE + +The initial size of hash table for in-memory hash join. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set JOIN_HASH_TABLE_SIZE 100000 + +.. describe:: SORT_TASK_INPUT_SIZE + +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the sort query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set SORT_TASK_INPUT_SIZE 64 + +.. describe:: EXTSORT_BUFFER_SIZE + +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. + + * Property value: Integer + * Unit: MB + * Default value: 200 + * Example + +.. code-block:: sh + + \set EXTSORT_BUFFER_SIZE 200 + +.. describe:: SORT_LIST_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 100000 + * Example + +.. code-block:: sh + + \set SORT_LIST_SIZE 100000 + +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED + +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. +Otherwise, 2-phase aggregation algorithm is used. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set GROUPBY_MULTI_LEVEL_ENABLED true + +.. describe:: GROUPBY_PER_SHUFFLE_SIZE + +The aggregation is executed in two stages. When an aggregation query is executed, +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set GROUPBY_PER_SHUFFLE_SIZE 256 + +.. describe:: GROUPBY_TASK_INPUT_SIZE + +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the aggregation query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set GROUPBY_TASK_INPUT_SIZE 64 + +.. describe:: HASH_GROUPBY_SIZE_LIMIT + +This value provides the criterion to decide the algorithm to perform an aggregation in a task. +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. +Otherwise, the sort-based aggregation is used. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set HASH_GROUPBY_SIZE_LIMIT 64 + +.. warning:: + This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, + its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. + This value should be tuned carefully. + +.. describe:: AGG_HASH_TABLE_SIZE + +The initial size of list for in-memory sort. + + * Property value: Integer + * Default value: 10000 + * Example + +.. code-block:: sh + + \set AGG_HASH_TABLE_SIZE 10000 + +.. describe:: TIMEZONE + +Refer to :doc:`/time_zone`. + + * Property value: Time zone id + * Default value: Default time zone of JVM + * Example + +.. code-block:: sh + + \set TIMEZONE GMT+9 + +.. describe:: DATE_ORDER + +Date order specification. + + * Property value: One of YMD, DMY, MDY. + * Default value: YMD + * Example + +.. code-block:: sh + + \set DATE_ORDER YMD + +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED + +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false + +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE + +In Tajo, storing a partition table is executed in two stages. +This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. + + * Property value: Integer + * Unit: MB + * Default value: 256 + * Example + +.. code-block:: sh + + \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 + +.. describe:: ARITHABORT + +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ARITHABORT false + +.. describe:: MAX_OUTPUT_FILE_SIZE + +Maximum per-output file size. 0 means infinite. + + * Property value: Integer + * Unit: MB + * Default value: 0 + * Example + +.. code-block:: sh + + \set MAX_OUTPUT_FILE_SIZE 0 + +.. describe:: SESSION_EXPIRY_TIME + +Session expiry time. + + * Property value: Integer + * Unit: seconds + * Default value: 3600 + * Example + +.. code-block:: sh + + \set SESSION_EXPIRY_TIME 3600 + +.. describe:: CLI_COLUMNS + +Sets the width for the wrapped format. + + * Property value: Integer + * Default value: 120 + * Example + +.. code-block:: sh + + \set CLI_COLUMNS 120 + +.. describe:: CLI_NULL_CHAR + +Sets the string to be printed in place of a null value. + + * Property value: String + * Default value: '' + * Example + +.. code-block:: sh + + \set CLI_NULL_CHAR '' + +.. describe:: CLI_PAGE_ROWS + +Sets the number of rows for paging. + + * Property value: Integer + * Default value: 100 + * Example + +.. code-block:: sh + + \set CLI_PAGE_ROWS 100 + +.. describe:: CLI_PAGING_ENABLED + +Enable paging of result display. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_PAGING_ENABLED true + +.. describe:: CLI_DISPLAY_ERROR_TRACE + +Enable display of error trace. + + * Property value: Boolean + * Default value: true + * Example + +.. code-block:: sh + + \set CLI_DISPLAY_ERROR_TRACE true + +.. describe:: CLI_FORMATTER_CLASS + +Sets the output format class to display results. + + * Property value: Class name + * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + * Example + +.. code-block:: sh + + \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter + +.. describe:: ON_ERROR_STOP + +tsql will exist if an error occurs. + + * Property value: Boolean + * Default value: false + * Example + +.. code-block:: sh + + \set ON_ERROR_STOP false + +.. describe:: NULL_CHAR + +Null char of text file output. + + * Property value: String + * Default value: ' N' + * Example + +.. code-block:: sh + + \set NULL_CHAR ' N' + +.. describe:: DEBUG_ENABLED + +Debug mode enabled. End diff – Thanks for comment. I fixed.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44104854

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            • End diff –

        Yes, this name is valid. Maybe, there are more intuitive names, but I also don't have any idea now. We can change when a good idea comes to mind.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44104854 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE End diff – Yes, this name is valid. Maybe, there are more intuitive names, but I also don't have any idea now. We can change when a good idea comes to mind.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/848#discussion_r44105294

        — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst —
        @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
        Also, ``\unset key`` will unset the session variable named key.

        -Now, tajo provides the following session variables.
        -

          • ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
          • ``DIST_QUERY_JOIN_TASK_VOLUME``
          • ``DIST_QUERY_SORT_TASK_VOLUME``
          • ``DIST_QUERY_GROUPBY_TASK_VOLUME``
          • ``DIST_QUERY_JOIN_PARTITION_VOLUME``
          • ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
          • ``DIST_QUERY_TABLE_PARTITION_VOLUME``
          • ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
          • ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
          • ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
          • ``MAX_OUTPUT_FILE_SIZE``
          • ``CODEGEN``
          • ``CLIENT_SESSION_EXPIRY_TIME``
          • ``CLI_MAX_COLUMN``
          • ``CLI_NULL_CHAR``
          • ``CLI_PRINT_PAUSE_NUM_RECORDS``
          • ``CLI_PRINT_PAUSE``
          • ``CLI_PRINT_ERROR_TRACE``
          • ``CLI_OUTPUT_FORMATTER_CLASS``
          • ``CLI_ERROR_STOP``
          • ``TIMEZONE``
          • ``DATE_ORDER``
          • ``TEXT_NULL``
          • ``DEBUG_ENABLED``
          • ``BEHAVIOR_ARITHMETIC_ABORT``
          • ``RESULT_SET_FETCH_ROWNUM``
            +Currently, tajo provides the following session variables.
            +
            +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
            +
            +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 5120
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
            +
            +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
            +
            +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
            +
            + * Property value: Integer
            + * Unit: KB
            + * Default value: 1024
            + * Example
            +
            +.. code-block:: sh
            +
            + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
            +
            +.. warning::
            + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
            +
            +.. describe:: JOIN_TASK_INPUT_SIZE
            +
            +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
            +As a result, it determines the degree of the parallel processing of the join query.
            +
            + * Property value: Integer
            + * Unit: MB
            + * Default value: 64
            + * Example
            +
            +.. code-block:: sh
            +
            + \set JOIN_TASK_INPUT_SIZE 64
            +
            +.. describe:: JOIN_PER_SHUFFLE_SIZE
            • End diff –

        My target was 'JOIN_PER_SHUFFLE_SIZE', not 'JOIN_TASK_INPUT_SIZE'.

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/848#discussion_r44105294 — Diff: tajo-docs/src/main/sphinx/tsql/variables.rst — @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an Also, ``\unset key`` will unset the session variable named key . -Now, tajo provides the following session variables. - ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD`` ``DIST_QUERY_JOIN_TASK_VOLUME`` ``DIST_QUERY_SORT_TASK_VOLUME`` ``DIST_QUERY_GROUPBY_TASK_VOLUME`` ``DIST_QUERY_JOIN_PARTITION_VOLUME`` ``DIST_QUERY_GROUPBY_PARTITION_VOLUME`` ``DIST_QUERY_TABLE_PARTITION_VOLUME`` ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE`` ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD`` ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD`` ``MAX_OUTPUT_FILE_SIZE`` ``CODEGEN`` ``CLIENT_SESSION_EXPIRY_TIME`` ``CLI_MAX_COLUMN`` ``CLI_NULL_CHAR`` ``CLI_PRINT_PAUSE_NUM_RECORDS`` ``CLI_PRINT_PAUSE`` ``CLI_PRINT_ERROR_TRACE`` ``CLI_OUTPUT_FORMATTER_CLASS`` ``CLI_ERROR_STOP`` ``TIMEZONE`` ``DATE_ORDER`` ``TEXT_NULL`` ``DEBUG_ENABLED`` ``BEHAVIOR_ARITHMETIC_ABORT`` ``RESULT_SET_FETCH_ROWNUM`` +Currently, tajo provides the following session variables. + +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD + +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 5120 + * Example + +.. code-block:: sh + + \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 + +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD + +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. + + * Property value: Integer + * Unit: KB + * Default value: 1024 + * Example + +.. code-block:: sh + + \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 + +.. warning:: + In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. + +.. describe:: JOIN_TASK_INPUT_SIZE + +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. +As a result, it determines the degree of the parallel processing of the join query. + + * Property value: Integer + * Unit: MB + * Default value: 64 + * Example + +.. code-block:: sh + + \set JOIN_TASK_INPUT_SIZE 64 + +.. describe:: JOIN_PER_SHUFFLE_SIZE End diff – My target was 'JOIN_PER_SHUFFLE_SIZE', not 'JOIN_TASK_INPUT_SIZE'.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-159776210

        @jihoonson Is it ready to be committed? If so, I'll finish the review soon.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-159776210 @jihoonson Is it ready to be committed? If so, I'll finish the review soon.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-159776641

        Thanks @hyunsik, but not yet. I'll ping after update soon.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-159776641 Thanks @hyunsik, but not yet. I'll ping after update soon.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-159781516

        @eminency, @hyunsik thanks for your review. I've updated my patch.
        Please review it.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-159781516 @eminency, @hyunsik thanks for your review. I've updated my patch. Please review it.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-159808526

        I think it's very good.

        +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-159808526 I think it's very good. +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-159810455

        +1 ship it!

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-159810455 +1 ship it!
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/848

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/848
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #605 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/605/)
        TAJO-1962: Add description for session variables. (jihoonson: rev e52c33089608f6621c590066d429bf11de3a4843)

        • tajo-common/src/main/java/org/apache/tajo/SessionVars.java
        • CHANGES
        • tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst
        • tajo-docs/src/main/sphinx/tsql/variables.rst
        • tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #605 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/605/ ) TAJO-1962 : Add description for session variables. (jihoonson: rev e52c33089608f6621c590066d429bf11de3a4843) tajo-common/src/main/java/org/apache/tajo/SessionVars.java CHANGES tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst tajo-docs/src/main/sphinx/tsql/variables.rst tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
        Hide
        jihoonson Jihoon Son added a comment -

        Committed to master and 0.11.1

        Show
        jihoonson Jihoon Son added a comment - Committed to master and 0.11.1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/848#issuecomment-159957560

        @eminency, @hyunsik thank you for your review. I've committed my patch.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/848#issuecomment-159957560 @eminency, @hyunsik thank you for your review. I've committed my patch.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #993 (See https://builds.apache.org/job/Tajo-master-build/993/)
        TAJO-1962: Add description for session variables. (jihoonson: rev e52c33089608f6621c590066d429bf11de3a4843)

        • tajo-common/src/main/java/org/apache/tajo/SessionVars.java
        • tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
        • CHANGES
        • tajo-docs/src/main/sphinx/tsql/variables.rst
        • tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #993 (See https://builds.apache.org/job/Tajo-master-build/993/ ) TAJO-1962 : Add description for session variables. (jihoonson: rev e52c33089608f6621c590066d429bf11de3a4843) tajo-common/src/main/java/org/apache/tajo/SessionVars.java tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result CHANGES tajo-docs/src/main/sphinx/tsql/variables.rst tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst
        Hide
        hudson Hudson added a comment -

        ABORTED: Integrated in Tajo-0.11.1-build #115 (See https://builds.apache.org/job/Tajo-0.11.1-build/115/)
        TAJO-1962: Add description for session variables. (jihoonson: rev 903fc69de16ee7ff14a73845806f3773809eff83)

        • CHANGES
        • tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
        • tajo-common/src/main/java/org/apache/tajo/SessionVars.java
        • tajo-docs/src/main/sphinx/tsql/variables.rst
        • tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst
        Show
        hudson Hudson added a comment - ABORTED: Integrated in Tajo-0.11.1-build #115 (See https://builds.apache.org/job/Tajo-0.11.1-build/115/ ) TAJO-1962 : Add description for session variables. (jihoonson: rev 903fc69de16ee7ff14a73845806f3773809eff83) CHANGES tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result tajo-common/src/main/java/org/apache/tajo/SessionVars.java tajo-docs/src/main/sphinx/tsql/variables.rst tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst

          People

          • Assignee:
            jihoonson Jihoon Son
            Reporter:
            jihoonson Jihoon Son
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development