Tajo / TAJO-1340

Change the default output file format.

    Details

      Description

      Currently, the default output file format is CSV. By its nature, CSV has three main problems:

      • Its line or field delimiter can collide with characters that appear in the result data.
      • A plain text file is likely to be larger than the same data stored in other file formats.
      • Its read and write performance is slow.

      We need to change the default output file format to a different one, and we also need to investigate which file format is best suited for it.
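The first problem, a delimiter that also occurs inside the data, is easy to reproduce. The sketch below is a hypothetical illustration, not Tajo's text serializer: it assumes `|` as the field delimiter and writes fields with no quoting or escaping, so a value containing `|` turns into an extra column when the line is read back.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not Tajo code: delimited text written
// without quoting or escaping, then parsed by splitting on the delimiter.
public class DelimiterCollision {
    // Assumed field delimiter for this example.
    static final String FIELD_DELIMITER = "|";

    // Joins fields with the delimiter; nothing inside a field is escaped.
    static String writeRow(List<String> fields) {
        return String.join(FIELD_DELIMITER, fields);
    }

    // Splits a line on the literal delimiter; limit -1 keeps trailing empty fields.
    static String[] readRow(String line) {
        return line.split(Pattern.quote(FIELD_DELIMITER), -1);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("42", "a|b", "tajo");
        String line = writeRow(row);
        String[] parsed = readRow(line);
        // Three fields were written, but four come back:
        // the '|' inside "a|b" is indistinguishable from a real delimiter.
        System.out.println(line + " -> " + parsed.length + " fields");
    }
}
```

Quoting or escaping can work around this, but every reader and writer then has to agree on the escaping rules; a binary format with length-prefixed fields avoids the ambiguity altogether.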

      Attachments
      1. TAJO-1340_2.patch (337 kB, Jinho Kim)
      2. TAJO-1340.patch (341 kB, Jinho Kim)


          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-0.11.0-build #41 (See https://builds.apache.org/job/Tajo-0.11.0-build/41/)
          TAJO-1340: Change the default output file format. (jhkim: rev 20c7e184c67b3ce517374cb45fb74f3dc454edc3)

          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullTrueDeprecated.result
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoResultSetBase.java
          • tajo-jdbc/src/test/java/org/apache/tajo/jdbc/TestTajoJdbc.java
          • tajo-core-tests/src/test/java/org/apache/tajo/client/TestQueryClientExceptions.java
          • tajo-rpc/tajo-rpc-protobuf/src/main/java/org/apache/tajo/rpc/RpcClientManager.java
          • tajo-cluster-tests/src/test/java/org/apache/tajo/TajoTestingCluster.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java
          • CHANGES
          • tajo-plan/src/main/java/org/apache/tajo/plan/logical/PersistentStoreNode.java
          • tajo-client/src/main/java/org/apache/tajo/client/TajoClientUtil.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-project/pom.xml
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/FetchResultSet.java
          • tajo-core-tests/src/test/resources/results/TestWindowQuery/testStdDevPop1.result
          • tajo-core-tests/src/test/resources/results/TestGroupByQuery/testDistinctAggregation_case10.result
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testStopWhenErrorDeprecated.result
          • tajo-common/src/main/java/org/apache/tajo/SessionVars.java
          • tajo-jdbc/src/main/java/org/apache/tajo/jdbc/JdbcConnection.java
          • tajo-common/pom.xml
          • tajo-client/src/main/proto/ClientProtos.proto
          • tajo-client/src/main/java/org/apache/tajo/client/SessionConnection.java
          • tajo-core-tests/src/test/java/org/apache/tajo/querymaster/TestTaskStatusUpdate.java
          • tajo-common/src/main/proto/tajo_protos.proto
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultSystemScanner.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultFileScanner.java
          • tajo-common/src/main/java/org/apache/tajo/tuple/BaseTupleBuilder.java
          • tajo-common/src/main/proto/DataTypes.proto
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalPlanUtil.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testStopWhenError.result
          • tajo-common/src/main/java/org/apache/tajo/util/CompressionUtil.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/QueryExecutor.java
          • tajo-core-tests/src/test/resources/results/TestWindowQuery/testStdDevSamp1.result
          • tajo-jdbc/src/test/java/org/apache/tajo/jdbc/TestResultSet.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullTrue.result
          • tajo-common/src/main/java/org/apache/tajo/tuple/memory/OffHeapRowBlockWriter.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoMemoryResultSet.java
          • tajo-client/src/main/java/org/apache/tajo/client/QueryClient.java
          • tajo-client/src/main/java/org/apache/tajo/client/QueryClientImpl.java
          • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
          • tajo-common/src/main/java/org/apache/tajo/tuple/memory/OffHeapRowWriter.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullFalse.result
          • tajo-core/src/main/java/org/apache/tajo/master/TajoMasterClientService.java
          • tajo-client/src/main/java/org/apache/tajo/client/TajoClientImpl.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultScanner.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #864 (See https://builds.apache.org/job/Tajo-master-build/864/)
          TAJO-1340: Change the default output file format. (jhkim: rev 8bbd51df292e4d74c95a54cb610754e1d6d77756)

          • tajo-client/src/main/java/org/apache/tajo/client/TajoClientUtil.java
          • tajo-client/src/main/java/org/apache/tajo/client/QueryClientImpl.java
          • tajo-common/src/main/java/org/apache/tajo/util/CompressionUtil.java
          • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testStopWhenErrorDeprecated.result
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullFalse.result
          • tajo-rpc/tajo-rpc-protobuf/src/main/java/org/apache/tajo/rpc/RpcClientManager.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testStopWhenError.result
          • tajo-common/src/main/java/org/apache/tajo/tuple/memory/OffHeapRowBlockWriter.java
          • tajo-core/src/main/java/org/apache/tajo/master/TajoMasterClientService.java
          • tajo-common/src/main/proto/DataTypes.proto
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
          • tajo-client/src/main/java/org/apache/tajo/client/SessionConnection.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/FetchResultSet.java
          • tajo-common/src/main/java/org/apache/tajo/tuple/memory/OffHeapRowWriter.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalPlanUtil.java
          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultSystemScanner.java
          • tajo-jdbc/src/main/java/org/apache/tajo/jdbc/JdbcConnection.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultScanner.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoResultSetBase.java
          • tajo-common/pom.xml
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-core-tests/src/test/resources/results/TestGroupByQuery/testDistinctAggregation_case10.result
          • tajo-jdbc/src/test/java/org/apache/tajo/jdbc/TestResultSet.java
          • tajo-core-tests/src/test/resources/results/TestWindowQuery/testStdDevPop1.result
          • tajo-common/src/main/proto/tajo_protos.proto
          • tajo-plan/src/main/java/org/apache/tajo/plan/logical/PersistentStoreNode.java
          • tajo-client/src/main/java/org/apache/tajo/client/TajoClientImpl.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
          • tajo-common/src/main/java/org/apache/tajo/SessionVars.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultFileScanner.java
          • tajo-core-tests/src/test/resources/results/TestWindowQuery/testStdDevSamp1.result
          • tajo-jdbc/src/test/java/org/apache/tajo/jdbc/TestTajoJdbc.java
          • tajo-cluster-tests/src/test/java/org/apache/tajo/TajoTestingCluster.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoMemoryResultSet.java
          • tajo-client/src/main/proto/ClientProtos.proto
          • tajo-project/pom.xml
          • tajo-common/src/main/java/org/apache/tajo/tuple/BaseTupleBuilder.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullTrueDeprecated.result
          • tajo-core/src/main/java/org/apache/tajo/master/exec/QueryExecutor.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullTrue.result
          • tajo-client/src/main/java/org/apache/tajo/client/QueryClient.java
          • tajo-core-tests/src/test/java/org/apache/tajo/client/TestQueryClientExceptions.java
          • tajo-core-tests/src/test/java/org/apache/tajo/querymaster/TestTaskStatusUpdate.java
          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39366957

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -377,6 +381,7 @@ public static int setDateOrder(int dateOrder) {
          // ResultSet ---------------------------------------------------------
          $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
          $RESULT_SET_BLOCK_WAIT("tajo.resultset.block.wait", true),
          + $RESULT_SET_COMPRESSION("tajo.resultset.compression", false),
          — End diff –

          I've changed it to `$COMPRESSED_RESULT_TRANSFER`.
          Thanks.
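For readers unfamiliar with the pattern in this diff: each `ConfVars` entry pairs a configuration key with a typed default, so if the key string is left as is, renaming the constant to `$COMPRESSED_RESULT_TRANSFER` changes only the Java identifier. The sketch below is a simplified, hypothetical stand-in, not Tajo's actual `TajoConf`; the accessor names are illustrative, and the key string for the renamed entry is assumed to be unchanged because the thread does not show the final value.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for the TajoConf.ConfVars pattern shown
// in the diff: each constant pairs a configuration key with a default value.
public class ConfVarsSketch {
    enum ConfVars {
        $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
        $RESULT_SET_BLOCK_WAIT("tajo.resultset.block.wait", true),
        // Renamed from $RESULT_SET_COMPRESSION during review. The key string
        // is assumed to be unchanged; the thread does not show the final one.
        $COMPRESSED_RESULT_TRANSFER("tajo.resultset.compression", false);

        final String key;
        final Object defaultValue;

        ConfVars(String key, Object defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }
    }

    // Explicitly configured values, keyed by configuration key.
    private final Map<String, String> values = new HashMap<>();

    void set(ConfVars var, String value) {
        values.put(var.key, value);
    }

    // Returns the configured value, or the enum default if none was set.
    boolean getBoolVar(ConfVars var) {
        String v = values.get(var.key);
        return v != null ? Boolean.parseBoolean(v) : (Boolean) var.defaultValue;
    }

    int getIntVar(ConfVars var) {
        String v = values.get(var.key);
        return v != null ? Integer.parseInt(v) : (Integer) var.defaultValue;
    }

    public static void main(String[] args) {
        ConfVarsSketch conf = new ConfVarsSketch();
        System.out.println(conf.getBoolVar(ConfVars.$COMPRESSED_RESULT_TRANSFER)); // false
        conf.set(ConfVars.$COMPRESSED_RESULT_TRANSFER, "true");
        System.out.println(conf.getBoolVar(ConfVars.$COMPRESSED_RESULT_TRANSFER)); // true
    }
}
```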

          jhkim Jinho Kim added a comment -

          Committed it.
          Thanks, guys, for the review!

          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-CODEGEN-build #506 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/506/)
          TAJO-1340: Change the default output file format. (jhkim: rev 8bbd51df292e4d74c95a54cb610754e1d6d77756)

          • tajo-client/src/main/java/org/apache/tajo/client/QueryClient.java
          • tajo-core-tests/src/test/java/org/apache/tajo/client/TestQueryClientExceptions.java
          • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
          • tajo-jdbc/src/test/java/org/apache/tajo/jdbc/TestResultSet.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultSystemScanner.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testStopWhenErrorDeprecated.result
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullTrueDeprecated.result
          • tajo-common/src/main/java/org/apache/tajo/util/CompressionUtil.java
          • tajo-core-tests/src/test/resources/results/TestGroupByQuery/testDistinctAggregation_case10.result
          • tajo-client/src/main/java/org/apache/tajo/client/QueryClientImpl.java
          • tajo-rpc/tajo-rpc-protobuf/src/main/java/org/apache/tajo/rpc/RpcClientManager.java
          • tajo-core-tests/src/test/resources/results/TestWindowQuery/testStdDevSamp1.result
          • tajo-jdbc/src/test/java/org/apache/tajo/jdbc/TestTajoJdbc.java
          • tajo-core-tests/src/test/resources/results/TestWindowQuery/testStdDevPop1.result
          • tajo-core-tests/src/test/java/org/apache/tajo/querymaster/TestTaskStatusUpdate.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoMemoryResultSet.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultScanner.java
          • tajo-common/src/main/proto/tajo_protos.proto
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoResultSetBase.java
          • CHANGES
          • tajo-common/src/main/java/org/apache/tajo/SessionVars.java
          • tajo-core/src/main/java/org/apache/tajo/master/TajoMasterClientService.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullFalse.result
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalPlanUtil.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testSelectResultWithNullTrue.result
          • tajo-project/pom.xml
          • tajo-client/src/main/java/org/apache/tajo/client/SessionConnection.java
          • tajo-common/src/main/java/org/apache/tajo/tuple/memory/OffHeapRowWriter.java
          • tajo-cluster-tests/src/test/java/org/apache/tajo/TajoTestingCluster.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/QueryExecutor.java
          • tajo-common/pom.xml
          • tajo-client/src/main/java/org/apache/tajo/client/TajoClientUtil.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
          • tajo-core-tests/src/test/resources/results/TestTajoCli/testStopWhenError.result
          • tajo-common/src/main/proto/DataTypes.proto
          • tajo-common/src/main/java/org/apache/tajo/tuple/memory/OffHeapRowBlockWriter.java
          • tajo-plan/src/main/java/org/apache/tajo/plan/logical/PersistentStoreNode.java
          • tajo-client/src/main/java/org/apache/tajo/jdbc/FetchResultSet.java
          • tajo-common/src/main/java/org/apache/tajo/tuple/BaseTupleBuilder.java
          • tajo-core/src/main/java/org/apache/tajo/master/exec/NonForwardQueryResultFileScanner.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
          • tajo-client/src/main/java/org/apache/tajo/client/TajoClientImpl.java
          • tajo-jdbc/src/main/java/org/apache/tajo/jdbc/JdbcConnection.java
          • tajo-client/src/main/proto/ClientProtos.proto
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/671

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-139965818

          Here is my +1. The patch looks great to me. I left one trivial comment; if you agree with it, you can commit after addressing it.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39362325

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -377,6 +381,7 @@ public static int setDateOrder(int dateOrder) {
          // ResultSet ---------------------------------------------------------
          $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
          $RESULT_SET_BLOCK_WAIT("tajo.resultset.block.wait", true),
          + $RESULT_SET_COMPRESSION("tajo.resultset.compression", false),
          — End diff –

          It should also be `$COMPRESSED_RESULT_TRANSFER`, for consistency with `COMPRESSED_RESULT_TRANSFER` in SessionVars.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39362251

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -377,6 +381,7 @@ public static int setDateOrder(int dateOrder) {
          // ResultSet ---------------------------------------------------------
          $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
          — End diff –

          Yes, you are right. Rows that are too large can cause an OOM. If we find a proper default row number, it should also give a better compression ratio.
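The compression point in this exchange can be checked with a small experiment. The sketch below is hypothetical: it uses `java.util.zip.Deflater` as a stand-in for whichever codec the patch uses, synthetic rows rather than real query results, and compresses each batch as an independent stream. Under these assumptions, larger batches compress better, because every new stream starts without any history of earlier rows and pays its own header and trailer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical experiment: compress the same rows in fixed-size batches,
// each batch as an independent DEFLATE stream, and compare total output size.
public class BatchCompressionRatio {
    // Returns the size of `input` after DEFLATE compression.
    static int compressedLength(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // Compresses `rows` in batches of `rowsPerBatch` and sums the output sizes.
    static int totalCompressedSize(List<String> rows, int rowsPerBatch) {
        int total = 0;
        for (int start = 0; start < rows.size(); start += rowsPerBatch) {
            int end = Math.min(start + rowsPerBatch, rows.size());
            StringBuilder batch = new StringBuilder();
            for (String row : rows.subList(start, end)) {
                batch.append(row).append('\n');
            }
            total += compressedLength(batch.toString().getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Synthetic, repetitive rows standing in for a query result.
    static List<String> sampleRows(int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(i + "|tajo|2015-09-12|" + (i % 7));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = sampleRows(2000);
        for (int batchSize : new int[] {10, 200, 2000}) {
            System.out.println(batchSize + " rows/batch -> "
                + totalCompressedSize(rows, batchSize) + " bytes");
        }
    }
}
```

The OOM concern raised above still applies: a larger batch also means a larger buffer held in memory on both ends.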

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39346362

          — Diff: tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result —
          @@ -42,4 +42,5 @@ Available Session Variables:
          \set ARITHABORT [true or false] - If true, a running query will be terminated when an overflow or divide-by-zero occurs.
          \set FETCH_ROWNUM [int value] - Sets the number of rows at a time from Master
          \set BLOCK_ON_RESULT [true or false] - Whether to block result set on query execution
          +\set RESULT_COMPRESS [true or false] - Enable resultSet compression
          — End diff –

          Thanks for your detailed review.
          "useCompression=true" looks great to me. I'm also going to rename the session variable to COMPRESSED_RESULT_TRANSFER.

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39346360

          — Diff: tajo-common/src/main/proto/DataTypes.proto —
          @@ -114,3 +114,7 @@ message DataType {
          */
          optional int32 num_nested_fields = 4;
          }
          +
          +enum CodecType {
          — End diff –

          I'll move CodecType to `tajo_protos.proto`
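The diff introduces a `CodecType` enum, which per the comment above is moved to `tajo_protos.proto`. How such an enum might select the compression applied to a result block can be sketched as follows. This is hypothetical: the values `NONE` and `DEFLATE` are illustrative (the thread does not show the real ones), and `java.util.zip` stands in for whatever codec library the patch actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: a codec enum selecting how a result block is
// compressed before transfer and decompressed on the client.
public class CodecRoundTrip {
    // Illustrative values only; not the real CodecType definition.
    enum CodecType { NONE, DEFLATE }

    static byte[] compress(CodecType codec, byte[] data) {
        switch (codec) {
            case NONE:
                return data;
            case DEFLATE: {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                // Closing the stream flushes the final compressed block.
                try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
                    out.write(data);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return bos.toByteArray();
            }
            default:
                throw new IllegalArgumentException("Unsupported codec: " + codec);
        }
    }

    static byte[] decompress(CodecType codec, byte[] data) {
        switch (codec) {
            case NONE:
                return data;
            case DEFLATE: {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                try (InflaterInputStream in =
                         new InflaterInputStream(new ByteArrayInputStream(data))) {
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        bos.write(buf, 0, n);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return bos.toByteArray();
            }
            default:
                throw new IllegalArgumentException("Unsupported codec: " + codec);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("1|tajo|2015-09-12\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        for (CodecType codec : CodecType.values()) {
            byte[] packed = compress(codec, original);
            byte[] unpacked = decompress(codec, packed);
            System.out.println(codec + ": " + original.length + " -> " + packed.length
                + " bytes, round trip ok = " + Arrays.equals(original, unpacked));
        }
    }
}
```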

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39346359

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -377,6 +381,7 @@ public static int setDateOrder(int dateOrder) {
          // ResultSet ---------------------------------------------------------
          $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
          — End diff –

          @hyunsik
          I think it should be a size limit rather than a row count. If rows are very large, fetching a fixed number of them could cause an OOM.
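          A minimal sketch of the size-based alternative, assuming rows are already serialized to byte arrays on the server side. ``ByteBudgetBatcher`` and ``maxBatchBytes`` are hypothetical names, not Tajo's ``$RESULT_SET_FETCH_ROWNUM`` implementation. Batches are cut by accumulated bytes, and a single oversized row still forms its own batch so the fetch always makes progress.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cut fetch batches by a byte budget instead of a fixed row count. */
final class ByteBudgetBatcher {
  private ByteBudgetBatcher() {}

  /**
   * Splits serialized rows into batches whose total size stays within maxBatchBytes.
   * A row larger than the budget still becomes its own batch, so progress is guaranteed.
   */
  static List<List<byte[]>> batchBySize(List<byte[]> rows, long maxBatchBytes) {
    if (maxBatchBytes <= 0) {
      throw new IllegalArgumentException("maxBatchBytes must be positive");
    }
    List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    long currentBytes = 0;
    for (byte[] row : rows) {
      // Close the current batch if adding this row would exceed the budget.
      if (!current.isEmpty() && currentBytes + row.length > maxBatchBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += row.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

          With this choice, the memory held per fetch is bounded by the larger of ``maxBatchBytes`` and the biggest single row, rather than by 200 times the biggest row.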

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-139540315

          Excellent work! I left some comments.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39268081

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -215,6 +216,9 @@ public static int setDateOrder(int dateOrder) {
          SHUFFLE_HASH_APPENDER_PAGE_VOLUME("tajo.shuffle.hash.appender.page.volumn-mb", 30),
          HASH_SHUFFLE_PARENT_DIRS("tajo.hash.shuffle.parent.dirs.count", 10),

          + // Final output Configuration --------------------------------------------------
          + FINAL_OUTPUT_FILE_FORMAT("tajo.final.output.file-format", BuiltinStorages.TEXT, Validators.javaString()),
          — End diff –

          Each level of the key should be meaningful, and levels further to the left should be more general than those to the right.
          I'd like to recommend ``DEFAULT_OUTPUT_FILE_FORMAT`` and ``tajo.output.file-format``.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39267724

          — Diff: tajo-client/src/main/java/org/apache/tajo/jdbc/TajoMemoryResultSet.java —
          @@ -18,46 +18,78 @@

          package org.apache.tajo.jdbc;

          -import com.google.protobuf.ByteString;
          +import io.netty.buffer.Unpooled;
          import org.apache.tajo.QueryId;
          import org.apache.tajo.catalog.Schema;
          -import org.apache.tajo.storage.RowStoreUtil;
          +import org.apache.tajo.catalog.SchemaUtil;
          +import org.apache.tajo.exception.TajoInternalError;
          +import org.apache.tajo.ipc.ClientProtos.SerializedResultSet;
          import org.apache.tajo.storage.Tuple;
          +import org.apache.tajo.tuple.RowBlockReader;
          +import org.apache.tajo.tuple.memory.HeapRowBlockReader;
          +import org.apache.tajo.tuple.memory.HeapTuple;
          +import org.apache.tajo.tuple.memory.MemoryBlock;
          +import org.apache.tajo.tuple.memory.ResizableMemoryBlock;
          +import org.apache.tajo.util.CompressionUtil;

          import java.io.IOException;
          import java.sql.SQLException;
          -import java.util.List;
          import java.util.Map;
          -import java.util.concurrent.atomic.AtomicBoolean;

          public class TajoMemoryResultSet extends TajoResultSetBase {

          - private List<ByteString> serializedTuples;
          - private AtomicBoolean closed = new AtomicBoolean(false);
          - private RowStoreUtil.RowStoreDecoder decoder;
          + private MemoryBlock memory;
          + private RowBlockReader reader;
          + private volatile boolean closed;
          - public TajoMemoryResultSet(QueryId queryId, Schema schema, List<ByteString> serializedTuples, int maxRowNum,
          +
          + public TajoMemoryResultSet(QueryId queryId, Schema schema, SerializedResultSet resultSet,
          Map<String, String> clientSideSessionVars) {
          super(queryId, schema, clientSideSessionVars);
          - this.totalRow = maxRowNum;
          - this.serializedTuples = serializedTuples;
          - this.decoder = RowStoreUtil.createDecoder(schema);
          + if(resultSet != null && resultSet.getRows() > 0) {
          + this.totalRow = resultSet.getRows();
          +
          + try {
          + // decompress if has a codec
          — End diff –

          It should be ``decompress if it has a codec`` or ``decompress if a codec is specified``.
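          The behaviour that comment describes, decompressing only when a codec is specified, can be sketched as follows. This is a hypothetical stand-in, not Tajo's ``CompressionUtil``: it uses ``java.util.zip`` Deflate instead of Snappy, a made-up ``Codec`` enum, and assumes the uncompressed length travels with the payload.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical codec marker; DEFLATE stands in for the Snappy codec discussed in the thread. */
enum Codec { NONE, DEFLATE }

final class ResultPayload {
  private ResultPayload() {}

  /** Compresses a serialized batch with Deflate (stand-in for the real codec). */
  static byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Decompresses only if a codec is specified; otherwise returns the payload unchanged. */
  static byte[] decode(byte[] payload, Codec codec, int rawLength) throws DataFormatException {
    if (codec == null || codec == Codec.NONE) {
      return payload;
    }
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(payload);
      byte[] raw = new byte[rawLength];
      int off = 0;
      while (off < rawLength && !inflater.finished()) {
        int n = inflater.inflate(raw, off, rawLength - off);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new DataFormatException("truncated or dictionary-dependent payload");
        }
        off += n;
      }
      if (off != rawLength) {
        throw new DataFormatException("unexpected uncompressed length");
      }
      return raw;
    } finally {
      inflater.end();
    }
  }
}
```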

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39267261

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -377,6 +381,7 @@ public static int setDateOrder(int dateOrder) {
          // ResultSet ---------------------------------------------------------
          $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
          — End diff –

          With compression enabled, ``200`` rows may be too small a batch to get any performance benefit from compression. Later, we may need some experiments to find the best row number for compressed and uncompressed modes.
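          One way to run such an experiment is sketched below, assuming synthetic pipe-delimited rows and ``java.util.zip.Deflater`` as a stand-in for Snappy (the two codecs behave differently, so only the trend carries over). Each batch is compressed independently, as a per-fetch payload would be, so small batches pay the stream header and checksum, and restart from an empty dictionary, more often.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Rough experiment sketch: compressed size relative to raw size, by rows per batch. */
final class BatchCompressionExperiment {
  private BatchCompressionExperiment() {}

  /** Size in bytes of the Deflate output for one independently compressed batch. */
  static int deflatedSize(byte[] raw) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(raw);
    deflater.finish();
    byte[] buf = new byte[8192];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buf);
    }
    deflater.end();
    return total;
  }

  /** Hypothetical synthetic rows, loosely shaped like TPC-H lineitem text. */
  static byte[] rows(int start, int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = start; i < start + count; i++) {
      sb.append(i).append('|').append(i % 7).append("|N|O|1996-03-13|TRUCK|regular deposits\n");
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  /** Compressed bytes divided by raw bytes when totalRows are sent in batches of batchRows. */
  static double ratio(int totalRows, int batchRows) {
    long raw = 0;
    long packed = 0;
    for (int start = 0; start < totalRows; start += batchRows) {
      byte[] batch = rows(start, Math.min(batchRows, totalRows - start));
      raw += batch.length;
      packed += deflatedSize(batch);
    }
    return (double) packed / raw;
  }
}
```

          Comparing, for example, ``ratio(n, 200)`` against ``ratio(n, 2000)`` on representative data would indicate how much the default of 200 limits the benefit of compression.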

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39266835

          — Diff: tajo-common/src/main/proto/DataTypes.proto —
          @@ -114,3 +114,7 @@ message DataType {
          */
          optional int32 num_nested_fields = 4;
          }

          +
          +enum CodecType {
          — End diff –

          DataTypes.proto does not seem to be the proper place for CodecType, which is used for client communication.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39266733

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java —
          @@ -198,7 +201,34 @@ public void init() throws IOException {
          // for non-projected fields.
          Schema actualInSchema = scanner.isProjectable() ? projectedFields : inSchema;

          - this.projector = new Projector(context, actualInSchema, outSchema, plan.getTargets());
          + Target[] realTargets;
          — End diff –

          It would be better to extract this change into a separate method such as ``initializeProjector``.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39265815

          — Diff: tajo-core-tests/src/test/resources/results/TestTajoCli/testHelpSessionVars.result —
          @@ -42,4 +42,5 @@ Available Session Variables:
          \set ARITHABORT [true or false] - If true, a running query will be terminated when an overflow or divide-by-zero occurs.
          \set FETCH_ROWNUM [int value] - Sets the number of rows at a time from Master
          \set BLOCK_ON_RESULT [true or false] - Whether to block result set on query execution
          +\set RESULT_COMPRESS [true or false] - Enable resultSet compression
          — End diff –

          I have some comments.

          • ``RESULT_COMPRESSION`` would be better in terms of naming convention.
          • ``RESULT_COMPRESS`` and its description may suggest that the query result itself is stored compressed.
          • Actually, this feature compresses the transfer stream for the ResultSet. So, I'd like to suggest the description ``Use compression to optimize result transmission.``
          • Also, I'd like to suggest the key ``COMPRESSED_RESULT_TRENSFER`` for the session variable. It's clear, but somewhat longer than yours. It's up to you.

          Also, I'm going to improve the JDBC URL so that it takes defined options instead of taking session variables directly. For compressed data transfer, I'll add ``?useCompression=true``, as MySQL does.
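          A minimal sketch of reading such options from the URL query string, assuming a ``jdbc:tajo://host:port/db?key=value&...`` shape. ``JdbcUrlOptions`` is hypothetical and is not the Tajo JDBC driver's parser. Only ``useCompression`` is interpreted; any other key would need its own explicit mapping instead of being forwarded as a session variable.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: read "?key=value&..." options from a JDBC URL into a map. */
final class JdbcUrlOptions {
  private JdbcUrlOptions() {}

  static Map<String, String> parse(String url) {
    Map<String, String> options = new LinkedHashMap<>();
    int q = url.indexOf('?');
    if (q < 0 || q == url.length() - 1) {
      return options; // no query string
    }
    for (String pair : url.substring(q + 1).split("&")) {
      if (pair.isEmpty()) {
        continue;
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      String value = eq < 0 ? "" : pair.substring(eq + 1);
      options.put(decode(key), decode(value));
    }
    return options;
  }

  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always supported
    }
  }

  /** Interprets only the defined useCompression option; defaults to false. */
  static boolean useCompression(String url) {
    return Boolean.parseBoolean(parse(url).getOrDefault("useCompression", "false"));
  }
}
```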

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39264003

          — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
          @@ -377,6 +381,7 @@ public static int setDateOrder(int dateOrder) {
          // ResultSet ---------------------------------------------------------
          $RESULT_SET_FETCH_ROWNUM("tajo.resultset.fetch.rownum", 200),
          $RESULT_SET_BLOCK_WAIT("tajo.resultset.block.wait", true),
          + $RESULT_SET_COMPRESS("tajo.resultset.compress", false),
          — End diff –

          How about ``tajo.resultset.compression`` ?

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39263904

          — Diff: tajo-common/src/main/java/org/apache/tajo/SessionVars.java —
          @@ -149,6 +149,8 @@
          Integer.class, Validators.min("0")),
          BLOCK_ON_RESULT(ConfVars.$RESULT_SET_BLOCK_WAIT, "Whether to block result set on query execution", DEFAULT,
          Boolean.class, Validators.bool()),
          + RESULT_COMPRESS(ConfVars.$RESULT_SET_COMPRESS, "Enable resultSet compression", CLI_SIDE_VAR,
          — End diff –

          How about renaming it to RESULT_COMPRESSION?

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39242506

          — Diff: tajo-client/src/main/java/org/apache/tajo/client/QueryClient.java —
          @@ -106,7 +107,7 @@

          GetQueryResultResponse getResultResponse(QueryId queryId) throws TajoException;

          - TajoMemoryResultSet fetchNextQueryResult(final QueryId queryId, final int fetchRowNum) throws TajoException;
          + Future<TajoMemoryResultSet> asyncFetchNextQueryResult(final QueryId queryId, final int fetchRowNum);
          — End diff –

          You're right. Thanks

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39133720

          — Diff: tajo-core-tests/src/test/resources/results/TestGroupByQuery/testDistinctAggregation_case10.result —
          @@ -1,3 +1,3 @@
          ?sum,?sum_1
          -------------------------------
          -3,414440.9
          \ No newline at end of file
          +3,414440.89999999997
          — End diff –

          I checked this with the IDE debugger.
          The precision changes in SumContext, and OffheapWriter keeps the number at full precision.
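          A small illustration of how the same ``double`` can print with different precision depending on whether it is narrowed before formatting. It does not claim this is what the old text output path did; it only shows that a writer which keeps the full ``double`` (as described for OffheapWriter) exposes digits that a narrower representation hides. ``PrecisionDemo`` is hypothetical.

```java
/** Hypothetical sketch: one double, two printed forms. */
final class PrecisionDemo {
  private PrecisionDemo() {}

  /** Prints the value with full double precision (shortest round-trip form). */
  static String fullPrecision(double v) {
    return Double.toString(v);
  }

  /** Prints the value after narrowing it to float, which hides the trailing digits. */
  static String narrowedToFloat(double v) {
    return Float.toString((float) v);
  }
}
```

          For example, ``0.1 + 0.2`` prints as ``0.30000000000000004`` at full precision but as ``0.3`` after narrowing; the change from ``414440.9`` to ``414440.89999999997`` in the expected result is the same kind of effect.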

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39130362

          — Diff: tajo-core-tests/src/test/resources/results/TestGroupByQuery/testDistinctAggregation_case10.result —
          @@ -1,3 +1,3 @@
          ?sum,?sum_1
          -------------------------------
          -3,414440.9
          \ No newline at end of file
          +3,414440.89999999997
          — End diff –

          Why is a different precision used here?

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39130216

          — Diff: tajo-client/src/main/java/org/apache/tajo/client/QueryClient.java —
          @@ -106,7 +107,7 @@

          GetQueryResultResponse getResultResponse(QueryId queryId) throws TajoException;

          - TajoMemoryResultSet fetchNextQueryResult(final QueryId queryId, final int fetchRowNum) throws TajoException;
          + Future<TajoMemoryResultSet> asyncFetchNextQueryResult(final QueryId queryId, final int fetchRowNum);
          — End diff –

          How about renaming it to ``fetchNextQueryResultAsync()``?

          It would be more consistent for the method name to start with a verb. The method's full meaning would then read as 'fetch the next query result asynchronously'.
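          A sketch of the suggested verb-first naming, assuming a hypothetical ``ResultFetcher`` interface (not Tajo's ``QueryClient``, whose methods also take a ``QueryId``). The blocking method and its asynchronous variant share the ``fetchNextQueryResult`` prefix, and the async form here simply runs the blocking call on a caller-supplied executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Future;

/** Hypothetical interface showing fetchNextQueryResult / fetchNextQueryResultAsync naming. */
interface ResultFetcher<T> {
  /** Blocking fetch of the next batch of at most fetchRowNum rows. */
  T fetchNextQueryResult(int fetchRowNum);

  /** The same operation, run asynchronously on the given executor. */
  default Future<T> fetchNextQueryResultAsync(int fetchRowNum, Executor executor) {
    return CompletableFuture.supplyAsync(() -> fetchNextQueryResult(fetchRowNum), executor);
  }
}
```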

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-139097346

          @jihoonson
          I've updated the patch to reflect your comments.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-138913514

          Nice work! I left some trivial comments.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39043189

          — Diff: tajo-client/src/main/java/org/apache/tajo/client/QueryClient.java —
          @@ -108,6 +109,8 @@

          TajoMemoryResultSet fetchNextQueryResult(final QueryId queryId, final int fetchRowNum) throws TajoException;
          — End diff –

          It looks like this method doesn't need to be exposed as part of the client interface.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r39039858

          — Diff: tajo-client/src/main/java/org/apache/tajo/jdbc/FetchResultSet.java —
          @@ -25,11 +25,13 @@

          import java.io.IOException;
          import java.sql.SQLException;
          +import java.util.concurrent.Future;

          public class FetchResultSet extends TajoResultSetBase {
          protected QueryClient tajoClient;
          private int fetchRowNum;
          private TajoMemoryResultSet currentResultSet;
          + private Future<TajoMemoryResultSet> nextResultSet;
          — End diff –

          It looks like ``nextResultSet`` should also be closed when FetchResultSet.close() is called.
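          A minimal sketch of that close logic, assuming a hypothetical ``PrefetchingResultSet`` that holds the current batch and a ``Future`` for the prefetched one (plain ``AutoCloseable`` stands in for ``TajoMemoryResultSet``). A fetch still in flight is cancelled; a fetch that already finished has its result closed so its memory is released.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical sketch: release both the current and the prefetched batch on close(). */
final class PrefetchingResultSet implements AutoCloseable {
  private AutoCloseable currentResultSet;
  private Future<? extends AutoCloseable> nextResultSet;
  private boolean closed;

  PrefetchingResultSet(AutoCloseable current, Future<? extends AutoCloseable> next) {
    this.currentResultSet = current;
    this.nextResultSet = next;
  }

  @Override
  public synchronized void close() throws Exception {
    if (closed) {
      return; // idempotent, like java.sql.ResultSet.close()
    }
    closed = true;
    if (currentResultSet != null) {
      currentResultSet.close();
      currentResultSet = null;
    }
    if (nextResultSet != null) {
      // cancel() returns false when the fetch has already completed (or was already cancelled).
      if (!nextResultSet.cancel(true)) {
        try {
          AutoCloseable next = nextResultSet.get();
          if (next != null) {
            next.close();
          }
        } catch (ExecutionException | CancellationException ignored) {
          // A failed or cancelled fetch produced nothing to release.
        }
      }
      nextResultSet = null;
    }
  }

  synchronized boolean isClosed() {
    return closed;
  }
}
```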

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-138853116

          I've started reviewing this patch.

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-138227143

          This PR is ready for review

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-138226623

          Changed to pre-fetch on both TajoMaster and TajoClient.

          Pre-Fetch, Network: Loopback, Data: lineitem 3GB
          ```
          TEXT format
          52 sec (Deserialize), 52 sec (Non-Deserialize)

          TEXT format + snappy compression
          54 sec (Deserialize), 54 sec (Non-Deserialize)

          DRAW format
          32 sec (Deserialize), 22 sec (Non-Deserialize)

          DRAW format + snappy compression
          47 sec (Deserialize), 47 sec (Non-Deserialize)
          ```
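          Working the loopback timings above into ratios. The numbers come from the benchmark; the helper and its constant names are hypothetical, and the interpretation is a hedged reading rather than a measurement.

```java
/** Ratios derived from the reported pre-fetch timings (seconds, lineitem 3GB, loopback). */
final class PrefetchBenchmark {
  private PrefetchBenchmark() {}

  // Non-Deserialize timings copied from the benchmark above.
  static final double TEXT_NON_DESER = 52;
  static final double DRAW_NON_DESER = 22;
  static final double DRAW_SNAPPY_NON_DESER = 47;

  /** How many times longer the baseline takes than the candidate (baseline / candidate). */
  static double speedup(double baselineSec, double candidateSec) {
    return baselineSec / candidateSec;
  }
}
```

          ``speedup(52, 22)`` is about 2.36, so DRAW is roughly 2.4x faster than TEXT, while ``speedup(47, 22)`` is about 2.14, so adding Snappy makes DRAW roughly 2.1x slower. Over loopback, network bandwidth is presumably not the bottleneck, which would explain why compression only adds CPU cost in this setup.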

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-137692374

          I've added async fetch.
          Here are some benchmark results:
          ```
          //Query: select * from lineitem
          TajoMaster: text + TajoClient: async-fetch
          81 sec (Deserialize), 76 sec (Non-Deserialize)

          TajoMaster: text compress + TajoClient: async-fetch , decompress
          75 sec (Deserialize), 76 sec (Non-Deserialize)

          //Query: select * from lineitem where 1=1
          TajoMaster: draw + TajoClient: async-fetch
          35 sec (Deserialize), 25 sec (Non-Deserialize)

          TajoMaster: draw, compress + TajoClient: async-fetch, decompress
          53 sec (Deserialize), 51 sec (Non-Deserialize)
          ```
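          A minimal sketch of the async pre-fetch idea, assuming a hypothetical ``fetchBatch`` callback that returns an empty list once results are exhausted; it is not Tajo's ``FetchResultSet``. While the caller consumes batch k, batch k+1 is already being fetched on a background thread, which hides fetch latency behind client-side processing.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

/** Hypothetical sketch: double-buffered reader that always has the next batch in flight. */
final class PrefetchingReader implements AutoCloseable {
  // Hypothetical source: returns batch number n, or an empty list when no rows remain.
  private final IntFunction<List<String>> fetchBatch;
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private int nextBatchNo = 0;
  private Future<List<String>> pending;

  PrefetchingReader(IntFunction<List<String>> fetchBatch) {
    this.fetchBatch = fetchBatch;
    this.pending = submitNext(); // start fetching the first batch immediately
  }

  private Future<List<String>> submitNext() {
    final int batchNo = nextBatchNo++;
    return executor.submit(() -> fetchBatch.apply(batchNo));
  }

  /** Returns the next batch, or an empty list once the source is exhausted. */
  List<String> nextBatch() throws InterruptedException, ExecutionException {
    if (pending == null) {
      return Collections.emptyList();
    }
    List<String> batch = pending.get();
    // Kick off the following fetch before handing this batch to the caller.
    pending = batch.isEmpty() ? null : submitNext();
    return batch;
  }

  @Override
  public void close() {
    if (pending != null) {
      pending.cancel(true);
      pending = null;
    }
    executor.shutdownNow();
  }
}
```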

          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12753673/TAJO-1340.patch
          against master revision release-0.9.0-rc0-434-g2c9305a.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to cause Findbugs (version 2.0.3) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in tajo-cli tajo-client tajo-cluster-tests tajo-common tajo-core tajo-core-tests tajo-jdbc tajo-plan tajo-storage/tajo-storage-common tajo-storage/tajo-storage-hbase tajo-storage/tajo-storage-hdfs:
          org.apache.tajo.tuple.memory.TestHeapTuple
          org.apache.tajo.tuple.TestBaseTupleBuilder
          org.apache.tajo.tuple.memory.TestMemoryRowBlock
          org.apache.tajo.engine.query.TestInSubquery
          org.apache.tajo.jdbc.TestTajoDatabaseMetaData
          org.apache.tajo.jdbc.TestTajoJdbc
          org.apache.tajo.jdbc.TestResultSet

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/829//testReport/
          Findbugs results: https://builds.apache.org/job/PreCommit-TAJO-Build/829//findbugsResult
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/829//console

          This message is automatically generated.

          jhkim Jinho Kim added a comment -

          This patch contains TAJO-1738

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-136745852

          I tested JDBC performance on my laptop because my test network is only 1 Gbps.
          TPC-H scale-3 lineitem
          Query: select * from lineitem where 1=1

          Before
          ```
          Serialize per row + Text
          107 sec //avg 30MByte/sec
          ```
          After
          ```
          Serialize Row-Block + Text
          80 sec
          Serialize Row-Block + DRAW
          30 sec
          ```
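For reference, the relative speedups implied by the timings above can be computed directly. This is a small arithmetic sketch that uses only the numbers reported in this comment; the class name is made up for illustration, and the data volume it prints is only a rough estimate derived from the quoted ~30 MB/s average.

```java
// Arithmetic on the JDBC timings reported above (TPC-H scale-3 lineitem,
// "select * from lineitem where 1=1"). The class name is illustrative only.
public class JdbcBenchmarkSpeedup {

    // Speedup of a candidate run relative to a baseline run (higher is better).
    public static double speedup(double baselineSec, double candidateSec) {
        return baselineSec / candidateSec;
    }

    // Rough data volume, estimated from elapsed time and average throughput.
    public static double impliedMegabytes(double seconds, double mbPerSec) {
        return seconds * mbPerSec;
    }

    public static void main(String[] args) {
        double perRowText = 107.0; // Serialize per row + Text
        double blockText = 80.0;   // Serialize Row-Block + Text
        double blockDraw = 30.0;   // Serialize Row-Block + DRAW

        System.out.printf("Row-Block + Text speedup: %.2fx%n", speedup(perRowText, blockText));
        System.out.printf("Row-Block + DRAW speedup: %.2fx%n", speedup(perRowText, blockDraw));
        System.out.printf("Baseline volume (rough): ~%.0f MB%n", impliedMegabytes(perRowText, 30.0));
    }
}
```

By these numbers, row-block serialization alone gives roughly a 1.3x gain over per-row serialization with TEXT, and combining it with DRAW gives roughly 3.6x.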

          jhkim Jinho Kim added a comment -

          Here is the thread dump from the hang. I will remove the stream seek in TAJO-1738.

          ```
          "TajoMasterClientProtocol-3 Server Worker #1" #81 prio=5 os_prio=0 tid=0x00007f93c4002000 nid=0x72ee runnable [0x00007f93a42f3000]
             java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.NativeThread.current(Native Method)
                  at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
                  - locked <0x00000000cc243248> (a java.lang.Object)
                  - locked <0x00000000cc243238> (a java.lang.Object)
                  at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
                  at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
                  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
                  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
                  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
                  at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
                  - locked <0x00000000f6f0e3b8> (a java.io.BufferedOutputStream)
                  at java.io.DataOutputStream.flush(DataOutputStream.java:123)
                  at org.apache.hadoop.hdfs.protocol.datatransfer.Sender.send(Sender.java:82)
                  at org.apache.hadoop.hdfs.protocol.datatransfer.Sender.readBlock(Sender.java:113)
                  at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:414)
                  at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
                  at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
                  at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
                  at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
                  - locked <0x00000000c3a36b50> (a org.apache.hadoop.hdfs.DFSInputStream)
                  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
                  - locked <0x00000000c3a36b50> (a org.apache.hadoop.hdfs.DFSInputStream)
                  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:908)
                  - locked <0x00000000c3a36b50> (a org.apache.hadoop.hdfs.DFSInputStream)
                  at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
                  at org.apache.tajo.storage.FSDataInputChannel.read(FSDataInputChannel.java:54)
                  at org.apache.tajo.tuple.offheap.OffHeapRowBlock.copyFromChannel(OffHeapRowBlock.java:141)
                  at org.apache.tajo.storage.rawfile.DirectRawFileScanner.next(DirectRawFileScanner.java:123)
                  at org.apache.tajo.storage.rawfile.DirectRawFileScanner.next(DirectRawFileScanner.java:136)
                  at org.apache.tajo.storage.MergeScanner.next(MergeScanner.java:103)
                  at org.apache.tajo.engine.planner.physical.FullScanIterator.hasNext(FullScanIterator.java:39)
                  at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:249)
                  at org.apache.tajo.master.exec.NonForwardQueryResultFileScanner.getNextRows(NonForwardQueryResultFileScanner.java:162)
                  at org.apache.tajo.master.TajoMasterClientService$TajoMasterClientProtocolServiceHandler.getQueryResultData(TajoMasterClientService.java:566)
          ```
          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r37158579

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
          ```
          @@ -991,7 +992,7 @@ public LogicalNode visitProjection(GlobalPlanContext context, LogicalPlan plan,
               for (DataChannel dataChannel : masterPlan.getIncomingChannels(execBlock.getId())) {
                 // This data channel will be stored in staging directory, but RawFile, default file type, does not support
                 // distributed file system. It needs to change the file format for distributed file system.
          -      dataChannel.setStoreType("TEXT");
          +      dataChannel.setStoreType(BuiltinStorages.DRAW);
          ```
          -- End diff --

          I think adding a FINAL_OUTPUT_FORMAT setting to TajoConf would be better.
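A minimal sketch of what a configurable final output format could look like, assuming a simple key/value configuration. It uses java.util.Properties as a stand-in; the key name `tajo.query.final-output.format` and the class `FinalOutputFormatConfig` are hypothetical and are not part of TajoConf. Keeping the default in one named constant also addresses the review suggestion of a single variable for the default store type.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: a configurable final output format with one named default.
// The key and class names are illustrative; this is NOT Tajo's TajoConf API.
public class FinalOutputFormatConfig {

    // Hypothetical configuration key for the final output format.
    public static final String FINAL_OUTPUT_FORMAT_KEY = "tajo.query.final-output.format";

    // Single place to change the default instead of repeating a literal at call sites.
    public static final String DEFAULT_FINAL_OUTPUT_FORMAT = "DRAW";

    private final Properties props;

    public FinalOutputFormatConfig(Properties props) {
        this.props = props;
    }

    // Returns the configured format (upper-cased), or the default when unset or blank.
    public String finalOutputFormat() {
        String value = props.getProperty(FINAL_OUTPUT_FORMAT_KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_FINAL_OUTPUT_FORMAT;
        }
        return value.trim().toUpperCase(Locale.ROOT);
    }
}
```

Under this assumption, a planner would read the format once through something like `finalOutputFormat()` rather than hard-coding `"TEXT"` or `BuiltinStorages.DRAW` at each `setStoreType(...)` call.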

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-131678831

          Thanks for the review, guys!
          However, I found hangs in OffheapBlock and am investigating them in TAJO-1738.
          Once TAJO-1738 is ready, I will rebase.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-131674311

          It needs a rebase.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/671#issuecomment-129378018

          @jinossy, this patch looks good to me, but it has some minor conflicts with master. Please fix them.
          In addition, I left a very trivial comment. Please consider it if you agree.

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r36613979

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
          ```
          @@ -991,7 +992,7 @@ public LogicalNode visitProjection(GlobalPlanContext context, LogicalPlan plan,
               for (DataChannel dataChannel : masterPlan.getIncomingChannels(execBlock.getId())) {
                 // This data channel will be stored in staging directory, but RawFile, default file type, does not support
                 // distributed file system. It needs to change the file format for distributed file system.
          -      dataChannel.setStoreType("TEXT");
          +      dataChannel.setStoreType(BuiltinStorages.DRAW);
          ```
          -- End diff --

          How about adding a final variable for the default store type and using it in the class?

          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/671#discussion_r36613977

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
          ```
          @@ -163,7 +164,7 @@ public void build(QueryContext queryContext, MasterPlan masterPlan) throws IOExc
             private static void setFinalOutputChannel(DataChannel outputChannel, Schema outputSchema) {
               outputChannel.setShuffleType(NONE_SHUFFLE);
               outputChannel.setShuffleOutputNum(1);
          -    outputChannel.setStoreType("TEXT");
          +    outputChannel.setStoreType(BuiltinStorages.DRAW);
          ```
          -- End diff --

          How about adding a final variable for the default store type and using it in the class?

          githubbot ASF GitHub Bot added a comment -

          GitHub user jinossy opened a pull request:

          https://github.com/apache/tajo/pull/671

          TAJO-1340: Change the default output file format.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/jinossy/tajo TAJO-1340

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/671.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #671


          commit 02751701715b2e46f74bb2a1519be02d2b3db738
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-07-30T12:48:06Z

          TAJO-1340

          commit 42928ef20ba1686be5a5efee54216c3de1fe92cc
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-07-31T02:36:56Z

          fix invalid codes and tests

          commit d2bea94148511eb91483e0b465d6a4d2eb4ec282
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-07-31T02:38:13Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1340

          commit f59e1434fc8d02173fd2cdbeee2941880bb6a5aa
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-07-31T03:22:25Z

          fix bug and add more test case


          jhkim Jinho Kim added a comment -

          Hi folks,
          I'd like to propose switching to DirectRawFile (TAJO-1273).
          However, OffHeapRowBlock needs refactoring to make it safer, so I'm going to add PooledByteBufRowBlock.
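The row-block idea (serializing many rows into one buffer instead of serializing each row separately) can be sketched with a plain heap ByteBuffer. This is a hypothetical illustration only: `SimpleRowBlock` is not Tajo's OffHeapRowBlock or the proposed PooledByteBufRowBlock, it does no pooling or off-heap allocation, and it stores fields as length-prefixed UTF-8 strings rather than Tajo's typed tuple layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical heap-backed row block: rows are appended into one growable buffer
// so the whole block can be transferred at once. NOT Tajo's actual implementation.
public class SimpleRowBlock {
    private ByteBuffer buf;
    private int rows;

    public SimpleRowBlock(int initialCapacity) {
        buf = ByteBuffer.allocate(initialCapacity);
    }

    // Grows the buffer (at least doubling) when fewer than `needed` bytes remain.
    private void ensureRemaining(int needed) {
        if (buf.remaining() < needed) {
            int newCap = Math.max(buf.capacity() * 2, buf.position() + needed);
            ByteBuffer bigger = ByteBuffer.allocate(newCap);
            buf.flip();       // expose the bytes written so far
            bigger.put(buf);  // copy them; bigger's position ends after the copy
            buf = bigger;
        }
    }

    // Row layout: [fieldCount:int], then for each field [length:int][UTF-8 bytes].
    public void addRow(String... fields) {
        ensureRemaining(Integer.BYTES);
        buf.putInt(fields.length);
        for (String f : fields) {
            byte[] b = f.getBytes(StandardCharsets.UTF_8);
            ensureRemaining(Integer.BYTES + b.length);
            buf.putInt(b.length);
            buf.put(b);
        }
        rows++;
    }

    public int rowCount() {
        return rows;
    }

    // Decodes all rows from a read-only view; no delimiter scanning is needed.
    public List<String[]> readAll() {
        ByteBuffer r = buf.duplicate(); // shares content, independent position/limit
        r.flip();
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            String[] fields = new String[r.getInt()];
            for (int j = 0; j < fields.length; j++) {
                byte[] b = new byte[r.getInt()];
                r.get(b);
                fields[j] = new String(b, StandardCharsets.UTF_8);
            }
            out.add(fields);
        }
        return out;
    }
}
```

Because every field carries its own length, values containing ',' or '|' round-trip unchanged, which is the property a binary format such as DRAW has over delimited text.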

          hyunsik Hyunsik Choi added a comment -

          I totally agree with you. Changing the output file format may be an essential solution to TAJO-1339.

          sirpkt Keuntae Park added a comment - - edited

          It is a great idea!
          Regarding TAJO-1339, it is very annoying to determine the delimiter of the resulting data
          when two tables with different delimiters are used,
          because there may be no delimiter that can be applied to both tables.
          (For example, one table uses ',' as its delimiter and has data containing '|', while the other uses '|' and has data containing ',')
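The problem can be shown with a small check: a delimited text writer needs one field delimiter that appears in none of the values, and with data like the example above no candidate survives. The class name and sample values below are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustrates the TAJO-1339 problem: a delimited text result needs a delimiter
// that occurs in no value. Names and sample data are hypothetical.
public class DelimiterCheck {

    // Returns the first candidate delimiter that no value contains, if any.
    public static Optional<Character> pickSafeDelimiter(List<String> values, char... candidates) {
        for (char c : candidates) {
            boolean clashes = false;
            for (String v : values) {
                if (v.indexOf(c) >= 0) {
                    clashes = true;
                    break;
                }
            }
            if (!clashes) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Table A uses ',' and contains '|'; table B uses '|' and contains ','.
        List<String> joined = Arrays.asList("x|y", "p,q");
        System.out.println(pickSafeDelimiter(joined, ',', '|')); // prints "Optional.empty"
    }
}
```

A third candidate might work for this sample, but any character can in principle appear in user data, which is why a length-prefixed binary output format avoids the question entirely.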


            People

            • Assignee:
              jhkim Jinho Kim
              Reporter:
              hyunsik Hyunsik Choi
            • Votes:
              0
              Watchers:
              4
