ORC-69

Add batch option support in orc-contents and orc-scan tools.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: tools
    • Labels:
      None

      Description

      The batchSize in FileScan.cc and FileContents.cc is hard-coded to 1000. This change adds an option named --batch so that the batch size can be supplied on the command line.
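      To illustrate what the batch size controls, here is a minimal, self-contained sketch. It does not use the ORC reader API; `ScanResult` and `scanRows` are hypothetical names introduced only for this example. It simulates a scan that pulls at most `batchSize` rows per read and shows that the number of rows seen stays the same while the number of reads changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical summary of one scan; not a type from the ORC sources.
struct ScanResult {
  uint64_t rows = 0;     // total rows visited
  uint64_t batches = 0;  // number of batch reads needed
};

// Simulates scanning totalRows rows, reading at most batchSize rows per call.
// A real tool would fill a column batch from the file here instead.
ScanResult scanRows(uint64_t totalRows, uint64_t batchSize) {
  assert(batchSize > 0);  // a zero batch size would never make progress
  ScanResult result;
  uint64_t remaining = totalRows;
  while (remaining > 0) {
    uint64_t n = std::min(remaining, batchSize);
    result.rows += n;
    ++result.batches;
    remaining -= n;
  }
  return result;
}
```

      Under these assumptions, `scanRows(2500, 1000)` needs three reads and `scanRows(2500, 4096)` needs one, but both visit 2500 rows. That is why the parameter mainly matters for performance measurements rather than for the data a tool reports.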

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user xunzhang opened a pull request:

          https://github.com/apache/orc/pull/38

          ORC-69. Add batch option support in orc-contents and orc-scan tools.

          cc @omalley

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/xunzhang/orc orc-69

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/orc/pull/38.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #38


          commit 95a1b6e73c2e5414ac94225d87d67fc047a44185
          Author: xunzhang <xunzhangthu@gmail.com>
          Date: 2016-06-12T10:20:32Z

          ORC-69. Add batch option support in orc-contents and orc-scan tools.


          githubbot ASF GitHub Bot added a comment -

          Github user xunzhang commented on the issue:

          https://github.com/apache/orc/pull/38

          I have tested the option `--batch` with `orc-contents` and `orc-scan` in my environment.

          owen.omalley Owen O'Malley added a comment -

          What is the goal? Neither of the tools will have any user-visible changes based on the parameter.

          xunzhang hongwu added a comment -

          It is an optional flag at the end of the command, so it does not affect existing commands. The original idea was that I saw the option already exists in FileMemory and thought it could also be useful in the FileScan and FileContents tools. Maybe users want to test performance with these tools?

          owen.omalley Owen O'Malley added a comment -

          I'm closing this, since changing the added parameter produces no user-visible change.

          owen.omalley Owen O'Malley added a comment -

          Ok, I'm going to change my mind and let this go into the internal command orc-scan.

          githubbot ASF GitHub Bot added a comment -

          Github user omalley commented on the issue:

          https://github.com/apache/orc/pull/38

          I buy that we may want to test the batch size for performance tests, but orc-scan is the important tool there since you don't really want to benchmark the conversion into JSON.

          With that in mind, I've made a variant of this patch that:

          • reverts the change to orc-contents
          • switches orc-scan to use getopt_long
          • adds some test infrastructure so that we can test the executables
          • adds tests of the nominal and off-nominal invocations of orc-scan.

          Please see my changes on https://github.com/omalley/orc/tree/orc-69
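          As an illustration of the getopt_long approach mentioned above, here is a minimal sketch of parsing a `--batch` option followed by a single filename. It is not the code from the patch: `ScanOptions` and `parseScanArgs` are hypothetical names, the validation rules are assumptions, and only the long option from the issue description is handled. The default of 1000 mirrors the previously hard-coded value.

```cpp
#include <getopt.h>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical holder for the parsed command line; not from the ORC sources.
struct ScanOptions {
  uint64_t batchSize = 1000;  // default mirrors the old hard-coded value
  std::string filename;
  bool ok = true;             // false on any off-nominal invocation
};

// Parses "[--batch <size>] <filename>" using getopt_long.
ScanOptions parseScanArgs(int argc, char* argv[]) {
  assert(argv != nullptr);
  static struct option longOptions[] = {
    {"batch", required_argument, nullptr, 'b'},
    {nullptr, 0, nullptr, 0}
  };
  ScanOptions result;
  optind = 1;  // restart scanning from the first argument
  opterr = 0;  // suppress getopt's own error messages
  int opt;
  // An empty short-option string means only long options are accepted.
  while ((opt = getopt_long(argc, argv, "", longOptions, nullptr)) != -1) {
    if (opt == 'b') {
      char* end = nullptr;
      unsigned long long value = std::strtoull(optarg, &end, 10);
      // Assumed rule: the size must be a positive decimal integer.
      if (!std::isdigit(static_cast<unsigned char>(optarg[0])) ||
          *end != '\0' || value == 0) {
        result.ok = false;
      } else {
        result.batchSize = value;
      }
    } else {
      result.ok = false;  // unknown option or missing argument
    }
  }
  // Exactly one positional argument (the file to scan) is expected.
  if (optind + 1 == argc) {
    result.filename = argv[optind];
  } else {
    result.ok = false;
  }
  return result;
}
```

          A caller would typically print a usage message and exit with a non-zero status when `ok` is false, which is the kind of off-nominal invocation that tests of the executable can check.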

          githubbot ASF GitHub Bot added a comment -

          Github user xunzhang commented on the issue:

          https://github.com/apache/orc/pull/38

          LGTM.
          I like the getopt solution. Other tools may also adopt this approach in the future.
          And, feel free to commit without my name. Thanks!

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/orc/pull/38

          owen.omalley Owen O'Malley added a comment -

          I just committed this. Thanks, hongwu!

          owen.omalley Owen O'Malley added a comment -

          Released as part of ORC 1.2.0


            People

            • Assignee: xunzhang hongwu
            • Reporter: xunzhang hongwu