Hive / HIVE-7390

Make single quote character optional and configurable in BeeLine CSV/TSV output

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.1
    • Fix Version/s: 0.14.0
    • Component/s: Clients
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      --outputformat=[table/vertical/csv/tsv/dsv]
      Format mode for result display. Default is table.
      Usage: beeline --outputformat=tsv

      --delimiterForDSV=DELIMITER
      specify the delimiter for delimiter-separated values output format (default: |)
      Usage: beeline --outputformat=dsv --delimiterForDSV=,

      Examples of the beeline dsv output format and delimiterForDSV follow:
      % bin/beeline
      Hive version 0.11.0-SNAPSHOT by Apache
      beeline> !connect jdbc:hive2://localhost:10000 scott tiger org.apache.hive.jdbc.HiveDriver
      !connect jdbc:hive2://localhost:10000 scott tiger org.apache.hive.jdbc.HiveDriver
      Connecting to jdbc:hive2://localhost:10000
      Connected to: Hive (version 0.14.0-SNAPSHOT)
      Driver: Hive (version 0.14.0-SNAPSHOT)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      HiveServer2 Clients – dsv Example
      0: jdbc:hive2://localhost:10000> create table csv_table(id int, name string, info string) row format delimited fields terminated by '\t';
      No rows affected (0.121 seconds)
      0: jdbc:hive2://localhost:10000> load data local inpath '/root/names' overwrite into table csv_table;
      No rows affected (0.245 seconds)
      0: jdbc:hive2://localhost:10000> select * from csv_table;
      +---------------+-----------------+-----------------+--+
      | csv_table.id | csv_table.name | csv_table.info |
      +---------------+-----------------+-----------------+--+
      | 19630001 | "john" | lennon |
      | 19630002 | peter,paul | mccartney |
      | 19630003 | george | harrison |
      | 19630004 | ringo | starr |
      +---------------+-----------------+-----------------+--+
      4 rows selected (0.09 seconds)
      0: jdbc:hive2://localhost:10000> !outformat csv
      Unknown command: outformat csv
      0: jdbc:hive2://localhost:10000> !outputformat csv
      0: jdbc:hive2://localhost:10000> select * from csv_table;
      csv_table.id,csv_table.name,csv_table.info
      19630001,"""john""",lennon
      19630002,"peter,paul",mccartney
      19630003,george,harrison
      19630004,ringo,starr
      4 rows selected (0.105 seconds)
      0: jdbc:hive2://localhost:10000> !outputformat dsv
      0: jdbc:hive2://localhost:10000> select * from csv_table;
      csv_table.id|csv_table.name|csv_table.info
      19630001|"""john"""|lennon
      19630002|peter,paul|mccartney
      19630003|george|harrison
      19630004|ringo|starr
      4 rows selected (0.123 seconds)
      0: jdbc:hive2://localhost:10000> !set delimiterForDSV ',';
      0: jdbc:hive2://localhost:10000> select * from csv_table;
      csv_table.id'csv_table.name'csv_table.info
      19630001'"""john"""'lennon
      19630002'peter,paul'mccartney
      19630003'george'harrison
      19630004'ringo'starr
      4 rows selected (0.11 seconds)
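The quoting seen in the csv and dsv output above can be approximated outside Beeline. The following is a minimal sketch that uses Python's standard `csv` module rather than Beeline's own writer; the `render` helper and the `ROWS` constant are illustrative names, not part of Beeline. It assumes "quote only when needed" behaviour with a double-quote character, which is what the sample output suggests.

```python
import csv
import io

# Rows copied from the csv_table example above (header first).
ROWS = [
    ["csv_table.id", "csv_table.name", "csv_table.info"],
    ["19630001", '"john"', "lennon"],
    ["19630002", "peter,paul", "mccartney"],
    ["19630003", "george", "harrison"],
    ["19630004", "ringo", "starr"],
]


def render(rows, delimiter):
    """Render rows, quoting a field only if it holds the delimiter or a quote."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # embedded quotes are doubled by default
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue().splitlines()


if __name__ == "__main__":
    print("\n".join(render(ROWS, ",")))  # comparable to !outputformat csv
    print("\n".join(render(ROWS, "|")))  # comparable to !outputformat dsv
```

With a comma delimiter the `peter,paul` field is quoted; with `|` it is not, while the field containing double quotes is quoted in both cases.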

      Description

      Currently when either the CSV or TSV output formats are used in beeline each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns.

      1. HIVE-7390.patch
        4 kB
        Jim Halfpenny
      2. HIVE-7390.9.patch
        9 kB
        Ferdinand Xu
      3. HIVE-7390.8.patch
        9 kB
        Ferdinand Xu
      4. HIVE-7390.7.patch
        9 kB
        Ferdinand Xu
      5. HIVE-7390.6.patch
        9 kB
        Ferdinand Xu
      6. HIVE-7390.5.patch
        9 kB
        Ferdinand Xu
      7. HIVE-7390.4.patch
        8 kB
        Ferdinand Xu
      8. HIVE-7390.3.patch
        7 kB
        ferdinand xu
      9. HIVE-7390.2.patch
        7 kB
        ferdinand xu
      10. HIVE-7390.1.patch
        7 kB
        ferdinand xu

          Activity

          Jim Halfpenny added a comment -

          I've attached a patch that adds the options --wrapColumns and --wrapCharacters. These allow the user to disable the quoting of columns and to specify a different quote character.

          Xuefu Zhang added a comment -

          If we are talking about CSV/TSV, then these options should not be universal. Nevertheless, I'd like to understand what quotes are specified by the CSV/TSV standard and whether Beeline conforms to that standard. If we are proposing a new kind of output format, the right way seems to be to provide alternatives to the !outputformat command, parallel to CSV/TSV.

          Jim Halfpenny added a comment -

          The CSV/TSV formats are poorly defined in terms of standards. There is RFC 4180, which defines CSV files as having an optional double quote character around the values. Currently beeline uses a single quote, which is contrary to the RFC. Nor does beeline escape any instances of the quote character in the content of the fields.

          One option would be to provide additional output formats for unquoted CSV and TSV so that users can decide whether or not the output fields should be wrapped.
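To illustrate the escaping problem described above, here is a small sketch in Python. `wrap_without_escaping` is a hypothetical stand-in for "wrap every field in single quotes and escape nothing"; it is not Beeline's code. `rfc4180_line` uses the standard `csv` module, whose default dialect quotes with double quotes only when needed and doubles embedded quotes, in line with RFC 4180.

```python
import csv
import io


def wrap_without_escaping(fields, quote="'", delimiter=","):
    """Hypothetical old-style output: wrap every field, escape nothing."""
    return delimiter.join(f"{quote}{field}{quote}" for field in fields)


def rfc4180_line(fields):
    """Double-quote a field only when required and double any embedded quote."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


row = ["19630005", "o'brien", 'say "hi"']  # made-up sample data

if __name__ == "__main__":
    # The quote inside o'brien is indistinguishable from a closing quote.
    print(wrap_without_escaping(row))
    # The embedded double quotes are doubled, so a CSV parser can recover them.
    print(rfc4180_line(row))
    print(next(csv.reader([rfc4180_line(row)])) == row)
```

The second form round-trips through a CSV parser; the first leaves it to the reader to guess where `o'brien` ends.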

          ferdinand xu added a comment -

          Use the RFC format for csv mode and add one more option to preserve the previous hive cli format, which is not enclosed in quotes.
          An RB entry has been created at https://reviews.apache.org/r/23799/

          ferdinand xu added a comment -

          (1) fix the formatting issue in the pom file
          (2) do not use IOUtil from zookeeper in the beeline code

          Szehon Ho added a comment -

          Hi, I guess we will continue the discussion here from HIVE-7390. First, thanks for incorporating my feedback from that partial patch.

          So I read this patch, and it has one option:

           'outputAsCLICSVFormat=[true/false]	display the output in the csv format as Hive command line\n \ 

          The only difference is the quoting, right? If my understanding is right, can't we have a new output format called 'quotedCSV' and change the default csv format to be unquoted, as Jim Halfpenny discussed earlier on this JIRA? (He mentioned that single-quote is not standard.) The disadvantages of the 'outputAsCliCSVFormat' option are that its name is not very descriptive for users and that, as Xuefu mentioned, it is a universal option even though it should only apply to CSV. What do you think?

          Ferdinand Xu added a comment -

          Adding the new format type called 'quotedCSV' makes sense to me. Let's add the quotedCSV format for backward compatibility instead of providing an extra beeline option for the csv format.

          Ferdinand Xu added a comment -

          code changes according to the discussion

          Lars Francke added a comment -

          As noted in my review I'm not too sure about adding another format especially if it's called "quotedCSV" because that implies that the others aren't using quoting but they actually are when needed.

          The old way sometimes produces invalid CSV (when quoting or delimiter chars exist in the data) so I think it's a good idea to fix this (and super-csv seems to solve that). I'm not sure if preserving the old functionality is worth anything. And if you do then maybe deprecate it and name it `deprecatedCSV` or something like that.

          I'd be in favor of two options instead (similar to what was suggested originally)

          • Delimiter
          • Quoting character

          Maybe even a third: Quoting mode. I'm in favor of always adding quotes as it makes parsing easier (no need to check for quoted/unquoted columns etc.). If not adding that, I'd vote in favor of changing the current quoting mode to the AlwaysQuote mode.
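The difference between the two quoting modes discussed here can be sketched with Python's standard `csv` module, used as an analogy only: `csv.QUOTE_MINIMAL` stands in for the "normal" mode and `csv.QUOTE_ALL` for an always-quote mode. The `write_row` helper is an illustrative name, and nothing below is taken from the patch or from super-csv.

```python
import csv
import io


def write_row(fields, quoting):
    """Write one row using the given csv quoting constant and return the line."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


row = ["19630002", "peter,paul", "mccartney"]

normal = write_row(row, csv.QUOTE_MINIMAL)  # quotes only the field with a comma
always = write_row(row, csv.QUOTE_ALL)      # quotes every field

if __name__ == "__main__":
    print(normal)
    print(always)
    # A real CSV parser reads both forms back to the same values.
    print(next(csv.reader([normal])) == next(csv.reader([always])))
```

Always quoting gives a uniform shape per line at the cost of larger output; minimal quoting is more compact, but a consumer has to handle both quoted and unquoted fields.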

          Szehon Ho added a comment -

          Thanks for the details. I was just reading the earlier comments and wrongly assumed that the two valid CSV options are double quotes and no quotes at all. You're right that normal quote mode still means quotes sometimes, so my proposed naming didn't make sense. Sorry about that, Ferdinand.

          So we should:

          1. Fix the current CSV to conform by using super-csv (like the patch I originally looked at in HIVE-7434). No debate on that.
          2. See what CSV options (if any) we are going to expose

          I'd still try to keep it simple if possible. Can we expose quote mode only (always, normal)? I'm not sure if delimiter and quote character would add that much value, but I'm not a heavy CSV user. Thoughts?

          Lars Francke added a comment -

          You summed it up nicely, thanks.

          The original intention of this issue was to make the quote character optional and configurable so Jim must have had a use-case for that. I can't think of a good one atm.

          I can however think of a good reason for a configurable delimiter. Comma, semicolon or tab occur relatively frequently in data, but some other character (\001 or "|") might not occur in the data, and being able to pick it as the delimiter makes parsing way simpler (just split on the delimiter instead of looking for quoted strings etc.). This is especially interesting when you then want to mount another table on that data in Hive or post-process it in any other simple way where you don't have access to a full-fledged CSV parsing library.

          So: Picking the delimiter is often very helpful in avoiding a whole class of parsing issues and lets you just split on the delimiter.
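The "just split on the delimiter" argument can be sketched as follows. Both helpers are hypothetical names for illustration. The safety check assumes that a field triggers quoting only if it contains the delimiter, a double quote, or a line break, which is how minimal quoting usually works but is not confirmed here for Beeline specifically.

```python
def delimiter_is_safe(rows, delimiter):
    """True if no field contains the delimiter, a double quote, or a newline.

    Under that assumption a minimally quoting writer emits no quotes at all,
    so a plain split recovers the fields.
    """
    special = (delimiter, '"', "\n", "\r")
    return not any(ch in field for row in rows for field in row for ch in special)


def split_fields(line, delimiter="|"):
    """Naive parse: valid only when delimiter_is_safe held for the data."""
    return line.split(delimiter)


# Sample values taken from the release note example on this issue.
rows = [
    ["19630002", "peter,paul", "mccartney"],
    ["19630003", "george", "harrison"],
]

if __name__ == "__main__":
    print(delimiter_is_safe(rows, ","))   # a comma occurs in the data
    print(delimiter_is_safe(rows, "|"))   # the pipe does not
    print(split_fields("19630002|peter,paul|mccartney"))
```

When the check fails for every candidate delimiter, a full CSV parser is still the safer route.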

          I think that we can easily catch most common issues with two changes:

          1. Fix current CSV and TSV. As you say: No debate on that
          2. Allow delimiter to be specified and keep "normal quoting" mode

          That allows everyone who really understands his data to avoid quoting and everyone else can get properly formatted CSVs for a full CSV parser. In the same vein I think that surroundingSpacesNeedQuotes should stay disabled.

          But as I said: This is kinda hijacking Jim's original issue...

          Ferdinand Xu added a comment -

          Thanks to Lars Francke and Szehon Ho for your comments. For the current CSV and TSV, let's just make them work the right way (quoted at the correct time). For customized delimiter support, I think we can add a new output format called DSV (short for delimiter-separated values) and one beeline option that lets the user specify the delimiter.

          Lars Francke added a comment -

          Thank you Ferdinand! I added a few more comments but those are relatively minor and have nothing to do with the functionality itself. I think adding this one new format + option is a good idea.

          Ferdinand Xu added a comment -

          Refined the code according to Lars's comments. Lars, I really appreciate your review.

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12659175/HIVE-7390.7.patch

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/141/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/141/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-141/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests exited with: NonZeroExitCodeException
          Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
          + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
          + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-141/source-prep.txt
          + [[ false == \t\r\u\e ]]
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
          Reverted 'cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java'
          Reverted 'ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java'
          ++ egrep -v '^X|^Performing status on external'
          ++ awk '{print $2}'
          ++ svn status --no-ignore
          + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/java/org/apache/hadoop/hive/ql/processors/ListResourceProcessor.java
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1615277.
          
          At revision 1615277.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12659175

          Ferdinand Xu added a comment -

          rebase the code and make hive-qa happy

          Szehon Ho added a comment -

          Thanks Ferdinand for the work, just one more minor comment on the RB. (sorry for late reply)

          Szehon Ho added a comment -

          Ferdinand Xu, can you please upload the latest patch on this JIRA?

          Ferdinand Xu added a comment -

          Latest version, addressing Lars's comments

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12660905/HIVE-7390.9.patch

          ERROR: -1 due to 1 failed/errored test(s), 5873 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/248/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/248/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-248/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12660905

          Szehon Ho added a comment -

          +1

          Szehon Ho added a comment -

          Committed to trunk. Thanks Ferdinand Xu for the contribution!

          Lefty Leverenz added a comment -

          This needs to be documented in the wiki before 0.14.0 is released. A release note would also be helpful.

          General information about quotes being optional in 0.14.0+ could go in a version box after the Beeline example, and --delimiterForDSV belongs in Beeline Command Options along with the new value for --outputformat.

          Vaibhav Gumashta added a comment -

          This seems like a backward incompatible change as it breaks old client behavior. I've created HIVE-8544 for addressing the issue.

          Thejas M Nair added a comment -

          HIVE-8544 has changes to address the change in delimiter character.
          What other differences in behavior does this introduce? Is it the escaping of the quote char in data (by using two quotes)? When is quoting used with CSV? How has it changed the behavior of the table output format?

          We should carefully document any incompatible changes caused by the fix. It looks like there are changes in this patch (apart from the one described in HIVE-8544) that can break users' processing of the output.

          Thejas M Nair added a comment -

          Any help in documenting the behavior changes from people who worked closely on this patch is appreciated! We can document them in the release note section for now (click on edit to change the release note section).

          Szehon Ho added a comment -

          Hi, it seems the consensus at the time was that the old CSV format was wrong (it always added an extra quote around values no matter what), and that it would be valuable to add an extra option for configuring the delimiting character in these *SVs. So to summarize:

          • Add a new Beeline output format, DSV, and a new Beeline option delimiterForDSV to decide what this would use to delimit. For example, if your char is | (default), then your data would look like "a|b|c".
          • For all the *SV's, use double-quote as quoteChar, ',' as separator.

          However, it seems it no longer quotes by default as it did before this change. If you read the CSV specs, it quotes a cell only under these conditions: when the cell contains special characters, such as the delimiter char or a quote char, or when it spans multiple lines.

          So whereas CSV would have given "'a','b','c'" in the past, it now drops those quotes and gives "a,b,c" for plain values like these. Hope that helps.
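
The quoting conditions described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the comment, not Beeline's actual implementation; the function names `quote_if_needed`, `format_row`, and `format_row_old_style` are hypothetical, and the old-style formatter assumes the pre-0.14 output simply wrapped every value in single quotes.

```python
# Hypothetical helpers illustrating the quoting rule described in the comment;
# this is not Beeline's code.
def quote_if_needed(value: str, delimiter: str = ",", quote: str = '"') -> str:
    """Quote a field only if it contains the delimiter, the quote char, or a line break."""
    needs_quotes = any(ch in value for ch in (delimiter, quote, "\n", "\r"))
    if not needs_quotes:
        return value
    # A quote character inside the data is escaped by doubling it.
    return quote + value.replace(quote, quote + quote) + quote


def format_row(values, delimiter=","):
    """Join fields with the delimiter, quoting only the fields that need it."""
    return delimiter.join(quote_if_needed(v, delimiter) for v in values)


def format_row_old_style(values):
    """Assumed pre-0.14 behavior: every value wrapped in single quotes."""
    return ",".join("'" + v + "'" for v in values)


print(format_row_old_style(["a", "b", "c"]))  # 'a','b','c'
print(format_row(["a", "b", "c"]))            # a,b,c
print(format_row(["peter,paul", "x"]))        # "peter,paul",x
```

A consumer that split the old output on `','` would need to change, which is the kind of incompatibility raised earlier in this thread.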

          Szehon Ho added a comment -

          I made a typo in point two: the separator is a comma for CSV only. For DSV it's configurable as explained in point one, but defaults to |. For TSV it is a tab character.
          CSV: a,b,c
          DSV: a|b|c (can configure this)
          TSV: a\tb\tc

          Addendum: to give another example of when it does quote, the code seems to indicate it will use double-quote, so:
          CSV: "a","b","c"
          DSV: "a"|"b"|"c"
          TSV: ...
          but only if these values {a,b,c} fit the criteria explained above.
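
As a rough way to see how the three separators behave, the sketch below uses Python's standard `csv` module with minimal quoting. It is an analogy for the behavior described in the comment, not a reproduction of Beeline's writer; the `DELIMITERS` table and the `render` helper are hypothetical names, and `|` is used for DSV because that is the stated default.

```python
import csv
import io

# Separators as described in the comment; DSV uses its default "|".
DELIMITERS = {"csv": ",", "dsv": "|", "tsv": "\t"}


def render(fmt: str, row) -> str:
    """Render one row with the separator for fmt, quoting only when needed."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=DELIMITERS[fmt],
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only fields with special characters
        lineterminator="\n",
    )
    writer.writerow(row)
    return buf.getvalue().rstrip("\n")


print(render("csv", ["a", "b", "c"]))    # a,b,c
print(render("dsv", ["a", "b", "c"]))    # a|b|c
print(render("dsv", ["a|x", "b", "c"]))  # "a|x"|b|c
print(render("csv", ["a|x", "b", "c"]))  # a|x,b,c
```

Note that the same value `a|x` is quoted under DSV but not under CSV, because whether a character is special depends on the separator in use.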

          Ferdinand Xu added a comment -

          Hi Thejas M Nair and Lefty Leverenz,
          Release notes are added. Sorry for being late.

          Thejas M Nair added a comment -

          Created HIVE-8615 so that existing user applications don't break because of the format change incompatibility issues, and to make it easier for users to upgrade to 0.14.

          Szehon Ho added a comment -

          Doc covered in HIVE-8615

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new jira if you see any issues.

          Brock Noland added a comment -

          FYI this jira makes the single quote character optional but not the double quote. e.g.:

          source data:

          beeline -u jdbc:hive2://localhost:10000 -e "select * from quote_test" 2>&1 | grep -Ev '^SLF4J'
          +----------------+----------------+--+
          | quote_test.c1  | quote_test.c2  |
          +----------------+----------------+--+
          | "A"            | B              |
          | ""C""          | "D"            |
          +----------------+----------------+--+
          

          csv:

          beeline -u jdbc:hive2://localhost:10000 --outputformat=csv -e "select * from quote_test" 2>&1 | grep -Ev '^SLF4J'
          'quote_test.c1','quote_test.c2'
          '"A"','B'
          '""C""','"D"'
          

          csv2:

          beeline -u jdbc:hive2://localhost:10000 --outputformat=csv2 -e "select * from quote_test" 2>&1 | grep -Ev '^SLF4J'
          quote_test.c1,quote_test.c2
          """A""",B
          """""C""""","""D"""
          

            People

            • Assignee:
              Ferdinand Xu
              Reporter:
              Jim Halfpenny
            • Votes:
              0
              Watchers:
              12
