Details
-
Improvement
-
Status: Open
-
Low
-
Resolution: Unresolved
-
None
-
Cassandra 2.1.14.1346
Description
Suppose a have a table with 3 columns
Then the data is copied to a delimited file with HEADER
cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 } AND durable_writes = 'true'; cqlsh> use my_keyspace ; cqlsh:my_keyspace> CREATE TABLE my_table ( col1 int PRIMARY KEY, col2 text, col3 text ); cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 1, '1st row') ; cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 2, '2nd row') ; cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 3, '3rd row') ; cqlsh:my_keyspace> COPY my_keyspace.my_table ( col1, col2 ) TO 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ; Reading options from the command line: {'header': 'true', 'delimiter': '|'} Using 3 child processes Starting copy of my_keyspace.my_table with columns ['col1', 'col2']. Processed: 3 rows; Rate: 10 rows/s; Avg. rate: 4 rows/s 3 rows exported to 1 files in 0.861 seconds.
This will create a file with these contents
col1|col2 3|3rd row 2|2nd row 1|1st row
Then we create another table with same DDL
cqlsh:my_keyspace> CREATE TABLE my_table2 ( col1 int PRIMARY KEY, col2 text, col3 text );
A restore from the recently created delimited file with header data file WILL FAIL because no columns were specified so it is expecting all columns to be in the delimited file - but we have a header row and the header option was specified
cqlsh:my_keyspace> COPY my_table2 FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ; Reading options from the command line: {'header': 'true', 'delimiter': '|'} Using 3 child processes Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2', 'col3']. Failed to import 3 rows: ParseError - Invalid row length 2 should be 3, given up without retries Failed to process 3 rows; failed rows written to import_my_keyspace_my_table2.err Processed: 3 rows; Rate: 5 rows/s; Avg. rate: 7 rows/s 3 rows imported from 1 files in 0.442 seconds (0 skipped).
Provided that HEADER = true, It would be very handy if CQLSH looks into the header row and retrieves the column names so they do not have to be entered manually on the copy command - especially where there is a significant number of columns
cqlsh:my_keyspace> COPY my_table2 (col1, col2) FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ; Reading options from the command line: {'header': 'true', 'delimiter': '|'} Using 3 child processes Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2']. Processed: 3 rows; Rate: 3 rows/s; Avg. rate: 4 rows/s 3 rows imported from 1 files in 0.708 seconds (0 skipped). cqlsh:my_keyspace> select * from my_table2; col1 | col2 | col3 ------+---------+------ 1 | 1st row | null 2 | 2nd row | null 3 | 3rd row | null (3 rows) cqlsh:my_keyspace>