[CASSANDRA-12079] CQLSH to retrieve column names from data file header - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Low
Resolution: Unresolved
Fix Version/s: 2.2.x, 3.0.x, 3.11.x
Component/s: Legacy/Core
Labels:
None
Environment:

Cassandra 2.1.14.1346

Description

Suppose a have a table with 3 columns
Then the data is copied to a delimited file with HEADER

cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 } AND durable_writes = 'true';
cqlsh> use my_keyspace ;
cqlsh:my_keyspace> CREATE TABLE my_table ( col1 int PRIMARY KEY, col2 text, col3 text );
cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 1, '1st row') ;
cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 2, '2nd row') ;
cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 3, '3rd row') ;
cqlsh:my_keyspace> COPY my_keyspace.my_table ( col1, col2 ) TO 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
Reading options from the command line: {'header': 'true', 'delimiter': '|'}
Using 3 child processes

Starting copy of my_keyspace.my_table with columns ['col1', 'col2'].
Processed: 3 rows; Rate:      10 rows/s; Avg. rate:       4 rows/s
3 rows exported to 1 files in 0.861 seconds.

This will create a file with these contents

col1|col2
3|3rd row
2|2nd row
1|1st row

Then we create another table with same DDL

cqlsh:my_keyspace> CREATE TABLE my_table2 ( col1 int PRIMARY KEY, col2 text, col3 text );

A restore from the recently created delimited file with header data file WILL FAIL because no columns were specified so it is expecting all columns to be in the delimited file - but we have a header row and the header option was specified

cqlsh:my_keyspace> COPY my_table2 FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
Reading options from the command line: {'header': 'true', 'delimiter': '|'}
Using 3 child processes

Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2', 'col3'].
Failed to import 3 rows: ParseError - Invalid row length 2 should be 3,  given up without retries
Failed to process 3 rows; failed rows written to import_my_keyspace_my_table2.err
Processed: 3 rows; Rate:       5 rows/s; Avg. rate:       7 rows/s
3 rows imported from 1 files in 0.442 seconds (0 skipped).

Provided that HEADER = true, It would be very handy if CQLSH looks into the header row and retrieves the column names so they do not have to be entered manually on the copy command - especially where there is a significant number of columns

cqlsh:my_keyspace> COPY my_table2 (col1, col2) FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
Reading options from the command line: {'header': 'true', 'delimiter': '|'}
Using 3 child processes

Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2'].
Processed: 3 rows; Rate:       3 rows/s; Avg. rate:       4 rows/s
3 rows imported from 1 files in 0.708 seconds (0 skipped).
cqlsh:my_keyspace> select * from my_table2;

 col1 | col2    | col3
------+---------+------
    1 | 1st row | null
    2 | 2nd row | null
    3 | 3rd row | null

(3 rows)
cqlsh:my_keyspace>

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Jose Martinez Poblete

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 23/Jun/16 15:46

Updated:: 16/Apr/19 09:30