Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-12079

CQLSH to retrieve column names from data file header

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Low
    • Resolution: Unresolved
    • 2.2.x, 3.0.x, 3.11.x
    • Legacy/Core
    • None
    • Cassandra 2.1.14.1346

    Description

      Suppose a have a table with 3 columns
      Then the data is copied to a delimited file with HEADER

      cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 } AND durable_writes = 'true';
      cqlsh> use my_keyspace ;
      cqlsh:my_keyspace> CREATE TABLE my_table ( col1 int PRIMARY KEY, col2 text, col3 text );
      cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 1, '1st row') ;
      cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 2, '2nd row') ;
      cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 3, '3rd row') ;
      cqlsh:my_keyspace> COPY my_keyspace.my_table ( col1, col2 ) TO 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
      Reading options from the command line: {'header': 'true', 'delimiter': '|'}
      Using 3 child processes
      
      Starting copy of my_keyspace.my_table with columns ['col1', 'col2'].
      Processed: 3 rows; Rate:      10 rows/s; Avg. rate:       4 rows/s
      3 rows exported to 1 files in 0.861 seconds.
      

      This will create a file with these contents

      col1|col2
      3|3rd row
      2|2nd row
      1|1st row
      

      Then we create another table with same DDL

      cqlsh:my_keyspace> CREATE TABLE my_table2 ( col1 int PRIMARY KEY, col2 text, col3 text );
      

      A restore from the recently created delimited file with header data file WILL FAIL because no columns were specified so it is expecting all columns to be in the delimited file - but we have a header row and the header option was specified

      cqlsh:my_keyspace> COPY my_table2 FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
      Reading options from the command line: {'header': 'true', 'delimiter': '|'}
      Using 3 child processes
      
      Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2', 'col3'].
      Failed to import 3 rows: ParseError - Invalid row length 2 should be 3,  given up without retries
      Failed to process 3 rows; failed rows written to import_my_keyspace_my_table2.err
      Processed: 3 rows; Rate:       5 rows/s; Avg. rate:       7 rows/s
      3 rows imported from 1 files in 0.442 seconds (0 skipped).
      

      Provided that HEADER = true, It would be very handy if CQLSH looks into the header row and retrieves the column names so they do not have to be entered manually on the copy command - especially where there is a significant number of columns

      cqlsh:my_keyspace> COPY my_table2 (col1, col2) FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
      Reading options from the command line: {'header': 'true', 'delimiter': '|'}
      Using 3 child processes
      
      Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2'].
      Processed: 3 rows; Rate:       3 rows/s; Avg. rate:       4 rows/s
      3 rows imported from 1 files in 0.708 seconds (0 skipped).
      cqlsh:my_keyspace> select * from my_table2;
      
       col1 | col2    | col3
      ------+---------+------
          1 | 1st row | null
          2 | 2nd row | null
          3 | 3rd row | null
      
      (3 rows)
      cqlsh:my_keyspace> 
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            jpoblete Jose Martinez Poblete
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: