  1. Cassandra
  2. CASSANDRA-14391

COPY FROM should read columns from file header

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Legacy/CQL
    • Labels:
      None
    • Environment:

      cqlsh 5.0.1 and Cassandra 3.11.2 on macOS 10.13.2.

      Description

      COPY FROM appears to ignore the contents of the header row, even when "header = true" is specified. As a result, if the columns in the file are in a different order than in the table, the import saves values in the wrong columns. Additionally, if columns are missing from the file, an error occurs, even if those columns are not primary key columns.

      This behavior contradicts the behavior specified in the docs (emphasis mine).

      COPY FROM imports data from a CSV file into an existing table. Each line in the source file is imported as a row. All rows in the dataset must contain the same number of fields and have values in the PRIMARY KEY fields. The process verifies the PRIMARY KEY and updates existing records. If HEADER = false and no column names are specified, the fields are imported in deterministic order. When column names are specified, fields are imported in that order. Missing and empty fields are set to null. The source cannot have more fields than the target table, however it can have fewer fields.

      Example

      temp.csv
      col2,col1,col3
      column value 1,key2,3
      column value 2,key4,3
      column value 3,key3,3
      column value 4,key1,3
      
      create keyspace copy_to_from_test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
      use copy_to_from_test;
      create table test_table (col1 text primary key, col2 text, col3 bigint);
      copy test_table from 'temp.csv' with header = true;
      

      The above code will incorrectly swap the "col2" and "col1" values, since it expects the first column to be "col1". If I had instead swapped the order of "col3", I would have received an error on input, as it would have attempted to store text in a numerical column.
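
The mix-up can be reproduced outside cqlsh. The following is a minimal Python sketch, not cqlsh's actual import code. It assumes the table's column order is col1, col2, col3 and contrasts pairing fields with columns by position (what COPY FROM appears to do) with pairing them by header name. The function names are made up for illustration.

```python
import csv
import io

# Contents of temp.csv from the example above.
TEMP_CSV = """col2,col1,col3
column value 1,key2,3
column value 2,key4,3
column value 3,key3,3
column value 4,key1,3
"""

# Column order as declared in test_table (assumed to be the order the import uses).
TABLE_COLUMNS = ["col1", "col2", "col3"]


def rows_by_position(text):
    """Skip the header row and pair each field with a table column by position."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # the header is discarded, not interpreted
    return [dict(zip(TABLE_COLUMNS, fields)) for fields in reader]


def rows_by_header(text):
    """Pair each field with the column named in the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


positional = rows_by_position(TEMP_CSV)
by_header = rows_by_header(TEMP_CSV)

print(positional[0]["col1"])  # column value 1  (wrong: this is the col2 value)
print(by_header[0]["col1"])   # key2            (the intended primary key)
```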

      Expected Behavior

      I would expect specifying "with header = true" on a COPY FROM statement to use the headers as column names for insertion, rather than merely skipping the header row.  Missing non-primary key columns should be set to null.
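
As a rough illustration of the requested behavior, the sketch below maps one CSV line to table columns by header name, fills absent non-primary-key columns with null (None), and rejects rows that lack a primary key value. The function map_row and its parameters are hypothetical and do not correspond to anything in cqlsh.

```python
def map_row(header, fields, table_columns, primary_key):
    """Map one CSV line to a {column: value} dict using the header names."""
    unknown = [name for name in header if name not in table_columns]
    if unknown:
        raise ValueError(f"columns not in table: {unknown}")

    # Every table column starts out as None (null); header columns overwrite it.
    row = dict.fromkeys(table_columns)
    row.update(zip(header, fields))

    # Primary key columns must be present and non-empty.
    missing = [name for name in primary_key if row[name] in (None, "")]
    if missing:
        raise ValueError(f"missing primary key value(s): {missing}")
    return row


# A file that reorders col1/col2 and omits col3 entirely.
example = map_row(
    header=["col2", "col1"],
    fields=["column value 1", "key2"],
    table_columns=["col1", "col2", "col3"],
    primary_key=["col1"],
)
print(example)  # {'col1': 'key2', 'col2': 'column value 1', 'col3': None}
```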

      Other

      I ran across this issue when copying between two of my environments. One of the environments had changed the columns in the primary key, but the other had not yet. This caused the order of the columns to vary between the environments.
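
Until COPY FROM reads the header itself, one possible workaround is to pass the column list explicitly, since the documentation quoted above says fields are imported in the order of the specified column names. The helper below is a hypothetical sketch that builds such a statement from a file's header row. It assumes simple, unquoted column names and does no escaping.

```python
import csv


def header_columns(lines):
    """Return the column names from the first CSV line of an iterable of lines."""
    return [name.strip() for name in next(csv.reader(lines))]


def build_copy_statement(table, csv_path, columns):
    """Build a COPY FROM statement that lists the columns in file order."""
    column_list = ", ".join(columns)
    return f"COPY {table} ({column_list}) FROM '{csv_path}' WITH HEADER = true;"


# Header line of temp.csv from the example; with a real file, pass an open
# file handle (opened with newline="") to header_columns instead.
columns = header_columns(["col2,col1,col3\n"])
statement = build_copy_statement("test_table", "temp.csv", columns)
print(statement)
# COPY test_table (col2, col1, col3) FROM 'temp.csv' WITH HEADER = true;
```

The generated statement can then be run in cqlsh in place of the original copy command.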

            People

            • Assignee:
              Unassigned
            • Reporter:
              mjjustin M. Justin