Cassandra / CASSANDRA-9074

Hadoop Cassandra CqlInputFormat pagination - not reading all input rows


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Low
    • Resolution: Duplicate
    • Fix Version/s: 2.0.15
    • Component/s: None
    • Labels:
      None
    • Environment:

      Cassandra 2.0.11, Hadoop 1.0.4, Datastax java cassandra-driver-core 2.1.4

    • Severity:
      Low

      Description

      I have a 7-node Cassandra (v2.0.11) cluster and a table with 10k rows. I run a Hadoop job (the datanodes reside on the Cassandra nodes, of course) that reads data from that table, and I see that only 7k rows are read into the map phase.

      I checked the CqlInputFormat source code and noticed that a CQL query is built to select node-local data and that a LIMIT clause is added (1k rows by default). That explains the 7k rows read:
      7 nodes * 1k limit = 7k rows read in total
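      To make that arithmetic concrete, here is a small self-contained Java model of the behaviour described above. It assumes the 10k rows are split as evenly as possible across the 7 nodes, and that each node is queried exactly once with the LIMIT and never asked for a further page. The class and method names are hypothetical; this is not CqlInputFormat code.

```java
import java.util.Arrays;

// Hypothetical model: each node answers ONE node-local query capped by LIMIT,
// and no follow-up page is ever requested. Not CqlInputFormat's actual code.
class SingleLimitPerNode {

    // Total rows returned when every node is queried once with "LIMIT pageRowSize".
    static int rowsRead(int[] rowsPerNode, int pageRowSize) {
        int total = 0;
        for (int rows : rowsPerNode) {
            total += Math.min(rows, pageRowSize); // rows beyond the LIMIT are never fetched
        }
        return total;
    }

    // Assumption for illustration: totalRows split as evenly as possible over nodeCount nodes.
    static int[] evenSplit(int totalRows, int nodeCount) {
        int[] split = new int[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            split[i] = totalRows / nodeCount + (i < totalRows % nodeCount ? 1 : 0);
        }
        return split;
    }

    public static void main(String[] args) {
        int[] rowsPerNode = evenSplit(10_000, 7);
        System.out.println(Arrays.toString(rowsPerNode)); // per-node row counts
        System.out.println(rowsRead(rowsPerNode, 1000));  // rows seen by the map phase
    }
}
```

      In this model, raising the limit to 1500 (above the largest per-node count of 1429) returns all 10,000 rows. That matches the InputCQLPageRowSize workaround, but only because the limit now exceeds the row count on every node.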

      The limit can be changed using CqlConfigHelper:

      CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "1000");

      Please help me with the questions below:
      Is this the intended behavior?
      Why does CqlInputFormat not page through the rest of the rows?
      Is this a bug, or should I just increase the InputCQLPageRowSize value?
      What if I want to read all the data in a table and do not know the row count?
      What if the number of rows I need to read per Cassandra node is very large? In other words, how do I avoid an OOM when setting InputCQLPageRowSize very large to handle all the data?
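      For the last two questions, the usual alternative to one very large LIMIT is to page: fetch at most N rows, remember the token of the last row, and query again for rows after that token until a short page comes back. Below is a minimal in-memory Java sketch of that loop, assuming one node's range can be modelled as a sorted token-to-row map. TokenPager, fetchPage and readAll are hypothetical names; this is an illustration of the paging technique, not a description of CqlInputFormat's actual reader or of the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one node's token range: token -> row.
// A real reader would issue CQL; nothing here is CqlInputFormat's API.
class TokenPager {

    // One page: rows whose token is strictly greater than afterToken, capped at
    // pageSize (the analogue of "WHERE token(pk) > ? LIMIT ?").
    static List<Map.Entry<Long, String>> fetchPage(NavigableMap<Long, String> range,
                                                   Long afterToken, int pageSize) {
        NavigableMap<Long, String> tail =
            (afterToken == null) ? range : range.tailMap(afterToken, false);
        List<Map.Entry<Long, String>> page = new ArrayList<>(pageSize);
        for (Map.Entry<Long, String> e : tail.entrySet()) {
            if (page.size() == pageSize) break;
            page.add(e);
        }
        return page;
    }

    // Reads the whole range page by page; only one page is held at a time.
    static int readAll(NavigableMap<Long, String> range, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        int total = 0;
        Long lastToken = null;
        while (true) {
            List<Map.Entry<Long, String>> page = fetchPage(range, lastToken, pageSize);
            total += page.size();                 // a mapper would emit each row here
            if (page.size() < pageSize) {
                return total;                     // short (or empty) page: range exhausted
            }
            lastToken = page.get(page.size() - 1).getKey(); // resume after the last token seen
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> range = new TreeMap<>();
        for (long i = 0; i < 1429; i++) {
            range.put(i * 31, "row-" + i);        // arbitrary increasing tokens
        }
        System.out.println(readAll(range, 1000)); // all rows, in pages of at most 1000
    }
}
```

      Memory per page is bounded by the page size regardless of how many rows a node holds. The sketch resumes on the token alone, so it assumes one row per partition; with clustering columns the resume point would also need the clustering key, otherwise the remaining rows of a partition that straddles a page boundary would be skipped.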

              People

              • Assignee:
                Alex Liu (alexliu68)
                Reporter:
                fuggy_yama
                Authors:
                Alex Liu
              • Votes:
                0
                Watchers:
                3
