Cassandra
  1. Cassandra
  2. CASSANDRA-5649

Move resultset type information into prepare, not execute

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: None
    • Labels:
      None

      Description

      Native protocol 1.0 sends type information on execute. This is a minor inefficiency for large resultsets; unfortunately, single-row resultsets are common.

      This does represent a performance regression from Thrift; Thrift does not send type information at all. (Bad for driver complexity, but good for performance.)

        Activity

        Jonathan Ellis created issue -
        Hide
        Jonathan Ellis added a comment -

        Comparing prior art shows that both MySQL and PostgreSQL send type information for each execution, but I cannot find a reason for this other than historical accident. A postgresql developer confirmed that a prepared statement must return the same types with each call, so theoretically there should be no obstacle in only returning types once.

        Show
        Jonathan Ellis added a comment - Comparing prior art shows that both MySQL and PostgreSQL send type information for each execution, but I cannot find a reason for this other than historical accident. A postgresql developer confirmed that a prepared statement must return the same types with each call, so theoretically there should be no obstacle in only returning types once.
        Hide
        Sylvain Lebresne added a comment -

        I have some doubts that this would provide a noticeable benefit in practice. The type information in the result set is fairly compact (though it's true we could save the full metadata in practice). I'm not sure reading the message is much of a bottleneck in practice for small messages (and for big ones, the metadata size is negligeable anyway). And there is compression too.

        On the other side, this does complexify client drivers. Currently, you can fully decode a result message without any external information. This is a nice property implementation wise, and is somewhat safer. And I'm not sure requiring too much state from the client driver to do basic things is ideal.

        I can be wrong, but my intuition is that neither MySQL nor PostgreSQL does that because they don't consider it worth the complexity. And that's my intuition too.

        So I'm fine doing some benchmarking to see if this can make a measurable difference in practice, but I'm -1 on going ahead with this without concrete evidence of the benefits since there is known drawbacks. And I kind of feel it's too late for 2.0.

        Show
        Sylvain Lebresne added a comment - I have some doubts that this would provide a noticeable benefit in practice. The type information in the result set is fairly compact (though it's true we could save the full metadata in practice). I'm not sure reading the message is much of a bottleneck in practice for small messages (and for big ones, the metadata size is negligeable anyway). And there is compression too. On the other side, this does complexify client drivers. Currently, you can fully decode a result message without any external information. This is a nice property implementation wise, and is somewhat safer. And I'm not sure requiring too much state from the client driver to do basic things is ideal. I can be wrong, but my intuition is that neither MySQL nor PostgreSQL does that because they don't consider it worth the complexity. And that's my intuition too. So I'm fine doing some benchmarking to see if this can make a measurable difference in practice, but I'm -1 on going ahead with this without concrete evidence of the benefits since there is known drawbacks. And I kind of feel it's too late for 2.0.
        Hide
        Jonathan Ellis added a comment -

        This doesn't look negligible at all to me. Here's the metadata encode:

                    public ChannelBuffer encode(Metadata m)
                    {
                        boolean globalTablesSpec = m.flags.contains(Flag.GLOBAL_TABLES_SPEC);
        
                        int stringCount = globalTablesSpec ? 2 + m.names.size() : 3* m.names.size();
                        CBUtil.BufferBuilder builder = new CBUtil.BufferBuilder(1 + m.names.size(), stringCount, 0);
        
                        ChannelBuffer header = ChannelBuffers.buffer(8);
                        header.writeInt(Flag.serialize(m.flags));
                        header.writeInt(m.names.size());
                        builder.add(header);
        
                        if (globalTablesSpec)
                        {
                            builder.addString(m.names.get(0).ksName);
                            builder.addString(m.names.get(0).cfName);
                        }
        
                        for (ColumnSpecification name : m.names)
                        {
                            if (!globalTablesSpec)
                            {
                                builder.addString(name.ksName);
                                builder.addString(name.cfName);
                            }
                            builder.addString(name.toString());
                            builder.add(DataType.codec.encodeOne(DataType.fromType(name.type)));
                        }
                        return builder.build();
                    }
        

        Here's the (per-row) ResultSet encode:

                        for (ByteBuffer bb : row)
                            builder.addValue(bb);
        

        Hmm.

        Seriously, it's trivial to see how you will more often than not have more metadata than row data for typical single-row resultsets. I can put together a wrapper to prove it but it's kind of a waste of time.

        You're right that it's not reasonable to try for 2.0, though. Moved to 2.1.

        Show
        Jonathan Ellis added a comment - This doesn't look negligible at all to me. Here's the metadata encode: public ChannelBuffer encode(Metadata m) { boolean globalTablesSpec = m.flags.contains(Flag.GLOBAL_TABLES_SPEC); int stringCount = globalTablesSpec ? 2 + m.names.size() : 3* m.names.size(); CBUtil.BufferBuilder builder = new CBUtil.BufferBuilder(1 + m.names.size(), stringCount, 0); ChannelBuffer header = ChannelBuffers.buffer(8); header.writeInt(Flag.serialize(m.flags)); header.writeInt(m.names.size()); builder.add(header); if (globalTablesSpec) { builder.addString(m.names.get(0).ksName); builder.addString(m.names.get(0).cfName); } for (ColumnSpecification name : m.names) { if (!globalTablesSpec) { builder.addString(name.ksName); builder.addString(name.cfName); } builder.addString(name.toString()); builder.add(DataType.codec.encodeOne(DataType.fromType(name.type))); } return builder.build(); } Here's the (per-row) ResultSet encode: for (ByteBuffer bb : row) builder.addValue(bb); Hmm. Seriously, it's trivial to see how you will more often than not have more metadata than row data for typical single-row resultsets. I can put together a wrapper to prove it but it's kind of a waste of time. You're right that it's not reasonable to try for 2.0, though. Moved to 2.1.
        Jonathan Ellis made changes -
        Field Original Value New Value
        Fix Version/s 2.1 [ 12324159 ]
        Fix Version/s 2.0 [ 12322954 ]
        Hide
        Jonathan Ellis added a comment - - edited

        P.S. When is globalTablesSpec false? That is, how do we end up w/ a resultset containing data from more than one CF?

        Show
        Jonathan Ellis added a comment - - edited P.S. When is globalTablesSpec false? That is, how do we end up w/ a resultset containing data from more than one CF?
        Hide
        Sylvain Lebresne added a comment -

        That is, how do we end up w/ a resultset containing data from more than one CF?

        We never do it for a ResultSet currently, but we reuse ResultSet.Metadata in responses to prepare messages (to indicate which values has been prepared and need to be passed to execute), and in that case it can happen (with BATCH). And since it doesn't really take any space when globalTableSpec == true, we didn't bothered making a difference for the 2 case.

        Show
        Sylvain Lebresne added a comment - That is, how do we end up w/ a resultset containing data from more than one CF? We never do it for a ResultSet currently, but we reuse ResultSet.Metadata in responses to prepare messages (to indicate which values has been prepared and need to be passed to execute), and in that case it can happen (with BATCH). And since it doesn't really take any space when globalTableSpec == true, we didn't bothered making a difference for the 2 case.
        Hide
        Sylvain Lebresne added a comment -

        I've pushed a patch for this at https://github.com/pcmanus/cassandra/commits/5714_and_5649. The last commit is actually CASSANDRA-5714 and to be honest there is at least 2 lines from that commit that should have been in the previous commit so I really recommend reviewing both changes at the same time. They change similar places of the protocol/code anyway.

        There is 3 changes to the protocol:

        • the result of a prepare ships with the metadata necessary to decode a resultset for that query, if relevant.
        • a new flag can be set in QUERY/EXECUTE messages to request skipping the metadata in the result set returned. I really want this to be optional so that clients implementor can ignore that optimization at first if that's easier (the same way that query paging is optional too).
        • If the flag above was set, the result set returned won't have the column specification parts (which is indicated by a flag of the result set).

        Despite earlier comments, since this is simple enough, I'd rather get that in 2.0 because that requires a protocol change. And I've want a much better reason to get convinced to do a native protocol v3.

        Show
        Sylvain Lebresne added a comment - I've pushed a patch for this at https://github.com/pcmanus/cassandra/commits/5714_and_5649 . The last commit is actually CASSANDRA-5714 and to be honest there is at least 2 lines from that commit that should have been in the previous commit so I really recommend reviewing both changes at the same time. They change similar places of the protocol/code anyway. There is 3 changes to the protocol: the result of a prepare ships with the metadata necessary to decode a resultset for that query, if relevant. a new flag can be set in QUERY/EXECUTE messages to request skipping the metadata in the result set returned. I really want this to be optional so that clients implementor can ignore that optimization at first if that's easier (the same way that query paging is optional too). If the flag above was set, the result set returned won't have the column specification parts (which is indicated by a flag of the result set). Despite earlier comments, since this is simple enough, I'd rather get that in 2.0 because that requires a protocol change. And I've want a much better reason to get convinced to do a native protocol v3.
        Sylvain Lebresne made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Fix Version/s 2.0 beta 1 [ 12322954 ]
        Fix Version/s 2.1 [ 12324159 ]
        Hide
        Sylvain Lebresne added a comment -

        Additional notes on the patch. We only return the result ResultSet in a prepare response in the case of select. Meaning, we don't do it for the result sets return by conditional updates, but in that case we can't predict what the returned columns will be.

        There is also the issue of "SELECT * FROM ...". What if the drop/add a column? After checking, it's actually fine because the set of columns returned in the resultSet is computed during preparation of the statement.

        Lastly, the patch allow the no_metadata flag for QUERY messages too, which is obviously much less useful, but it kind of makes to have it there for symmetry (and it takes no space since it's just a bit flag from a byte the QUERY messages have anyway), and I figured, maybe smart high level clients that generate the query can actually easily compute the metadata of the resultSet (it's not really rocket science) and so it could be useful in the long run.

        Show
        Sylvain Lebresne added a comment - Additional notes on the patch. We only return the result ResultSet in a prepare response in the case of select. Meaning, we don't do it for the result sets return by conditional updates, but in that case we can't predict what the returned columns will be. There is also the issue of "SELECT * FROM ...". What if the drop/add a column? After checking, it's actually fine because the set of columns returned in the resultSet is computed during preparation of the statement. Lastly, the patch allow the no_metadata flag for QUERY messages too, which is obviously much less useful, but it kind of makes to have it there for symmetry (and it takes no space since it's just a bit flag from a byte the QUERY messages have anyway), and I figured, maybe smart high level clients that generate the query can actually easily compute the metadata of the resultSet (it's not really rocket science) and so it could be useful in the long run.
        Jonathan Ellis made changes -
        Reviewer iamaleksey
        Hide
        Aleksey Yeschenko added a comment -

        other that the 0x03 flag issue - lgtm.

        Show
        Aleksey Yeschenko added a comment - other that the 0x03 flag issue - lgtm.
        Hide
        Sylvain Lebresne added a comment -

        Committed (with doc fixed), thanks

        Show
        Sylvain Lebresne added a comment - Committed (with doc fixed), thanks
        Sylvain Lebresne made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Sylvain Lebresne
            Reporter:
            Jonathan Ellis
            Reviewer:
            Aleksey Yeschenko
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development