Solr

SOLR-5213: collections?action=SPLITSHARD parent vs. sub-shards numDocs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.4
    • Fix Version/s: 5.2
    • Component/s: update
    • Labels:
      None

      Description

      The problem we saw was that splitting a shard took a long time, and at the end of it the sub-shards contained fewer documents than the original shard.

      The root cause was eventually tracked down to the disappearing documents not falling into the hash range of any of the sub-shards.

      Could SolrIndexSplitter report per-segment numDocs for the parent and the sub-shards during a split, and log at least a warning for any discrepancies (documents that fall into none of the sub-shards, or into several of them)?

      Additionally, could a case be made for erroring out when discrepancies are detected, i.e. not proceeding with the shard split? The options would be to always error, or to add an optional verifyNumDocs=true/false parameter to the SPLITSHARD action.
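      The check requested above can be sketched as follows. This is a minimal, hypothetical illustration, not SolrIndexSplitter code: it assumes each document's route hash is already available as an int, and the class, the Range and Report types, and the verifyNumDocs flag are illustrative names. For one parent segment it tallies how many sub-shard ranges each hash falls into, then warns (or, with verifyNumDocs=true, aborts) when any document lands in zero or several sub-shards.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Solr code: tally how many sub-shard hash ranges each
// document of one parent segment falls into, and flag documents in zero or several.
class SplitCountSketch {

    // Inclusive hash range [min, max], similar in spirit to Solr's DocRouter.Range.
    record Range(int min, int max) {
        boolean includes(int hash) { return hash >= min && hash <= max; }
    }

    // Per-segment tallies; field names are illustrative only.
    static final class Report {
        final int[] perSubShard;
        int inNone;
        int inMultiple;
        Report(int subShards) { perSubShard = new int[subShards]; }
    }

    static Report count(int[] docHashes, List<Range> ranges) {
        Report report = new Report(ranges.size());
        for (int hash : docHashes) {
            int matches = 0;
            for (int i = 0; i < ranges.size(); i++) {
                if (ranges.get(i).includes(hash)) {
                    report.perSubShard[i]++;
                    matches++;
                }
            }
            if (matches == 0) report.inNone++;
            else if (matches > 1) report.inMultiple++;
        }
        return report;
    }

    // verifyNumDocs mirrors the optional parameter proposed in the description
    // (hypothetical): when true, abort the split; otherwise only warn.
    static void check(Report report, int parentNumDocs, boolean verifyNumDocs) {
        if (report.inNone == 0 && report.inMultiple == 0) return;
        String msg = "parent numDocs=" + parentNumDocs
                + ", docs in no sub-shard=" + report.inNone
                + ", docs in several sub-shards=" + report.inMultiple;
        if (verifyNumDocs) throw new IllegalStateException(msg);
        System.err.println("WARN " + msg);
    }

    public static void main(String[] args) {
        // Two sub-shards with a deliberate gap: hash 0 belongs to neither.
        List<Range> ranges = List.of(new Range(Integer.MIN_VALUE, -1), new Range(1, Integer.MAX_VALUE));
        int[] hashes = {-5, 0, 7};
        Report report = count(hashes, ranges);
        System.out.println(Arrays.toString(report.perSubShard) + " none=" + report.inNone);
        check(report, hashes.length, false);
    }
}
```

      With the deliberately gapped ranges in main, one document lands in each sub-shard and one (hash 0) in none; a warning is printed rather than the split aborting, because verifyNumDocs is false.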

      1. SOLR-5213.patch
        2 kB
        Ramkumar Aiyengar
      2. SOLR-5213.patch
        2 kB
        Christine Poerschke

        Activity

        Christine Poerschke added a comment -

        Attaching patch for reporting per-segment numDocs for parent and sub-shards.

        Shalin Shekhar Mangar added a comment -

        "The root cause was eventually tracked down to the disappearing documents not falling into the hash ranges of the sub-shards."

        Did you investigate how that was possible? It sounds like a bug in the hashing code, either on the routing side or on the splitting side. A document can never belong to multiple shard ranges, because the partitioning code produces disjoint ranges.
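        The disjointness argument can be illustrated with a small sketch. It is hypothetical and not Solr's DocRouter partitioning code: it assumes inclusive int hash ranges and splits the parent range into n contiguous pieces, letting the last piece absorb any remainder. Because each piece starts exactly one past the end of the previous one, no hash can fall into two pieces; the isDisjointCover helper (an illustrative name) checks that property explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Solr's DocRouter implementation): split an inclusive hash
// range into n contiguous pieces and verify the pieces are disjoint and cover it.
class PartitionSketch {

    record Range(int min, int max) {}

    static List<Range> partition(int min, int max, int n) {
        long span = (long) max - min + 1; // long: the full int range has 2^32 values
        if (n < 1 || span < n) throw new IllegalArgumentException("cannot split into " + n);
        long step = span / n;
        List<Range> pieces = new ArrayList<>();
        long start = min;
        for (int i = 0; i < n; i++) {
            long end = (i == n - 1) ? max : start + step - 1; // last piece absorbs the remainder
            pieces.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return pieces;
    }

    // Disjoint and covering iff every piece is non-empty, each starts right after the
    // previous one ends, and the first/last pieces touch the parent's bounds.
    static boolean isDisjointCover(List<Range> pieces, int min, int max) {
        if (pieces.isEmpty() || pieces.get(0).min() != min) return false;
        for (int i = 0; i < pieces.size(); i++) {
            Range r = pieces.get(i);
            if (r.min() > r.max()) return false;
            if (i > 0 && (long) r.min() != (long) pieces.get(i - 1).max() + 1) return false;
        }
        return pieces.get(pieces.size() - 1).max() == max;
    }

    public static void main(String[] args) {
        List<Range> halves = partition(Integer.MIN_VALUE, Integer.MAX_VALUE, 2);
        System.out.println(halves + " " + isDisjointCover(halves, Integer.MIN_VALUE, Integer.MAX_VALUE));
    }
}
```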

        Christine Poerschke added a comment -

        Two occurrences of lost documents were seen. The one in which most of the documents were lost was tracked down to an operational error (shardX files were copied to be shardY files). The second loss was of only a few dozen documents, and we never figured out whether it was operational or something else. Other shard splits since then have been fine, i.e. no losses.

        Shalin Shekhar Mangar added a comment -

        I'm also seeing similar problems sporadically in ShardSplitTest. I've opened SOLR-5309 to track it.

        I'll review and commit your patch shortly.

        Christine Poerschke added a comment -

        A variation of the patch I uploaded here would be to 'rescue' (and log the id and hash of) any documents that would otherwise have been lost, e.g. by always putting them in the first sub-shard. They don't belong there, but at least that way they are not lost and can be analysed and dealt with later.
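        A rough sketch of that 'rescue' variation, under stated assumptions: documents are modelled as hypothetical (id, hash) pairs, sub-shards as inclusive hash ranges, and the log line is a plain System.err print rather than Solr logging. A document that matches no range goes into sub-shard 0 with its id and hash logged; a document matching several ranges is still added to each, as it would be without the rescue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Solr code: assign documents to sub-shards by hash and
// "rescue" documents that match no range into sub-shard 0, logging id and hash.
class RescueSketch {

    record Range(int min, int max) {
        boolean includes(int hash) { return hash >= min && hash <= max; }
    }

    record Doc(String id, int hash) {}

    // Returns one list of document ids per sub-shard, in the order of `ranges`.
    static List<List<String>> assign(List<Doc> docs, List<Range> ranges) {
        List<List<String>> subShards = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i++) subShards.add(new ArrayList<>());
        for (Doc doc : docs) {
            boolean placed = false;
            for (int i = 0; i < ranges.size(); i++) {
                if (ranges.get(i).includes(doc.hash())) {
                    subShards.get(i).add(doc.id());
                    placed = true;
                }
            }
            if (!placed) {
                // Wrong sub-shard on purpose: kept for later analysis instead of being lost.
                System.err.println("rescued id=" + doc.id() + " hash=" + doc.hash() + " into sub-shard 0");
                subShards.get(0).add(doc.id());
            }
        }
        return subShards;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", 5), new Doc("b", 15), new Doc("c", 25));
        List<Range> ranges = List.of(new Range(0, 9), new Range(20, 29));
        System.out.println(assign(docs, ranges));
    }
}
```

        As the next comment points out, this policy would misplace documents in splits where falling outside every range is legitimate (SOLR-5338), so it only makes sense when all of the parent's documents are expected to be covered by the sub-shard ranges.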

        Shalin Shekhar Mangar added a comment -

        "A variation of the patch I uploaded here would be to 'rescue' (and log the id and hash of) any documents that would otherwise have been lost, e.g. by always putting them in the first sub-shard. They don't belong there, but at least that way they are not lost and can be analysed and dealt with later."

        Hmm, that is going to be difficult because we have features such as SOLR-5338. It is completely valid for some documents not to fall into any of the hash ranges passed into SolrIndexSplitter.

        Ramkumar Aiyengar added a comment (edited) -

        Shalin, any objection to this patch going in? Maybe with SOLR-5338 the severity of the zero-sub-shard case can be reduced from log.error (alternatively, it could check whether split.key is present and decide the severity accordingly, if we want to be smarter), but otherwise the patch should be good.
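        The smarter severity choice suggested here might look like the following hypothetical sketch. It assumes the splitter already knows the discrepancy counts and whether a split.key was supplied; the Severity enum and severityFor method are illustrative names, not Solr APIs. Documents in several sub-shards are always treated as an error, since the ranges should be disjoint, while documents in no sub-shard are downgraded to INFO only when split.key is present.

```java
// Hypothetical sketch, not Solr code: pick a log severity for split discrepancies.
class SeveritySketch {

    enum Severity { NONE, INFO, ERROR }

    static Severity severityFor(int docsInNoSubShard, int docsInSeveralSubShards, String splitKey) {
        // Sub-shard ranges are meant to be disjoint, so overlap always points to a bug.
        if (docsInSeveralSubShards > 0) return Severity.ERROR;
        if (docsInNoSubShard == 0) return Severity.NONE;
        // Per the discussion above, with split.key (SOLR-5338) some documents may
        // legitimately fall outside every range passed to the splitter.
        return splitKey != null ? Severity.INFO : Severity.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(severityFor(12, 0, null));  // no split.key: missing documents are an error
        System.out.println(severityFor(12, 0, "a!"));  // split.key given: expected, informational only
    }
}
```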

        Shalin Shekhar Mangar added a comment -

        Yes, this can go in. I'll commit it.

        Ramkumar Aiyengar added a comment -

        Brought this up to date and fixed a bug when ranges is null.

        ASF subversion and git services added a comment -

        Commit 1676075 from Ramkumar Aiyengar in branch 'dev/trunk'
        [ https://svn.apache.org/r1676075 ]

        SOLR-5213: Log when shard splitting unexpectedly leads to documents going to zero or multiple sub-shards

        ASF subversion and git services added a comment -

        Commit 1676076 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1676076 ]

        SOLR-5213: Log when shard splitting unexpectedly leads to documents going to zero or multiple sub-shards

        Ramkumar Aiyengar added a comment -

        Thanks Christine!

        Anshum Gupta added a comment -

        Bulk close for 5.2.0.


          People

          • Assignee: Ramkumar Aiyengar
          • Reporter: Christine Poerschke
          • Votes: 0
          • Watchers: 8
