Mike, I disagree.
Here is my reasoning: Lucene Java happens to use U+FFFF as an internal identifier for processing.
However, this is your choice, you could have just as easily used U+FFFE, or some other codepoint even outside the BMP for this purpose. The standard gives you several options, perhaps you might need multiple process-internal characters to accomplish what you want to do internally.
Its my understanding Lucene indexes should be portable to different programming languages: perhaps my implementation in C/perl/python decides to use a different process-internal character, this is allowed by Unicode and I think we should adhere to it, I don't think its being anal.
Finally, I completely disagree with the nontrivial performance comment. The trick is to make sure the execution branch / checks for the process-internal characters outside the bmp, only occurs for surrogate pairs. They are statistically very rare and if done right, it will not affect performance of BMP content.