Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 4.7
- Fix Version/s: None
Description
As already mentioned repeatedly and at length, this is a regression introduced by the fix in https://issues.apache.org/jira/browse/SOLR-5605
Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:
130,235c130
< lucene segments left in this index. Merging
< segments involves reading and rewriting all data
< in all these segment files, potentially multiple
< times, which is very I/O intensive and time
< consuming. However, an index with fewer segments
< can later be merged faster, and it can later be
< queried faster once deployed to a live Solr
< serving shard. Set maxSegments to 1 to optimize
< the index for low query latency. In a nutshell, a
< small maxSegments value trades indexing latency
< for subsequently improved query latency. This can
< be a reasonable trade-off for batch indexing
< systems. (default: 1)
< --fair-scheduler-pool STRING
< Optional tuning knob that indicates the name of
< the fair scheduler pool to submit jobs to. The
< Fair Scheduler is a pluggable MapReduce scheduler
< that provides a way to share large clusters. Fair
< scheduling is a method of assigning resources to
< jobs such that all jobs get, on average, an equal
< share of resources over time. When there is a
< single job running, that job uses the entire
< cluster. When other jobs are submitted, tasks
< slots that free up are assigned to the new jobs,
< so that each job gets roughly the same amount of
< CPU time. Unlike the default Hadoop scheduler,
< which forms a queue of jobs, this lets short jobs
< finish in reasonable time while not starving long
< jobs. It is also an easy way to share a cluster
< between multiple of users. Fair sharing can also
< work with job priorities - the priorities are
< used as weights to determine the fraction of
< total compute time that each job gets.
< --dry-run Run in local mode and print documents to stdout
< instead of loading them into Solr. This executes
< the morphline in the client process (without
< submitting a job to MR) for quicker turnaround
< during early trial & debug sessions. (default:
< false)
< --log4j FILE Relative or absolute path to a log4j.properties
< config file on the local file system. This file
< will be uploaded to each MR task. Example:
< /path/to/log4j.properties
< --verbose, -v Turn on verbose output. (default: false)
< --show-non-solr-cloud Also show options for Non-SolrCloud mode as part
< of --help. (default: false)
<
< Required arguments:
< --output-dir HDFS_URI HDFS directory to write Solr indexes to. Inside
< there one output directory per shard will be
< generated. Example: hdfs://c2202.mycompany.
< com/user/$USER/test
< --morphline-file FILE Relative or absolute path to a local config file
< that contains one or more morphlines. The file
< must be UTF-8 encoded. Example:
< /path/to/morphline.conf
<
< Cluster arguments:
< Arguments that provide information about your Solr cluster.
<
< --zk-host STRING The address of a ZooKeeper ensemble being used by
< a SolrCloud cluster. This ZooKeeper ensemble will
< be examined to determine the number of output
< shards to create as well as the Solr URLs to
< merge the output shards into when using the --go-
< live option. Requires that you also pass the --
< collection to merge the shards into.
<
< The --zk-host option implements the same
< partitioning semantics as the standard SolrCloud
< Near-Real-Time (NRT) API. This enables to mix
< batch updates from MapReduce ingestion with
< updates from standard Solr NRT ingestion on the
< same SolrCloud cluster, using identical unique
< document keys.
<
< Format is: a list of comma separated host:port
< pairs, each corresponding to a zk server.
< Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
< 2183' If the optional chroot suffix is used the
< example would look like: '127.0.0.1:2181/solr,
< 127.0.0.1:2182/solr,127.0.0.1:2183/solr' where
< the client would be rooted at '/solr' and all
< paths would be relative to this root - i.e.
< getting/setting/etc... '/foo/bar' would result in
< operations being run on '/solr/foo/bar' (from the
< server perspective).
<
<
< Go live arguments:
< Arguments for merging the shards that are built into a live Solr
< cluster. Also see the Cluster arguments.
<
< --go-live Allows you to optionally merge the final index
< shards into a live Solr cluster after they are
< built. You can pass the ZooKeeper address with --
< zk-host and the relevant cluster information will
< be auto detected. (default: false)
< --collection STRING The SolrCloud collection to merge shards into
< when using --go-live and --zk-host. Example:
< collection1
< --go-live-threads INTEGER
< Tuning knob that indicates the maximum number of
< live merges to run in parallel at one time.
< (default: 1000)
---
>
As already mentioned repeatedly and at length, this bug is caused by a change in buffer-flushing behavior in argparse4j >= 0.4.2.
The fix is to apply CDH-16434 to MapReduceIndexerTool.java as follows:
- parser.printHelp(new PrintWriter(System.out));
+ parser.printHelp();
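The one-line change presumably works because the zero-argument printHelp() writes to System.out directly, whereas wrapping System.out in a fresh PrintWriter adds an internal buffer that is never flushed before the process exits, so the tail of the help text is silently dropped. A minimal JDK-only sketch of that buffering behavior, with no argparse4j dependency (FlushDemo and the ByteArrayOutputStream stand-in are illustrative, not part of the tool):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class FlushDemo {
    public static void main(String[] args) {
        // Stand-in for System.out so the effect is observable in-process.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        // PrintWriter(OutputStream) buffers internally and, with this
        // constructor, does not auto-flush on println().
        PrintWriter pw = new PrintWriter(sink);
        pw.println("usage: MapReduceIndexerTool [options]");

        // Nothing has reached the underlying stream yet -- this is how
        // help output gets lost when the writer is never flushed.
        System.out.println("before flush: " + sink.size() + " bytes");

        pw.flush();
        System.out.println("after flush: " + sink.size() + " bytes");
    }
}
```

Running this prints 0 bytes before the flush and a non-zero count after it, which matches the truncated --help symptom described above.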
Issue Links
- duplicates SOLR-5782: The full MapReduceIndexer help text does not display when using --help. (Closed)