Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
When trying to index some Freebase articles, such as:
http://maven.tamingtext.com/freebase-wex-2011-01-18-articles-first10k.tsv
using the SimplePostTool (bin/post), I ran into a few minor issues along the way. Fixing them would help new users get their content indexed.
First, I tried the naive approach:
$ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.tsv
Didn't work ... here's the output:
SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
1 files indexed.
Ummm ... no, that 1 file was not indexed. Instead, the output should be something like:
SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
0 of 1 files indexed.
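One way to produce that summary is to count files considered and files actually posted separately. The sketch below assumes exactly that; the class `PostSummary`, its methods, and the set of "supported" extensions are hypothetical illustrations, not SimplePostTool's actual code or its real auto-mode type list.

```java
import java.util.List;

// Hypothetical helper, not part of SimplePostTool: keeps separate counters
// for files the tool looked at and files it actually sent to Solr.
class PostSummary {
    private int attempted = 0;
    private int indexed = 0;

    // Call once for every file the tool considers.
    void recordAttempt() { attempted++; }

    // Call only after a file has really been posted.
    void recordIndexed() { indexed++; }

    // Yields e.g. "0 of 1 files indexed." rather than "1 files indexed."
    String summary() {
        return indexed + " of " + attempted + " files indexed.";
    }

    public static void main(String[] args) {
        // Assumed set of auto-mode extensions, for illustration only.
        List<String> supported = List.of("csv", "json", "xml");
        PostSummary s = new PostSummary();
        for (String name : List.of("freebase-wex-2011-01-18-articles-first10k.tsv")) {
            s.recordAttempt();
            String ext = name.substring(name.lastIndexOf('.') + 1);
            if (!supported.contains(ext)) {
                System.out.println("Skipping " + name + ". Unsupported file type for auto mode.");
                continue; // skipped files are not counted as indexed
            }
            s.recordIndexed(); // a real tool would post the file before this
        }
        System.out.println(s.summary());
    }
}
```

Keeping the two counters apart means a skipped file can never inflate the "indexed" number.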
Besides the misleading output, shouldn't TSV be a supported file type for auto mode? It's a common enough format ...
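A minimal sketch of what TSV support in auto mode might look like, assuming the tool maps file extensions to a content type plus extra request parameters. Routing `.tsv` to the CSV loader with `separator=%09` mirrors the workaround described later in this issue. The class `AutoModeTypes`, the `PostType` record, and the exact mappings are hypothetical, not SimplePostTool's real lookup table.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not SimplePostTool's actual code: an auto-mode lookup
// that treats .tsv as CSV with a tab separator.
class AutoModeTypes {
    // Content type to send, plus any extra request parameters the loader needs.
    record PostType(String contentType, String extraParams) {}

    private static final Map<String, PostType> TYPES = Map.of(
        "csv", new PostType("text/csv", ""),
        // Assumption: send TSV to the CSV loader with a URL-encoded tab separator.
        "tsv", new PostType("text/csv", "separator=%09"),
        "json", new PostType("application/json", ""),
        "xml", new PostType("application/xml", "")
    );

    // Returns the type for a file name's extension, or empty if unsupported.
    static Optional<PostType> lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return Optional.empty();
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(TYPES.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(lookup("freebase-wex-2011-01-18-articles-first10k.tsv"));
    }
}
```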
So I renamed the file to .csv and re-ran the tool ... this time I got:
$ mv freebase-wex-2011-01-18-articles-first10k.tsv freebase-wex-2011-01-18-articles-first10k.csv
$ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.csv
ERROR - 2015-01-28 16:24:16.074; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: CSVLoader: input=null, line=1,expected 108 values but got 4
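The mismatch most likely arises because, with the default comma separator, each tab-delimited row is split wherever a comma happens to appear, so rows yield different field counts. The rough diagnostic below illustrates this; `FieldCounter` is a hypothetical helper (not part of Solr), it ignores CSV quoting and escaping, and the sample row is made up rather than taken from the WEX file.

```java
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of Solr: counts the fields a line splits
// into for a single-character separator. Quoting and escaping are ignored,
// so this is only a rough check.
class FieldCounter {
    static int countFields(String line, char separator) {
        // Pattern.quote treats the separator literally; limit -1 keeps trailing empty fields.
        return line.split(Pattern.quote(String.valueOf(separator)), -1).length;
    }

    public static void main(String[] args) {
        // Made-up tab-separated row whose last column contains commas.
        String row = "12345\tSome Title\t2011-01-18\tText, with, commas";
        System.out.println("comma: " + countFields(row, ','));
        System.out.println("tab:   " + countFields(row, '\t'));
    }
}
```

Splitting on commas gives a count that depends on the article text, while splitting on tabs gives the intended column count.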
Hmmm ... OK ... after a little Googling, I discovered I needed to set the separator to %09, a URL-encoded tab character (again, the tool should just recognize TSV as a supported format):
$ bin/post -c freebase -params "separator=%09&escape=\\" ./freebase-wex-2011-01-18-articles-first10k.csv
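In that command, `%09` is the percent-encoding of the tab character (0x09), and inside double quotes the shell turns `"\\"` into a single backslash. The sketch below builds an equivalent parameter string from raw values using the JDK's `URLEncoder`; the helper `CsvParams` is hypothetical. Note that it also encodes the backslash as `%5C`, whereas the command above passes it unencoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: URL-encodes raw parameter values, so a tab separator
// becomes %09 and a backslash escape becomes %5C.
class CsvParams {
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("separator", "\t"); // a literal tab character
        params.put("escape", "\\");    // a single backslash
        System.out.println(encode(params)); // separator=%09&escape=%5C
    }
}
```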
Success! (Of course, I also had to add a header line to the file, but there's little we can do about that.)
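For reference, adding that header line can be scripted. This is a minimal sketch, assuming the file is small enough to read into memory; `AddTsvHeader` is a hypothetical helper, and the field names in `main` are placeholders, since the issue does not list the WEX column names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes a copy of a tab-separated file with a
// tab-joined header line prepended.
class AddTsvHeader {
    static void prependHeader(Path in, Path out, List<String> fieldNames) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(String.join("\t", fieldNames));                    // header row
        lines.addAll(Files.readAllLines(in, StandardCharsets.UTF_8)); // original rows, unchanged
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder field names; substitute the real column names.
        List<String> fields = List.of("id", "title", "timestamp", "text");
        Path in = Files.createTempFile("articles", ".tsv");
        Files.write(in, List.of("1\tExample\t2011-01-18\tSome text"), StandardCharsets.UTF_8);
        Path out = Files.createTempFile("articles-with-header", ".tsv");
        prependHeader(in, out, fields);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8).get(0));
    }
}
```

Writing to a separate output file leaves the original download untouched.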