Solr / SOLR-7057

SimplePostTool curbside appeal

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: SimplePostTool
    • Labels: None

    Description

      When trying to index some Freebase articles, such as:

      http://maven.tamingtext.com/freebase-wex-2011-01-18-articles-first10k.tsv

      using the SimplePostTool (bin/post), I ran into a few minor things along the way that would help new users trying to get their content indexed.

      First, I tried the naive approach:

      $ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.tsv 
      

      Didn't work ... here's the output:

      SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
      1 files indexed.
      

      Ummm ... no, 1 file was not indexed. Instead, the output should be something like:

      SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
      0 of 1 files indexed.
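
A minimal sketch of how the summary line could count only the files that were actually posted. This is not the SimplePostTool source: the class `PostSummary`, the `SUPPORTED` set, and both helper methods are hypothetical names used for illustration, and the set of extensions is an assumed subset.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PostSummary {
    // Assumed subset of extensions accepted in auto mode (illustrative only).
    static final Set<String> SUPPORTED = Set.of("xml", "json", "csv");

    // Returns the lower-cased extension, or "" when the name has none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Counts only the files that would be posted, then reports "N of M".
    static String summarize(List<String> files) {
        int indexed = 0;
        for (String file : files) {
            if (SUPPORTED.contains(extensionOf(file))) {
                indexed++; // a real tool would post the file here
            }
            // skipped files are deliberately not counted
        }
        return indexed + " of " + files.size() + " files indexed.";
    }

    public static void main(String[] args) {
        System.out.println(
            summarize(List.of("freebase-wex-2011-01-18-articles-first10k.tsv")));
    }
}
```

With the single skipped .tsv file from the run above, this prints "0 of 1 files indexed." rather than claiming one file was indexed.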
      

      Besides the misleading output, shouldn't tsv be a supported file type for auto-mode? It's a common enough format ...
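
One way auto mode might treat .tsv, sketched under assumptions: map the extension to the same content type as CSV and add a tab separator parameter by default. The class `TsvAutoMode`, its extension table, and its method names are hypothetical and do not reflect how SimplePostTool is actually organized.

```java
import java.util.Locale;
import java.util.Map;

public class TsvAutoMode {
    // Hypothetical extension table; "tsv" reuses the CSV content type.
    static final Map<String, String> CONTENT_TYPES = Map.of(
        "xml", "application/xml",
        "json", "application/json",
        "csv", "text/csv",
        "tsv", "text/csv");

    // Returns the lower-cased extension, or "" when the name has none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Returns the content type for a file, or null when the type is unsupported.
    static String contentTypeFor(String fileName) {
        return CONTENT_TYPES.get(extensionOf(fileName));
    }

    // Extra request parameters implied by the extension: a URL-encoded tab for TSV.
    static String extraParamsFor(String fileName) {
        return "tsv".equals(extensionOf(fileName)) ? "separator=%09" : "";
    }

    public static void main(String[] args) {
        String file = "freebase-wex-2011-01-18-articles-first10k.tsv";
        System.out.println(contentTypeFor(file) + " " + extraParamsFor(file));
    }
}
```

With a default like this, the rename to .csv and the manual separator setting would no longer be needed for tab-separated files.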

      So I renamed the file to .csv instead and re-ran ... this time I get:

      $ mv freebase-wex-2011-01-18-articles-first10k.tsv freebase-wex-2011-01-18-articles-first10k.csv
      $ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.csv
      
      ERROR - 2015-01-28 16:24:16.074; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: CSVLoader: input=null, line=1,expected 108 values but got 4
      

      Hmmm ... OK ... I did a little Googling and discovered that I needed to set the separator to %09, the URL-encoded tab character (again, the tool should just recognize TSV as a supported format):

      bin/post -c freebase -params "separator=%09&escape=\\" ./freebase-wex-2011-01-18-articles-first10k.csv
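
For reference, %09 is simply the URL-encoded form of the tab character. The sketch below builds an equivalent parameter string with the JDK's `URLEncoder` (the `Charset` overload requires Java 10 or later). The class `CsvParams` and its `build` method are hypothetical names, and the sketch assumes the value is ultimately sent as a URL query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CsvParams {
    // Builds a query string with URL-encoded separator and escape characters.
    static String build(String separator, String escape) {
        return "separator=" + URLEncoder.encode(separator, StandardCharsets.UTF_8)
            + "&escape=" + URLEncoder.encode(escape, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Tab separator and a single backslash as the escape character.
        System.out.println(build("\t", "\\")); // prints separator=%09&escape=%5C
    }
}
```

Here the backslash comes out as %5C, whereas the command above passes it unencoded; encoding both values programmatically avoids having to work out the shell quoting by hand.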
      

      Success! (Of course, I had to add a header line to the file too, but there's little we can do about that.)

          People

            Assignee: Erik Hatcher (ehatcher)
            Reporter: Timothy Potter (thelabdude)
            Votes: 1
            Watchers: 2