Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: SimplePostTool
    • Labels: None

      Description

      When trying to index some Freebase articles, such as:

      http://maven.tamingtext.com/freebase-wex-2011-01-18-articles-first10k.tsv

      using the SimplePostTool (bin/post), I ran into a few minor issues along the way; fixing them would help new users trying to get their content indexed.

      First, I tried the naive approach:

      $ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.tsv 
      

      Didn't work ... here's the output:

      SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
      1 files indexed.
      

      Ummm ... no, 1 file was not indexed. Instead, the output should be something like:

      SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
      0 of 1 files indexed.
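
The counting behind the proposed summary line can be sketched as follows. This is a hypothetical dry run, not SimplePostTool's actual code: `summarize_auto_mode` and the `SUPPORTED_EXTENSIONS` list are assumptions made for illustration, and nothing is posted to Solr.

```shell
# Hypothetical dry run: reports which files an extension-based auto mode
# would accept. It posts nothing and is not SimplePostTool's real logic.

# Assumed, illustrative subset of extensions accepted in auto mode.
SUPPORTED_EXTENSIONS="xml json csv pdf html txt"

# The function body runs in a subshell so its counters stay local.
summarize_auto_mode() (
  total=0
  indexed=0
  for file in "$@"; do
    total=$((total + 1))
    ext="${file##*.}"   # text after the last dot
    case " $SUPPORTED_EXTENSIONS " in
      *" $ext "*) indexed=$((indexed + 1)) ;;
      *) echo "WARNING: Skipping $file. Unsupported file type for auto mode." ;;
    esac
  done
  # Report accepted files against the total, so skipped files are visible.
  echo "$indexed of $total files indexed."
)
```

Calling `summarize_auto_mode freebase-wex-2011-01-18-articles-first10k.tsv` prints the skip warning followed by `0 of 1 files indexed.`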
      

      Besides the misleading output, shouldn't TSV be a supported file type for auto mode? It's a common enough format ...

      So I renamed the file to .csv instead and re-ran ... this time I get:

      $ mv freebase-wex-2011-01-18-articles-first10k.tsv freebase-wex-2011-01-18-articles-first10k.csv
      $ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.csv
      
      ERROR - 2015-01-28 16:24:16.074; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: CSVLoader: input=null, line=1,expected 108 values but got 4
      

      Hmmm ... OK ... did a little Googling and discovered I needed to set the separator to %09, the URL-encoded tab character (again, the tool should just recognize TSV as a supported format):

      $ bin/post -c freebase -params "separator=%09&escape=\\" ./freebase-wex-2011-01-18-articles-first10k.csv
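
The CSVLoader error is consistent with tab-separated lines being split on commas: the first line, taken as the header, apparently yielded 108 comma-separated values, while a later line yielded only 4. One way to check this before posting is to count the fields per line under each separator. A minimal sketch, assuming `awk` and `head` are available; `count_fields` is a hypothetical helper, not part of bin/post:

```shell
# count_fields SEPARATOR FILE
# Prints the number of fields on each of the first three lines of FILE
# when the lines are split on SEPARATOR (awk syntax, e.g. ',' or '\t').
count_fields() {
  head -n 3 "$2" | awk -F"$1" '{ print NF }'
}

# Usage (commented out because it needs the downloaded file):
# count_fields ','  ./freebase-wex-2011-01-18-articles-first10k.csv
# count_fields '\t' ./freebase-wex-2011-01-18-articles-first10k.csv
```

If the tab counts are the same on every line while the comma counts vary, the file is tab-separated and needs the `separator=%09` parameter.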
      

      Success! (of course I had to add a header line to the file too, but there's little we can do about that)
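
Adding the header line can be done without opening the file in an editor. A minimal sketch: `prepend_header` is a hypothetical helper, and the column names in the usage comment are placeholders, not the real Freebase WEX schema.

```shell
# prepend_header HEADER_LINE INPUT OUTPUT
# Writes HEADER_LINE followed by the contents of INPUT to OUTPUT.
# INPUT is left unchanged; OUTPUT must be a different file.
prepend_header() {
  { printf '%s\n' "$1"; cat "$2"; } > "$3"
}

# Usage (commented out; the column names are placeholders):
# prepend_header "$(printf 'id\ttitle\tbody')" \
#   ./freebase-wex-2011-01-18-articles-first10k.csv ./with-header.csv
```

Solr's CSV handler also documents `header` and `fieldnames` request parameters, so passing the field names through `-params` may be an alternative to editing the file; that route was not tried here.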

            People

            • Assignee: Erik Hatcher (ehatcher)
            • Reporter: Timothy Potter (thelabdude)