Apache MADlib / MADLIB-1395

Term frequency and LDA - turn off notices


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v1.17
    • Component/s: Module: Utilities
    • Labels: None

    Description

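      Turn off the notices below by applying a MinWarning("Error") decorator to the Python implementation of term_frequency (the same output also affects LDA, which consumes the term frequency table).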

      madlib=# SELECT madlib.term_frequency('documents', -- input table
      madlib(# 'docid', -- document id column
      madlib(# 'words', -- vector of words in document
      madlib(# 'documents_tf', -- output documents table with term frequency
      madlib(# TRUE); -- TRUE to create vocabulary table
      NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
      CONTEXT: SQL statement "
       CREATE TABLE documents_tf_vocabulary AS
       SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid,
       word::TEXT
       FROM (
       SELECT distinct(words) as word
       FROM (
       SELECT unnest(words::TEXT[]) as words
       FROM documents
       ) q1
       ) q2
       "
      PL/Python function "term_frequency"
      NOTICE: One or more columns in the following table(s) do not have statistics: documents
      HINT: For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
      CONTEXT: SQL statement "
       CREATE TABLE documents_tf_vocabulary AS
       SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid,
       word::TEXT
       FROM (
       SELECT distinct(words) as word
       FROM (
       SELECT unnest(words::TEXT[]) as words
       FROM documents
       ) q1
       ) q2
       "
      PL/Python function "term_frequency"
      NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'docid' as the Greenplum Database data distribution key for this table.
      HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
      CONTEXT: SQL statement "
       CREATE TABLE documents_tf(
       docid INTEGER,
       wordid INTEGER,
       count INTEGER
       )
       "
      PL/Python function "term_frequency"
      NOTICE: One or more columns in the following table(s) do not have statistics: documents
      HINT: For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
      CONTEXT: SQL statement "
       INSERT INTO documents_tf
       SELECT docid, w.wordid as wordid, word_count as count
       FROM (
       SELECT docid, word::TEXT, count(*) as word_count
       FROM
       (
       SELECT docid, unnest(words::TEXT[]) as word
       FROM documents
       WHERE
       docid IS NOT NULL
       ) q1
       GROUP BY docid, word
       ) q2
       
       , documents_tf_vocabulary as w
       WHERE
       q2.word = w.word
       
       "
      PL/Python function "term_frequency"
       term_frequency 
      ------------------------------------------------------------------------------------------
       Term frequency output in table documents_tf, vocabulary in table documents_tf_vocabulary
      (1 row)
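A minimal sketch of the decorator approach suggested in the description, assuming the notices are suppressed by temporarily raising the standard PostgreSQL/Greenplum setting client_min_messages. The class name MinWarningSketch is hypothetical and this is not MADlib's actual MinWarning implementation. It also assumes the SQL executor is passed in as `execute` (inside PL/Python that would be plpy.execute), so that the sketch runs outside the database.

```python
import functools


class MinWarningSketch(object):
    """Hypothetical sketch (not MADlib's MinWarning): raise client_min_messages
    for the duration of a block or function call, then restore the previous
    level, even if the body raises."""

    def __init__(self, level, execute):
        self.level = level        # e.g. "error" hides NOTICE (and its HINT/CONTEXT)
        self.execute = execute    # assumption: plpy.execute inside PL/Python
        self.old_level = None

    def __enter__(self):
        # Remember the caller's current setting so it can be restored later.
        rows = self.execute("SHOW client_min_messages")
        self.old_level = rows[0]["client_min_messages"]
        self.execute("SET client_min_messages TO {0}".format(self.level))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.execute("SET client_min_messages TO {0}".format(self.old_level))
        return False  # never swallow exceptions raised by the body

    def __call__(self, func):
        # Decorator form: every call gets a fresh context object, so nested or
        # repeated calls do not overwrite each other's saved level.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with MinWarningSketch(self.level, self.execute):
                return func(*args, **kwargs)
        return wrapper
```

In a PL/Python module the hypothetical usage would be a line such as @MinWarningSketch("error", plpy.execute) above the Python entry point. Restoring the saved level in __exit__ (rather than resetting to a fixed default) keeps a user's own session setting intact after the call returns.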
      

          People

            Assignee: Unassigned
            Reporter: Nikhil Kak
            Votes: 0
            Watchers: 2