[MADLIB-1395] Term frequency and LDA - turn off notices - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: v1.17
Component/s: Module: Utilities
Labels: None

Description

Turn off these notices by using a MinWarning("Error") decorator in Python.

madlib=# SELECT madlib.term_frequency('documents', -- input table
madlib(# 'docid', -- document id column
madlib(# 'words', -- vector of words in document
madlib(# 'documents_tf', -- output documents table with term frequency
madlib(# TRUE); -- TRUE to create vocabulary table
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
CONTEXT: SQL statement "
 CREATE TABLE documents_tf_vocabulary AS
 SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid,
 word::TEXT
 FROM (
 SELECT distinct(words) as word
 FROM (
 SELECT unnest(words::TEXT[]) as words
 FROM documents
 ) q1
 ) q2
 "
PL/Python function "term_frequency"
NOTICE: One or more columns in the following table(s) do not have statistics: documents
HINT: For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
CONTEXT: SQL statement "
 CREATE TABLE documents_tf_vocabulary AS
 SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid,
 word::TEXT
 FROM (
 SELECT distinct(words) as word
 FROM (
 SELECT unnest(words::TEXT[]) as words
 FROM documents
 ) q1
 ) q2
 "
PL/Python function "term_frequency"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'docid' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
 CREATE TABLE documents_tf(
 docid INTEGER,
 wordid INTEGER,
 count INTEGER
 )
 "
PL/Python function "term_frequency"
NOTICE: One or more columns in the following table(s) do not have statistics: documents
HINT: For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
CONTEXT: SQL statement "
 INSERT INTO documents_tf
 SELECT docid, w.wordid as wordid, word_count as count
 FROM (
 SELECT docid, word::TEXT, count(*) as word_count
 FROM
 (
 SELECT docid, unnest(words::TEXT[]) as word
 FROM documents
 WHERE
 docid IS NOT NULL
 ) q1
 GROUP BY docid, word
 ) q2
 
 , documents_tf_vocabulary as w
 WHERE
 q2.word = w.word
 
 "
PL/Python function "term_frequency"
 term_frequency 
------------------------------------------------------------------------------------------
 Term frequency output in table documents_tf, vocabulary in table documents_tf_vocabulary
(1 row)
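
The fix proposed in the description can be sketched roughly as follows: a decorator raises `client_min_messages` to `error` while the function body runs, then restores the previous level, so the NOTICE lines in the session above would not reach the client. This is an illustration under stated assumptions, not MADlib's actual `MinWarning` code, which this issue does not show. `FakeSession`, `QuietNotices`, and `term_frequency_sketch` are hypothetical names, and the fake session only records statements instead of talking to a database.

```python
from contextlib import ContextDecorator


class FakeSession:
    """Hypothetical stand-in for a database session; it only records SQL."""

    def __init__(self):
        self.client_min_messages = "notice"
        self.log = []

    def execute(self, sql):
        self.log.append(sql)
        prefix = "SET client_min_messages TO "
        if sql.startswith(prefix):
            # Track the setting the way a real session would apply it.
            self.client_min_messages = sql[len(prefix):]


class QuietNotices(ContextDecorator):
    """Hypothetical decorator: raise client_min_messages, then restore it."""

    def __init__(self, session, level="error"):
        self.session = session
        self.level = level
        self.previous = None

    def __enter__(self):
        # Remember the current level so it can be restored on exit.
        self.previous = self.session.client_min_messages
        self.session.execute("SET client_min_messages TO " + self.level)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.session.execute("SET client_min_messages TO " + self.previous)
        return False  # do not swallow exceptions from the wrapped call


session = FakeSession()


@QuietNotices(session, "error")
def term_frequency_sketch():
    # While this body runs, NOTICE-level messages would be suppressed.
    session.execute(
        "CREATE TABLE documents_tf (docid INTEGER, wordid INTEGER, count INTEGER)"
    )
    return session.client_min_messages
```

Restoring the previous level in `__exit__` keeps the change scoped to the decorated call, so the caller's own message settings are left as they were, even if the wrapped function raises.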

People

Assignee: Unassigned

Reporter: Nikhil Kak

Votes: 0

Watchers: 2

Dates

Created: 18/Nov/19 23:46

Updated: 18/Dec/19 23:33

Resolved: 18/Dec/19 23:33