When extracting text with Apache Tika 1.4, the extracted text is duplicated.
import os
import subprocess
APACHE_TIKA_PATH = os.path.abspath(os.path.join(PROJECT_ROOT, 'apache_tika/tika-app-1.4.jar'))
sout = subprocess.check_output(['java', '-jar', APACHE_TIKA_PATH, '-t', document])
The output in sout contains duplicated text. The problem occurs with both .doc and PDF files.
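It can help to pin down what kind of duplication this is: the whole document written out twice, or individual lines and paragraphs repeated. The sketch below assumes the duplicates show up as repeated lines. `repeated_lines` and `is_doubled` are hypothetical helper names, not part of Tika, and decoding the output as UTF-8 is an assumption about the encoding Tika writes on this platform.

```python
from collections import Counter
from typing import List, Tuple


def repeated_lines(text: str, min_count: int = 2) -> List[Tuple[str, int]]:
    """Return non-blank lines (stripped) that occur at least min_count times."""
    counts = Counter(line.strip() for line in text.splitlines() if line.strip())
    return [(line, n) for line, n in counts.most_common() if n >= min_count]


def is_doubled(text: str) -> bool:
    """True if the non-blank lines are one block followed by an exact copy of it."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    half = len(lines) // 2
    return len(lines) > 0 and len(lines) % 2 == 0 and lines[:half] == lines[half:]


if __name__ == "__main__":
    sample = "Hello\nWorld\n\nHello\nWorld\n"
    print(is_doubled(sample))      # whole text repeated once
    print(repeated_lines(sample))  # each line appears twice
```

With the output from the snippet in the question, the check would be `is_doubled(sout.decode('utf-8', errors='replace'))`, since `check_output` returns bytes on Python 3. A `True` result points at the whole document being emitted twice, while a mix of repeated and unique lines points at text that appears more than once inside the source file itself.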