IMPALA / IMPALA-1578

Impala incorrectly handles text data when the \n\r line terminator is split between two HDFS blocks

    Details

      Description

      If a text file that uses the two-character sequence \n\r as its line terminator is split across HDFS blocks between the \n and the \r (i.e., \n is the last byte of one block and \r is the first byte of the next), Impala reads the line that follows the \n\r twice.
      As a result, that line is duplicated in the results of both aggregation queries and SELECT * queries.
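      The sketch below is a toy model in Python, not Impala's actual text scanner (which is written in C++). The helper names `first_line_start`, `read_block`, and `read_file` are hypothetical, and it assumes the literal two-byte terminator b"\n\r" exactly as written in this issue. It illustrates the invariant a block-split reader has to maintain: every line is emitted by exactly one block reader, namely the one whose byte range contains the line's first byte. To honour that when the terminator straddles a boundary, each non-first reader looks back at the last bytes of the previous block before deciding where its own first line starts.

```python
# Toy model (NOT Impala's C++ scanner): split a byte string into fixed-size
# "blocks" and let one reader per block emit the lines that *start* in it.
TERMINATOR = b"\n\r"  # assumption: the two-byte terminator described in this issue


def first_line_start(data: bytes, start: int) -> int:
    """Return the offset of the first line that starts at or after `start`."""
    if start == 0:
        return 0
    # Search from two bytes before `start` so that a terminator ending exactly
    # at `start`, or straddling the boundary (b"\n" | b"\r"), is still found.
    idx = data.find(TERMINATOR, max(start - 2, 0))
    return len(data) if idx == -1 else idx + len(TERMINATOR)


def read_block(data: bytes, start: int, end: int) -> list[bytes]:
    """Emit every line whose first byte lies in [start, end)."""
    lines = []
    pos = first_line_start(data, start)
    while pos < end:
        # A line may run past `end`; keep reading into the next block to finish it.
        j = data.find(TERMINATOR, pos)
        if j == -1:
            lines.append(data[pos:])  # final line without a terminator
            break
        lines.append(data[pos:j])
        pos = j + len(TERMINATOR)
    return lines


def read_file(data: bytes, block_size: int) -> list[bytes]:
    """Concatenate the output of one reader per fixed-size block."""
    lines = []
    for start in range(0, len(data), block_size):
        lines.extend(read_block(data, start, min(start + block_size, len(data))))
    return lines


data = b"first\n\rsecond\n\rthird\n\r"
# With block_size=6, b"\n" is the last byte of block 0 and b"\r" is the
# first byte of block 1, which is the layout this issue describes.
print(read_file(data, 6))  # [b'first', b'second', b'third']
```

      Because ownership is decided by the line's first byte, the line after the split terminator ("second") is emitted only by the reader for block 1, and reading the file with any block size yields the same lines as reading it as a single block.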

            People

            • Assignee: Skye Wanderman-Milne
            • Reporter: Simone (simobatt@gmail.com)
