Apache NiFi / NIFI-5525

CSVRecordReader fails with StringIndexOutOfBoundsException when a field is a double quote

    Details

      Description

      Bug description:

      When parsing an RFC 4180 CSV file in which one of the fields is a single double-quote character, CSVRecordReader fails with the following exception:

      java.lang.StringIndexOutOfBoundsException: String index out of range: -1

      at java.lang.String.substring(String.java:1967)
      at org.apache.nifi.csv.AbstractCSVRecordReader.convert(AbstractCSVRecordReader.java:82)
      at org.apache.nifi.csv.CSVRecordReader.nextRecord(CSVRecordReader.java:102)
      at org.apache.nifi.serialization.RecordReader.nextRecord(RecordReader.java:50)
      at org.apache.nifi.csv.TestCSVRecordReader.testQuote(TestCSVRecordReader.java:610)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
      at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
      at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
      at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
      at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
      at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
      at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
      at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
      at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
      at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
      at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
      at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
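The innermost frame is a String.substring call made from AbstractCSVRecordReader.convert. The NiFi source is not quoted in this report, so the following is an assumption: if convert removes one pair of enclosing quotes with value.substring(1, value.length() - 1), then for the already unescaped one-character value " the call becomes substring(1, 0). On Java 8 the resulting negative length (0 - 1) is reported as "String index out of range: -1", which matches the trace; newer JDKs word the message differently. A minimal standalone sketch of that arithmetic, where QuoteStripDemo and naiveStrip are hypothetical names:

```java
// Hypothetical reconstruction of a naive "strip enclosing quotes" step.
// This is an assumption about what AbstractCSVRecordReader.convert does,
// not actual NiFi code.
public class QuoteStripDemo {

    // Removes one leading and one trailing double quote if both are present.
    static String naiveStrip(String value) {
        return value.startsWith("\"") && value.endsWith("\"")
                ? value.substring(1, value.length() - 1)
                : value;
    }

    public static void main(String[] args) {
        // An ordinary quoted value: substring(1, 4) yields abc.
        System.out.println(naiveStrip("\"abc\""));

        // Parsing """" yields the single character ". It both starts and ends
        // with a quote, so the call becomes substring(1, 0), length 0 - 1 = -1.
        try {
            naiveStrip("\"");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```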

      Note that, according to RFC 4180:

      If double-quotes are used to enclose fields, then a double-quote
      appearing inside a field must be escaped by preceding it with
      another double quote.
      https://tools.ietf.org/html/rfc4180#page-2

      Therefore, a field whose value is a single double-quote character is encoded like this:

      """"

      (four double-quote characters: the enclosing pair around the escaped quote, which is itself written as two quotes)
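The escaping rule can be checked with a small stdlib-only sketch. encodeField and decodeField are hypothetical helpers written for illustration, not part of NiFi or Apache Commons CSV, and decodeField handles only one complete, quoted field (no delimiters or record separators):

```java
// Minimal RFC 4180 field-escaping sketch (hypothetical helpers, not NiFi or
// Commons CSV API). Works on a single field only.
public class Rfc4180Field {

    // Encloses the value in double quotes and doubles every embedded quote.
    static String encodeField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Reverses encodeField for a single quoted field: drops the enclosing
    // quotes and collapses each doubled quote back to one.
    static String decodeField(String field) {
        if (field.length() < 2 || !field.startsWith("\"") || !field.endsWith("\"")) {
            throw new IllegalArgumentException("not a quoted field: " + field);
        }
        return field.substring(1, field.length() - 1).replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String quote = "\"";
        String encoded = encodeField(quote);
        System.out.println(encoded);              // """"
        System.out.println(encoded.length());     // 4
        System.out.println(decodeField(encoded)); // "
    }
}
```

Decoding """" yields a one-character string, which is the value the reader is expected to return in the test below.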

      How to reproduce

      Add the following method to TestCSVRecordReader.java and run the test:

      @Test
      public void testQuote() throws IOException, MalformedRecordException {
          final CSVFormat format = CSVFormat.RFC4180.withFirstRecordAsHeader().withTrim().withQuote('"');
          // Header "name", then one record whose only field is """" (an escaped double quote)
          final String text = "\"name\"\n\"\"\"\"";

          final List<RecordField> fields = new ArrayList<>();
          fields.add(new RecordField("name", RecordFieldType.STRING.getDataType()));
          final RecordSchema schema = new SimpleRecordSchema(fields);

          try (final InputStream bais = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
               final CSVRecordReader reader = new CSVRecordReader(bais, Mockito.mock(ComponentLog.class), schema, format, true, false,
                       RecordFieldType.DATE.getDefaultFormat(), RecordFieldType.TIME.getDefaultFormat(),
                       RecordFieldType.TIMESTAMP.getDefaultFormat(), StandardCharsets.UTF_8.name())) {

              // Currently throws StringIndexOutOfBoundsException here
              final Record record = reader.nextRecord();
              final String name = (String) record.getValue("name");

              // Expected: the unescaped value is a single double-quote character
              assertEquals("\"", name);
          }
      }
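One possible guard, offered here as an assumption and not as the fix that was actually committed: strip enclosing quotes only when the value has at least two characters, so that a lone double quote (the parsed form of """") passes through unchanged, which is what the assertEquals in testQuote expects. GuardedQuoteStrip and guardedStrip are hypothetical names, and the quote-stripping step they model is itself inferred from the stack trace.

```java
// Hypothetical guarded variant of a quote-stripping step; not NiFi code.
public class GuardedQuoteStrip {

    // Strips one pair of enclosing quotes only if the value is long enough
    // to contain both an opening and a closing quote.
    static String guardedStrip(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(guardedStrip("\"abc\"")); // abc
        System.out.println(guardedStrip("\""));      // "
    }
}
```

The length check keeps the behaviour for ordinary quoted values such as "abc" while avoiding the negative-length substring for a one-character value.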


            People

            • Assignee: Pierre Villard (pvillard)
              Reporter: Vadim (vadimar)
            • Votes: 0
              Watchers: 5

              Dates

              • Created:
                Updated:
                Resolved: