Description
DelimitedTextFile parses line-delimited text files directly, splitting each line into CSV or TSV fields. This hard-wired parsing is limiting when we deal with custom text-based file formats.
This patch enables DelimitedTextFile to use a pluggable line (de)serializer.
First, I added an abstract class for user-defined line serde classes, as follows:
public abstract class TextLineSerde {
  protected Schema schema;
  protected TableMeta meta;
  protected int[] targetColumnIndexes;

  public TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
    this.schema = schema;
    this.meta = meta;
    this.targetColumnIndexes = targetColumnIndexes;
  }

  public abstract void init();

  public abstract void buildTuple(final ByteBuf buf, Tuple tuple) throws IOException;

  public abstract void release();
}
I also added a table property, text.serde.class, which allows users to specify a custom line serde. This table property affects only the TEXT file format. You can specify your own line serde as follows:
CREATE TABLE XXX (x int, y int) USING TEXT WITH ('text.serde.class' = 'org.apache.tajo.storage.text.CSVLineSerde')
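To illustrate the idea of resolving a serde from the text.serde.class property, here is a minimal, self-contained sketch. It is not the code in this patch: LineSerde, CommaLineSerde, and load are hypothetical, simplified stand-ins (String in, String[] out) so that the example runs without Tajo or Netty on the classpath, and the fallback to a default class when the property is absent is an assumption of the sketch. It only shows one plausible way to instantiate a user-supplied class by name through reflection.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class SerdeLoaderSketch {

  /** Hypothetical, simplified stand-in for TextLineSerde. */
  public abstract static class LineSerde {
    protected final int[] targetColumnIndexes;

    protected LineSerde(int[] targetColumnIndexes) {
      this.targetColumnIndexes = targetColumnIndexes;
    }

    /** Returns only the projected fields of one line. */
    public abstract String[] parse(String line);
  }

  /** Hypothetical user-defined serde: splits on commas, keeps target columns. */
  public static class CommaLineSerde extends LineSerde {
    public CommaLineSerde(int[] targetColumnIndexes) {
      super(targetColumnIndexes);
    }

    @Override
    public String[] parse(String line) {
      String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
      String[] out = new String[targetColumnIndexes.length];
      for (int i = 0; i < out.length; i++) {
        int idx = targetColumnIndexes[i];
        // A column missing from the line becomes null.
        out[i] = idx < fields.length ? fields[idx] : null;
      }
      return out;
    }
  }

  /** Instantiates the class named by 'text.serde.class' (assumed default: CommaLineSerde). */
  public static LineSerde load(Map<String, String> props, int[] targets)
      throws ReflectiveOperationException {
    String className =
        props.getOrDefault("text.serde.class", CommaLineSerde.class.getName());
    Class<? extends LineSerde> clazz =
        Class.forName(className).asSubclass(LineSerde.class);
    Constructor<? extends LineSerde> ctor = clazz.getConstructor(int[].class);
    return ctor.newInstance((Object) targets);
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> props = new HashMap<>();
    props.put("text.serde.class", CommaLineSerde.class.getName());
    LineSerde serde = load(props, new int[] {0, 2});
    System.out.println(Arrays.toString(serde.parse("a,b,c"))); // prints [a, c]
  }
}
```

Because the class is looked up by name and checked with asSubclass, a misconfigured property fails fast with a ClassNotFoundException or ClassCastException rather than at the first parsed line.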