Tajo (Retired) / TAJO-1209

Pluggable line (de)serializer for DelimitedTextFile

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Storage
    • Labels: None

    Description

      DelimitedTextFile reads line-delimited text files and parses each line directly into CSV or TSV fields. Because this parsing is hard-wired, it is limiting when we need to handle custom text-based file formats.

      This patch enables DelimitedTextFile to use a pluggable line (de)serializer.

      First, I added an abstract class that user-defined line serde classes extend:

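      public abstract class TextLineSerde {
        protected Schema schema;
        protected TableMeta meta;
        // Indexes of the columns to be read from each line.
        protected int[] targetColumnIndexes;
      
        public TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
          this.schema = schema;
          this.meta = meta;
          this.targetColumnIndexes = targetColumnIndexes;
        }
      
        // Prepares the serde before the first line is processed.
        public abstract void init();
      
        // Parses the line held in buf and fills tuple with its field values.
        public abstract void buildTuple(final ByteBuf buf, Tuple tuple) throws IOException;
      
        // Frees any resources held by the serde.
        public abstract void release();
      }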
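
To illustrate how a user-defined serde might plug into this contract, here is a minimal, self-contained sketch. It does not use Tajo or Netty: SketchSchema, SketchTableMeta, SketchTuple, SketchLineSerde, and DelimiterLineSerde are hypothetical stand-ins invented for this example, and a plain byte[] takes the place of ByteBuf. It only mirrors the init / buildTuple / release lifecycle described above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-ins for Tajo's Schema, TableMeta, and Tuple, so that the
// sketch compiles on its own. They are NOT the real Tajo classes.
final class SketchSchema {
  final String[] columnNames;
  SketchSchema(String... columnNames) { this.columnNames = columnNames; }
}

final class SketchTableMeta {
  final String delimiter;
  SketchTableMeta(String delimiter) { this.delimiter = delimiter; }
}

final class SketchTuple {
  final String[] values;
  SketchTuple(int size) { values = new String[size]; }
}

// Same shape as TextLineSerde, but with the stand-in types and byte[] for ByteBuf.
abstract class SketchLineSerde {
  protected final SketchSchema schema;
  protected final SketchTableMeta meta;
  protected final int[] targetColumnIndexes;

  SketchLineSerde(SketchSchema schema, SketchTableMeta meta, int[] targetColumnIndexes) {
    this.schema = schema;
    this.meta = meta;
    this.targetColumnIndexes = targetColumnIndexes;
  }

  public abstract void init();
  public abstract void buildTuple(byte[] line, SketchTuple tuple) throws IOException;
  public abstract void release();
}

// Hypothetical serde: splits each line on the delimiter stored in the table meta
// and copies only the requested columns into the tuple.
final class DelimiterLineSerde extends SketchLineSerde {
  private String quotedDelimiter;

  DelimiterLineSerde(SketchSchema schema, SketchTableMeta meta, int[] targetColumnIndexes) {
    super(schema, meta, targetColumnIndexes);
  }

  @Override
  public void init() {
    // Quote the delimiter so characters such as '|' are not treated as regex syntax.
    quotedDelimiter = Pattern.quote(meta.delimiter);
  }

  @Override
  public void buildTuple(byte[] line, SketchTuple tuple) throws IOException {
    // A limit of -1 keeps trailing empty fields instead of dropping them.
    String[] fields = new String(line, StandardCharsets.UTF_8).split(quotedDelimiter, -1);
    for (int index : targetColumnIndexes) {
      if (index >= fields.length) {
        throw new IOException("Line has " + fields.length + " fields, but column " + index + " was requested");
      }
      tuple.values[index] = fields[index];
    }
  }

  @Override
  public void release() {
    quotedDelimiter = null;
  }
}

class DelimiterLineSerdeDemo {
  public static void main(String[] args) throws IOException {
    SketchSchema schema = new SketchSchema("x", "y", "z");
    DelimiterLineSerde serde =
        new DelimiterLineSerde(schema, new SketchTableMeta("|"), new int[] {0, 2});
    serde.init();
    SketchTuple tuple = new SketchTuple(schema.columnNames.length);
    serde.buildTuple("1|2|3".getBytes(StandardCharsets.UTF_8), tuple);
    System.out.println(Arrays.toString(tuple.values)); // prints [1, null, 3]
    serde.release();
  }
}
```

Column y is left null because only indexes 0 and 2 were requested. A real implementation would extend TextLineSerde itself and read from the ByteBuf.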
      

      I also added a table property, text.serde.class, which lets users specify a custom line serde. This property affects only the TEXT file format. You can specify your own line serde as follows:

      CREATE TABLE XXX (x int, y int) USING TEXT WITH ('text.serde.class' = 'org.apache.tajo.storage.text.CSVLineSerde');
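
One way a storage layer could turn the text.serde.class property into a serde instance is reflective loading. The sketch below is an assumption about the general technique, not the code in this patch. PluggableLineSerde, CommaLineSerde, TabLineSerde, and LineSerdeLoader are hypothetical names, the constructor is simplified to take only the target column indexes, and falling back to a default class when the property is absent is also an assumption.

```java
import java.lang.reflect.Constructor;
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in base class. The TextLineSerde constructor described in
// this issue also takes a Schema and a TableMeta, omitted here for brevity.
abstract class PluggableLineSerde {
  protected final int[] targetColumnIndexes;

  protected PluggableLineSerde(int[] targetColumnIndexes) {
    this.targetColumnIndexes = targetColumnIndexes;
  }

  public abstract String name();
}

// Two hypothetical serde implementations to choose between.
class CommaLineSerde extends PluggableLineSerde {
  public CommaLineSerde(int[] targetColumnIndexes) { super(targetColumnIndexes); }
  @Override public String name() { return "comma"; }
}

class TabLineSerde extends PluggableLineSerde {
  public TabLineSerde(int[] targetColumnIndexes) { super(targetColumnIndexes); }
  @Override public String name() { return "tab"; }
}

final class LineSerdeLoader {
  static final String SERDE_CLASS_KEY = "text.serde.class";

  // Looks up the class named by the table property and instantiates it.
  // Assumption: CommaLineSerde is used when the property is not set.
  static PluggableLineSerde load(Map<String, String> tableProperties, int[] targetColumnIndexes) {
    String className =
        tableProperties.getOrDefault(SERDE_CLASS_KEY, CommaLineSerde.class.getName());
    try {
      Class<? extends PluggableLineSerde> serdeClass =
          Class.forName(className).asSubclass(PluggableLineSerde.class);
      Constructor<? extends PluggableLineSerde> constructor =
          serdeClass.getDeclaredConstructor(int[].class);
      // The cast makes it explicit that the array is one argument, not a varargs list.
      return constructor.newInstance((Object) targetColumnIndexes);
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load line serde: " + className, e);
    }
  }
}

class LineSerdeLoaderDemo {
  public static void main(String[] args) {
    Map<String, String> props =
        Collections.singletonMap(LineSerdeLoader.SERDE_CLASS_KEY, TabLineSerde.class.getName());
    System.out.println(LineSerdeLoader.load(props, new int[] {0, 1}).name()); // prints tab

    Map<String, String> noProps = Collections.emptyMap();
    System.out.println(LineSerdeLoader.load(noProps, new int[] {0, 1}).name()); // prints comma
  }
}
```

Checking the loaded class with asSubclass means a mistyped or unrelated class name fails at load time, not on the first parsed line.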
      

          People

            Assignee: Hyunsik Choi
            Reporter: Hyunsik Choi