  1. Tajo
  2. TAJO-1209

Pluggable line (de)serializer for DelimitedTextFile

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Storage
    • Labels:
      None

      Description

      DelimitedTextFile currently reads line-delimited text files and parses each line directly into CSV or TSV fields. Because the parsing is hard-wired, it is limiting when we need to handle custom text-based file formats.

      This patch enables DelimitedTextFile to use a pluggable line (de)serializer.

      First, I added an abstract class that user-defined line serde classes extend:

      public abstract class TextLineSerde {
        protected Schema schema;
        protected TableMeta meta;
        protected int[] targetColumnIndexes;
      
        public TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
          this.schema = schema;
          this.meta = meta;
          this.targetColumnIndexes = targetColumnIndexes;
        }
      
        public abstract void init();
      
        public abstract void buildTuple(final ByteBuf buf, Tuple tuple) throws IOException;
      
        public abstract void release();
      }
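
To illustrate how a subclass might fill in this contract, here is a minimal, self-contained sketch of a pipe-delimited line parser. It is not the code from this patch: so that it compiles without Tajo or Netty on the classpath, it assumes simplified stand-ins (`SimpleTuple` for Tuple, `java.nio.ByteBuffer` for ByteBuf, and `SketchLineSerde` for the base class, with Schema and TableMeta omitted). All of these names, and `PipeLineSerde` itself, are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Tajo's Tuple: holds one string value per column.
final class SimpleTuple {
  private final String[] values;

  SimpleTuple(int size) {
    values = new String[size];
  }

  void put(int index, String value) {
    values[index] = value;
  }

  String get(int index) {
    return values[index];
  }
}

// Simplified stand-in for the abstract class above (assumption: Schema and
// TableMeta are left out, and ByteBuffer replaces Netty's ByteBuf).
abstract class SketchLineSerde {
  protected final int[] targetColumnIndexes;

  SketchLineSerde(int[] targetColumnIndexes) {
    this.targetColumnIndexes = targetColumnIndexes;
  }

  abstract void init();

  abstract void buildTuple(ByteBuffer buf, SimpleTuple tuple) throws IOException;

  abstract void release();
}

// Hypothetical serde for lines such as "1|2|3".
final class PipeLineSerde extends SketchLineSerde {
  PipeLineSerde(int[] targetColumnIndexes) {
    super(targetColumnIndexes);
  }

  @Override
  void init() {
    // Nothing to set up in this sketch.
  }

  @Override
  void buildTuple(ByteBuffer buf, SimpleTuple tuple) throws IOException {
    // Decode a duplicate so the caller's buffer position is left untouched.
    String line = StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    String[] fields = line.split("\\|", -1);
    // Fill in only the projected columns; other slots stay null.
    for (int index : targetColumnIndexes) {
      if (index >= fields.length) {
        throw new IOException("Missing field " + index + " in line: " + line);
      }
      tuple.put(index, fields[index]);
    }
  }

  @Override
  void release() {
    // Nothing to free in this sketch.
  }
}
```

In this sketch, only the columns listed in `targetColumnIndexes` are written into the tuple, which is one plausible reading of why the base class carries that array.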
      

      I also added a table property, text.serde.class, which lets users specify a custom line serde. This property affects only the TEXT file format. You can specify your own line serde as follows:

      CREATE TABLE XXX (x int, y int) USING TEXT WITH ('text.serde.class' = 'org.apache.tajo.storage.text.CSVLineSerde')
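
The issue does not describe how the property value is turned into a serde instance. One common approach, shown here purely as an assumption and not as what the patch does, is to look the class name up in the table properties and instantiate it reflectively. `LineSerde`, `DefaultCsvSerde`, and `SerdeLoader` are hypothetical names, and the fallback to a default class when the property is absent is also assumed.

```java
import java.util.Map;

// Hypothetical interface standing in for a line serde type.
interface LineSerde {
  String name();
}

// Hypothetical built-in implementation used when the property is not set.
final class DefaultCsvSerde implements LineSerde {
  public DefaultCsvSerde() {}

  @Override
  public String name() {
    return "csv";
  }
}

final class SerdeLoader {
  // Property key taken from the description above.
  static final String SERDE_CLASS_KEY = "text.serde.class";

  // Resolves the serde class named in the table properties and instantiates
  // it through its no-argument constructor.
  static LineSerde load(Map<String, String> tableProperties)
      throws ReflectiveOperationException {
    String className =
        tableProperties.getOrDefault(SERDE_CLASS_KEY, DefaultCsvSerde.class.getName());
    // asSubclass fails fast with ClassCastException if the class is not a LineSerde.
    Class<? extends LineSerde> clazz =
        Class.forName(className).asSubclass(LineSerde.class);
    return clazz.getDeclaredConstructor().newInstance();
  }
}
```

A real loader for the abstract class in this issue would presumably pass the schema, table meta, and target column indexes to the constructor instead of using a no-argument one.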
      

            People

            • Assignee:
              hyunsik Hyunsik Choi
              Reporter:
              hyunsik Hyunsik Choi
            • Votes:
              0
              Watchers:
              2