Currently, LargeObjectBlob (LOB) is handled in Sqoop as follows:
1) if LOB is < MAX_INLINE_LOB_LEN (16 MB by default), it is imported into text (or sequence) files just like any other type of data.
2) if LOB is > MAX_INLINE_LOB_LEN, it is saved in .lob files, and reference files that contain information about these .lob files are created. The content of reference files looks like this:
But if the --as-sequencefile option is enabled, Sqoop generates reference files as sequence files while LOB is still saved in .lob files. (The .lob file format specification can be found at https://github.com/cloudera/sqoop/wiki/sip-3.)
As the first step of blob support for Avro import, I am going to follow the current semantics of --as-sequencefile. That is, if the --as-avrodatafile option is enabled,
1) if LOB is < MAX_INLINE_LOB_LEN, it will be saved as Avro data files.
2) if LOB is > MAX_INLINE_LOB_LEN, reference files will be generated as Avro data files while LOB is still saved in .lob files.
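
The size-based rule above can be sketched roughly as follows. This is only an illustration of the decision, not Sqoop's actual code: the class LobPlacementSketch, the Placement enum, and the place() method are hypothetical names, and the 16 MB constant is just the default mentioned above. The description only covers "<" and ">", so treating a LOB of exactly MAX_INLINE_LOB_LEN bytes as inline is an assumption.

```java
// Hypothetical sketch of the inline-vs-external decision; not Sqoop's real API.
final class LobPlacementSketch {

    // Default inline limit as stated in the description: 16 MB.
    static final long MAX_INLINE_LOB_LEN = 16L * 1024 * 1024;

    // Where the LOB bytes end up for a given record.
    enum Placement {
        INLINE,            // stored in the data file itself (text, sequence, or Avro)
        EXTERNAL_LOB_FILE  // stored in a .lob file; the data file holds a reference
    }

    // Decide placement from the LOB length alone.
    // Assumption: a LOB of exactly maxInlineLen bytes is kept inline.
    static Placement place(long lobLength, long maxInlineLen) {
        if (lobLength < 0) {
            throw new IllegalArgumentException("lobLength must be non-negative");
        }
        return lobLength > maxInlineLen
                ? Placement.EXTERNAL_LOB_FILE
                : Placement.INLINE;
    }

    public static void main(String[] args) {
        long small = 1024L;                    // 1 KB
        long large = MAX_INLINE_LOB_LEN + 1;   // just over the limit
        System.out.println(place(small, MAX_INLINE_LOB_LEN));
        System.out.println(place(large, MAX_INLINE_LOB_LEN));
    }
}
```

Under this sketch the output file format (text, sequence, or Avro) does not affect the decision; it only determines how the inline value or the reference record is serialized.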