> let's move the clear() to an else clause for the preceding if().
> setStructField in Reflection and ThriftStructObjectInspector - is this used anywhere? what's the motivation behind this
- please remove references to TypeInfo in ObjectInspector.java comments and explain differently.
changed to type.
> the MetadataListStructObjectInspector.getStructFieldData looks pretty high overhead to me. we have gone to so much trouble to avoid creating objectinspectors - but those are just one time per map/reduce instance. but the getField() type of calls are per row. creating a list from an array type per evaluation seems unnecessary - we should be able to get directly from the backing array. there are quite a few function calls as well (nested function calls and class equality checks and so on).
The purpose of MetadataListStructObjectInspector is mainly backward compatibility - it can be thrown away once DynamicSerDe is out, so I am not sure whether it's worth to optimize it a lot or should we focus more on DynamicSerDe.
And by the way, the code for dealing with Arrays in StandardStructObjectInspector never run for MetadataListStructObjectInspector - because MetadataListStructObjectInspector is always inspectoring columnSet which has a member col which is an ArrayList. So we won't create a List per row.
> minor: getCategory() calls in Standard* can be marked final.
> One thing that i found somewhat complicated is the way the ObjectInspectorFactory() is written. It sounds like this would be the factory for most objectinspectors - but it's constructed to be only for reflection and reflection derived ones. in particular - the metadatatyped... class has it's own caching and factory like methods. You might want to think about structuring this more cleanly (instead of 'Type' - there could be a more generic concept of a signature and inspector family and a cache per type X family).
This can be done in a separate jira if we find it absolutely necessary. In most cases, authors of a new SerDe will create an instance of the reflection-based object inspector, or create an instance using all the standard object inspectors. What we have now is sufficient for that. I agree the current factory structure is not optimal for authors of new ObjectInspectors, but we won't have such a need till we start to write lazy-deserialized SerDes.
The impact of this jira (and the subsequent execution code refactoring) is probably much bigger than this small change. Let's try to get these big things in for 0.19 and do small fixes later?
I will submit a diff with the other changes.