1. There will be no MapReduce. This will all be client side (i.e. Flume agents) streaming data in parallel into HCatalog. Clients will compute the specific partition into which the data will be written. Periodically (configurable) they would 'commit' the currently open partition and roll over to a new partition. Until the partition is committed, the data will not be queryable. There is one restriction: once a partition is committed, its data cannot be modified.
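The commit/roll-over cycle above can be sketched roughly as follows. This is an illustrative simulation only, not Flume or HCatalog API: the class and method names (`PartitionRoller`, `write`, `isQueryable`) and the time-bucketed partition key are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the client-side partition roll-over described above.
// None of these names are real Flume/HCatalog APIs.
public class PartitionRoller {
    private final long rollIntervalMs;                     // configurable commit period
    private String openPartition;                          // partition currently receiving writes
    private long openedAt;
    private final Set<String> committed = new HashSet<>(); // committed => queryable, immutable

    public PartitionRoller(long rollIntervalMs, long now) {
        this.rollIntervalMs = rollIntervalMs;
        this.openPartition = partitionFor(now);
        this.openedAt = now;
    }

    // Client computes the partition key from the event time (illustrative scheme).
    String partitionFor(long ts) {
        return "dt=" + (ts / rollIntervalMs);
    }

    // Writes land in the open partition; once the interval elapses,
    // commit the open partition (making it queryable) and roll over.
    public String write(long now) {
        if (now - openedAt >= rollIntervalMs) {
            committed.add(openPartition);
            openPartition = partitionFor(now);
            openedAt = now;
        }
        return openPartition;
    }

    public boolean isQueryable(String partition) {
        return committed.contains(partition);
    }

    public static void main(String[] args) {
        PartitionRoller r = new PartitionRoller(60_000, 0);
        String p0 = r.write(1_000);   // within the interval: same open partition, not yet queryable
        r.write(61_000);              // interval elapsed: p0 is committed, new partition opened
        System.out.println(p0 + " queryable=" + r.isQueryable(p0));
    }
}
```

The point of the sketch is the ordering of visibility: data in the open partition is invisible to queries until the commit, and after the commit the partition is frozen.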
3. I have not verified HCat operation in secure mode, but it appears to be supported. I will get back to you on this.
4. At the moment, I don't see much code overlap with the HDFS sink for the core data-movement functionality. There may still be room for sharing other smaller tidbits.