Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Apache Gobblin 170724, Apache Gobblin 170807, Apache Gobblin 170821, Apache Gobblin 170905
Description
It would be useful to support configuration properties that override the default username used when connecting to an HDFS cluster, e.g. in the HDFS writers. By default, the system username that owns the Gobblin process is used.
One particular use case is a stand-alone Gobblin instance running as the `root` system user within a Docker container. An organization that shares a stand-alone Gobblin cluster across multiple teams for its data integration needs may have several users submitting jobs, each meant to touch a different part of the HDFS namespace under the control of a separate user.
Note that this feature is not a security mechanism: any job configuration file could still specify any username, so there would be no enforced privilege boundaries either way.
One approach that does not appear to work is setting the `hadoop.job.ugi` property in a job configuration file, despite what the following code in [FsDataWriter.java](https://github.com/linkedin/gobblin/blob/7141ec88c255c8c3cbc7054fb8146eebe77fc09d/gobblin-core/src/main/java/gobblin/writer/FsDataWriter.java#L88-L91) appears to suggest:
```java
Configuration conf = new Configuration();
// Add all job configuration properties so they are picked up by Hadoop
JobConfigurationUtils.putStateIntoConfiguration(properties, conf);
this.fs = WriterUtils.getWriterFS(properties, this.numBranches, this.branchId);
```
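To make the request concrete, here is a minimal sketch of the lookup such a property might drive. The class `HdfsUserResolver` and the key `writer.fs.hdfs.user` are hypothetical names chosen for illustration; neither exists in Gobblin. The sketch only shows the fallback rule described above: use the configured username when one is present, otherwise the system user that owns the process.

```java
import java.util.Properties;

/**
 * Illustrative only: resolves the username a writer might use for HDFS.
 * Neither this class nor the property key below exists in Gobblin.
 */
public final class HdfsUserResolver {

    // Hypothetical property key, assumed for this sketch.
    public static final String HDFS_USER_KEY = "writer.fs.hdfs.user";

    private HdfsUserResolver() {
    }

    /**
     * Returns the username configured in the job properties, or the system
     * user that owns the current process when the property is unset or blank.
     */
    public static String resolveUser(Properties jobProps) {
        String configured = jobProps.getProperty(HDFS_USER_KEY);
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        // Current default behavior: the owner of the running JVM process.
        return System.getProperty("user.name");
    }
}
```

The resolved name could then be passed to Hadoop's `FileSystem.get(URI, Configuration, String user)` overload, which returns a file system handle that acts as the given user. Whether `WriterUtils.getWriterFS` is the right place for such a call is left open here.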
Github Url : https://github.com/linkedin/gobblin/issues/1904
Github Reporter : mgomezch
Github Created At : 2017-05-26T18:58:16Z
Github Updated At : 2017-05-26T18:58:16Z
Issue Links
- is cloned by GOBBLIN-716 Add lineage in FileBasedSource (Resolved)