Bigtop / BIGTOP-330

hadoop 0.23 pseudo conf needs to set more properties to avoid using /tmp as its datadir

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.4.0, 0.5.0
    • Fix Version/s: None
    • Component/s: General
    • Labels:
      None

      Description

      [root@localhost conf]# ls /tmp/
      hsperfdata_hdfs hsperfdata_yarn Jetty_0_0_0_0_50075_datanode___hwtdwq Jetty_0_0_0_0_9999_node___7nyhcr nm-local-dir
      hsperfdata_root Jetty_0_0_0_0_50070_hdfs___w2cu08 Jetty_0_0_0_0_8088_cluster___u0rgz3 logs vmware-root

      Some of these directories rightfully contain temporary files, but others seem to contain actual data. Those should be in /var/lib/hadoop/...

        Activity

        Mark Grover added a comment -

        I tried this with YARN.

        Here are the contents of the /tmp directory after running a job at 3:26:

        vagrant@lucid64:/tmp$ ls -lrt
        total 72
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 01:30 forrest-vagrant
        drwxr-xr-x 4 mapred  mapred  4096 2012-12-08 02:50 Jetty_0_0_0_0_50030_job____yn7qmk
        drwxr-xr-x 2 root    root    4096 2012-12-08 02:50 hsperfdata_root
        drwxr-xr-x 3 hdfs    hdfs    4096 2012-12-08 03:02 hadoop-hdfs
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:02 nm-local-dir
        drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:02 logs
        drwxr-xr-x 3 mapred  mapred  4096 2012-12-08 03:02 hadoop-yarn
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:04 Jetty_0_0_0_0_50090_secondary____y6aanv
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:15 Jetty_0_0_0_0_50070_hdfs____w2cu08
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:16 Jetty_0_0_0_0_50075_datanode____hwtdwq
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:24 Jetty_0_0_0_0_8088_cluster____u0rgz3
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:25 Jetty_0_0_0_0_8042_node____19tj0x
        drwxr-xr-x 2 mapred  mapred  4096 2012-12-08 03:25 hsperfdata_mapred
        drwxr-xr-x 5 mapred  mapred  4096 2012-12-08 03:25 Jetty_0_0_0_0_19888_jobhistory____.djq1tw
        drwxr-xr-x 2 hdfs    hdfs    4096 2012-12-08 03:25 hsperfdata_hdfs
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:26 hadoop-vagrant
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:26 hsperfdata_vagrant
        drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:26 hsperfdata_yarn
        

        Then I ran another job at 3:29. Here are the contents of /tmp after that:

        vagrant@lucid64:/tmp$ ls -lrt
        total 72
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 01:30 forrest-vagrant
        drwxr-xr-x 4 mapred  mapred  4096 2012-12-08 02:50 Jetty_0_0_0_0_50030_job____yn7qmk
        drwxr-xr-x 2 root    root    4096 2012-12-08 02:50 hsperfdata_root
        drwxr-xr-x 3 hdfs    hdfs    4096 2012-12-08 03:02 hadoop-hdfs
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:02 nm-local-dir
        drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:02 logs
        drwxr-xr-x 3 mapred  mapred  4096 2012-12-08 03:02 hadoop-yarn
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:04 Jetty_0_0_0_0_50090_secondary____y6aanv
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:15 Jetty_0_0_0_0_50070_hdfs____w2cu08
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:16 Jetty_0_0_0_0_50075_datanode____hwtdwq
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:24 Jetty_0_0_0_0_8088_cluster____u0rgz3
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:25 Jetty_0_0_0_0_8042_node____19tj0x
        drwxr-xr-x 2 mapred  mapred  4096 2012-12-08 03:25 hsperfdata_mapred
        drwxr-xr-x 5 mapred  mapred  4096 2012-12-08 03:25 Jetty_0_0_0_0_19888_jobhistory____.djq1tw
        drwxr-xr-x 2 hdfs    hdfs    4096 2012-12-08 03:25 hsperfdata_hdfs
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:29 hsperfdata_vagrant
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:29 hadoop-vagrant
        drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:29 hsperfdata_yarn
        

        Clearly, the last three directories are updated every time a job runs. They are:

        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:29 hsperfdata_vagrant
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:29 hadoop-vagrant
        drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:29 hsperfdata_yarn
        

        The others (listed below) were created and are related to YARN, but they are not updated on every run.

        vagrant@lucid64:/tmp$ ls -lrt
        total 72
        drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 01:30 forrest-vagrant
        drwxr-xr-x 4 mapred  mapred  4096 2012-12-08 02:50 Jetty_0_0_0_0_50030_job____yn7qmk
        drwxr-xr-x 2 root    root    4096 2012-12-08 02:50 hsperfdata_root
        drwxr-xr-x 3 hdfs    hdfs    4096 2012-12-08 03:02 hadoop-hdfs
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:02 nm-local-dir
        drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:02 logs
        drwxr-xr-x 3 mapred  mapred  4096 2012-12-08 03:02 hadoop-yarn
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:04 Jetty_0_0_0_0_50090_secondary____y6aanv
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:15 Jetty_0_0_0_0_50070_hdfs____w2cu08
        drwxr-xr-x 4 hdfs    hdfs    4096 2012-12-08 03:16 Jetty_0_0_0_0_50075_datanode____hwtdwq
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:24 Jetty_0_0_0_0_8088_cluster____u0rgz3
        drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:25 Jetty_0_0_0_0_8042_node____19tj0x
        drwxr-xr-x 2 mapred  mapred  4096 2012-12-08 03:25 hsperfdata_mapred
        drwxr-xr-x 5 mapred  mapred  4096 2012-12-08 03:25 Jetty_0_0_0_0_19888_jobhistory____.djq1tw
        drwxr-xr-x 2 hdfs    hdfs    4096 2012-12-08 03:25 hsperfdata_hdfs
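        The snapshot comparison above (which entries carry a newer timestamp after the second job) can be done mechanically. The Python sketch below is not part of Bigtop or Hadoop: the helper names `parse_ls` and `changed_entries` are hypothetical, it assumes the GNU `ls -l` long format with ISO dates shown in these listings, and the sample data is an excerpt of the 3:26 and 3:29 listings.

```python
from __future__ import annotations

# Excerpts of the 3:26 and 3:29 listings from this issue.
BEFORE = """\
vagrant@lucid64:/tmp$ ls -lrt
total 72
drwxr-xr-x 3 hdfs    hdfs    4096 2012-12-08 03:02 hadoop-hdfs
drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:02 nm-local-dir
drwxr-xr-x 2 hdfs    hdfs    4096 2012-12-08 03:25 hsperfdata_hdfs
drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:26 hadoop-vagrant
drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:26 hsperfdata_vagrant
drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:26 hsperfdata_yarn
"""

AFTER = """\
vagrant@lucid64:/tmp$ ls -lrt
total 72
drwxr-xr-x 3 hdfs    hdfs    4096 2012-12-08 03:02 hadoop-hdfs
drwxr-xr-x 5 yarn    yarn    4096 2012-12-08 03:02 nm-local-dir
drwxr-xr-x 2 hdfs    hdfs    4096 2012-12-08 03:25 hsperfdata_hdfs
drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:29 hsperfdata_vagrant
drwxr-xr-x 2 vagrant vagrant 4096 2012-12-08 03:29 hadoop-vagrant
drwxr-xr-x 2 yarn    yarn    4096 2012-12-08 03:29 hsperfdata_yarn
"""


def parse_ls(listing: str) -> dict[str, str]:
    """Map entry name -> "YYYY-MM-DD HH:MM" from `ls -l` long-format output.

    Expected fields: perms, links, owner, group, size, date, time, name.
    """
    entries = {}
    for line in listing.splitlines():
        fields = line.split(None, 7)
        # Skip the shell prompt, the "total" line and blank lines.
        if len(fields) < 8 or not fields[0].startswith(("d", "-", "l")):
            continue
        date, time, name = fields[5], fields[6], fields[7]
        entries[name] = f"{date} {time}"
    return entries


def changed_entries(before: str, after: str) -> list[str]:
    """Names that are new in `after` or whose timestamp differs from `before`."""
    old, new = parse_ls(before), parse_ls(after)
    return sorted(name for name, ts in new.items() if old.get(name) != ts)


if __name__ == "__main__":
    print(changed_entries(BEFORE, AFTER))
```

        Because `ls -l` only shows minute resolution, an entry touched twice within the same minute would not be reported; comparing `os.stat().st_mtime` values taken before and after a job would be a finer-grained variant of the same idea.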
        
        Mark Grover added a comment -

        Let's first talk about the three directories that get updated with every job:
        The hsperfdata directories are generated by the JVM and can be suppressed with -XX:-UsePerfData (reference: http://stackoverflow.com/questions/76327/how-can-i-prevent-java-from-creating-hsperfdata-files). However, I think they belong in /tmp and should stay there.

        The remaining directory (hadoop-${user.name}) is hadoop.tmp.dir, which is in turn used in the default values of several other properties.
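        To illustrate how a single hadoop.tmp.dir value fans out into other properties, here is a simplified Python model of Hadoop's ${...} variable substitution. It is not Hadoop's Configuration class: it resolves every reference against one dict and ignores the precedence between Java system properties and configuration keys. The defaults are the stock core-default.xml / hdfs-default.xml values for these keys; the /var/lib/hadoop/cache/${user.name} override is a hypothetical path chosen only for illustration, and `resolve` / `effective_config` are hypothetical helper names.

```python
import re

# Stock defaults that reference hadoop.tmp.dir (a small subset only).
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.namenode.name.dir": "file://${hadoop.tmp.dir}/dfs/name",
    "dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data",
}

# Matches ${name} references.
_VAR = re.compile(r"\$\{([^}$\s]+)\}")


def resolve(key, props, max_depth=20):
    """Expand ${name} references in props[key] until nothing changes.

    Unknown names are left unexpanded; max_depth bounds self-referencing values.
    """
    value = props[key]
    for _ in range(max_depth):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


def effective_config(user, overrides=None):
    """Resolve every modelled property for a daemon running as `user`."""
    props = {**DEFAULTS, **(overrides or {}), "user.name": user}
    return {key: resolve(key, props) for key in DEFAULTS}


if __name__ == "__main__":
    stock = effective_config("hdfs")
    moved = effective_config(
        "hdfs", {"hadoop.tmp.dir": "/var/lib/hadoop/cache/${user.name}"}  # hypothetical path
    )
    for key in DEFAULTS:
        print(f"{key}: {stock[key]}  ->  {moved[key]}")
```

        With the stock defaults, a daemon running as hdfs resolves dfs.namenode.name.dir under /tmp/hadoop-hdfs, which matches the hadoop-hdfs entry in the listings above. In this model, overriding hadoop.tmp.dir alone moves every property derived from it, while any property whose default does not reference hadoop.tmp.dir would still have to be set separately.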

        Mark Grover added a comment -

        Bumping down the priority from blocker for now.


          People

          • Assignee:
            Mark Grover
            Reporter:
            Bruno Mahé
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
