weixin_40005542
2021-01-09 07:53

Locating .java_pid in Aurora/Mesos

First of all, a huge shout-out for making async-profiler so awesome! We've been using it quite successfully in our dedicated Mesos pools (#147 made it super easy!). With 1.5, I was eager to try the new --all-user option, which opens up so many possibilities for us: now we can run it in a shared Mesos pool! Well, we should be able to.

I did some experimenting today, and I think jattach is having a hard time locating the .java_pid file for my process. Here are some details:

Worth mentioning: before I run jattach for the first time, there is no .java_pid file in the process's tmp directory (which is conveniently located one level up). 495647 is my process (I, csl-perf, am the owner).


sandbox $ ls -la ../tmp/
total 964
drwxrwxrwt 3 root     root               54 Feb  8 19:59 .
drwxr-xr-x 7 root     root             4096 Feb  8 19:59 ..
drwxr-xr-x 2 csl-perf serviceaccount     19 Feb  8 19:59 hsperfdata_csl-perf
-rw-r--r-- 1 csl-perf serviceaccount 605737 Feb  8 20:00 perf-495647.map

After the first unsuccessful jattach attempt, the pid file is created. I understand that jattach has retry logic, so if it were able to locate the pid file, it should just attach.


sandbox $ ./async-profiler/profiler.sh -d 10 -f out.txt 495647
Could not start attach mechanism: No such file or directory
sandbox $ ls -la ../tmp/
total 964
drwxrwxrwt 3 root     root               76 Feb  8 20:00 .
drwxr-xr-x 7 root     root             4096 Feb  8 19:59 ..
drwxr-xr-x 2 csl-perf serviceaccount     19 Feb  8 19:59 hsperfdata_csl-perf
srw------- 1 csl-perf serviceaccount      0 Feb  8 20:00 .java_pid495647
-rw-r--r-- 1 csl-perf serviceaccount 621647 Feb  8 20:00 perf-495647.map

Some details:


sandbox $ lsof -p 495647 | grep java_pid
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing

sandbox $ readlink -f /proc/495647/root/tmp
/tmp

sandbox $ ls -la /proc/495647/fd | grep java_pid   # nothing there

And strace jattach:


async-profiler $ strace build/jattach 495647 load build/libasyncProfiler.so true start,event=alloc,file=t.svg,svg
execve("build/jattach", ["build/jattach", "495647", "load", "build/libasyncProfiler.so", "true", "start,event=alloc,file=t.svg,svg"], [/* 25 vars */]) = 0
brk(NULL)                               = 0x1e21000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0c7000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=56806, ...}) = 0
mmap(NULL, 56806, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd06a0b9000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P%\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2173512, ...}) = 0
mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fd069ada000
mprotect(0x7fd069c9d000, 2093056, PROT_NONE) = 0
mmap(0x7fd069e9c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c2000) = 0x7fd069e9c000
mmap(0x7fd069ea2000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fd069ea2000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0b8000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0b6000
arch_prctl(ARCH_SET_FS, 0x7fd06a0b6740) = 0
mprotect(0x7fd069e9c000, 16384, PROT_READ) = 0
mprotect(0x602000, 4096, PROT_READ)     = 0
mprotect(0x7fd06a0c8000, 4096, PROT_READ) = 0
munmap(0x7fd06a0b9000, 56806)           = 0
geteuid()                               = 2138
getegid()                               = 1058
readlink("/proc/495647/root", "/", 950) = 1
brk(NULL)                               = 0x1e21000
brk(0x1e42000)                          = 0x1e42000
brk(NULL)                               = 0x1e42000
open("/proc/495647/status", O_RDONLY)   = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0c6000
read(3, "Name:\tjava\nUmask:\t0022\nState:\tS "..., 1024) = 1024
read(3, "1\nvoluntary_ctxt_switches:\t3\nnon"..., 1024) = 59
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0x7fd06a0c6000, 4096)            = 0
stat("/proc/self/ns/mnt", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
stat("/proc/495647/ns/mnt", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/proc/495647/ns/mnt", O_RDONLY)   = 3
setns(3, 0)                             = -1 EPERM (Operation not permitted)
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0c6000
write(1, "WARNING: couldn't enter target p"..., 53WARNING: couldn't enter target process mnt namespace
) = 53
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7fd069b102f0}, {SIG_DFL, [], 0}, 8) = 0
stat("/tmp/.java_pid495647", 0x7ffffba0eec0) = -1 ENOENT (No such file or directory)
creat("/proc/495647/cwd/.attach_pid495647", 0660) = 3
close(3)                                = 0
stat("/proc/495647/cwd/.attach_pid495647", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0
geteuid()                               = 2138
kill(495647, SIGQUIT)                   = 0
nanosleep({0, 20000000}, NULL)          = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 40000000}, NULL)          = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 60000000}, NULL)          = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 80000000}, NULL)          = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 100000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 120000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 140000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 160000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 180000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 200000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 220000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 240000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 260000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
nanosleep({0, 280000000}, NULL)         = 0
stat("/tmp/.java_pid495647", 0x7ffffba0ea00) = -1 ENOENT (No such file or directory)
unlink("/proc/495647/cwd/.attach_pid495647") = 0
dup(2)                                  = 3
fcntl(3, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0c5000
write(3, "Could not start attach mechanism"..., 60Could not start attach mechanism: No such file or directory
) = 60
close(3)                                = 0
munmap(0x7fd06a0c5000, 4096)            = 0
exit_group(1)                           = ?
+++ exited with 1 +++
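
The polling at the end of this trace (a stat of /tmp/.java_pid495647 after sleeps of 20 ms, 40 ms, and so on up to 280 ms) can be approximated in shell. This is only a sketch of the waiting step, not jattach's actual code: wait_for_attach_socket is a made-up name, its optional second argument is added purely for illustration, and it assumes a sleep that accepts fractional seconds, as GNU coreutils does.

```shell
# Hypothetical helper: poll for the JVM attach socket with a growing delay,
# mirroring the 20 ms, 40 ms, ... 280 ms sleeps seen in the trace.
# $1 = socket path, $2 = largest delay in ms (assumed default 280, must stay below 1000)
wait_for_attach_socket() {
    socket_path=$1
    max_ms=${2:-280}
    delay_ms=20
    while [ "$delay_ms" -le "$max_ms" ]; do
        # -S is true only if the path exists and is a socket
        [ -S "$socket_path" ] && return 0
        # Turn e.g. 20 into "0.020" for a fractional-second sleep
        sleep "0.$(printf '%03d' "$delay_ms")"
        delay_ms=$((delay_ms + 20))
    done
    # One last check after the final sleep
    [ -S "$socket_path" ]
}
```

Called as wait_for_attach_socket /tmp/.java_pid495647, it would give up after roughly two seconds here, because the JVM created its socket in the sandbox's tmp rather than in /tmp.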

I believe this is the relevant part:


open("/proc/495647/ns/mnt", O_RDONLY)   = 3
setns(3, 0)                             = -1 EPERM (Operation not permitted)
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06a0c6000
write(1, "WARNING: couldn't enter target p"..., 53WARNING: couldn't enter target process mnt namespace
) = 53

Some details about it:


async-profiler $ ls -la /proc/495647/ns/mnt
lrwxrwxrwx 1 csl-perf serviceaccount 0 Feb  8 20:00 /proc/495647/ns/mnt -> mnt:[4026533548]

I (csl-perf) still seem to be the owner there.

I would appreciate any pointers on how to debug this further.

UPDATE: Symlinking ../tmp to /tmp does help. I guess my question is more whether it's possible to locate .java_pid automatically, the way it already works for us in a dedicated pool.

I want to see what strace looks like when I run it on a dedicated host, where it's known to be able to attach. I will post my findings in the comments.

This question comes from the open-source project: jvm-profiling-tools/async-profiler


10 answers

  • weixin_39671467 (4 months ago)

    The /tmp directory of the target process can now be overridden with the JATTACH_PATH environment variable. I've chosen an environment variable instead of a command-line argument to avoid passing arguments between jattach and the profiler.sh script.
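
    As a usage sketch under stated assumptions: the profiler.sh invocation below is copied from the question, while profile_with_sandbox_tmp and the PROFILER override are hypothetical names introduced only for illustration. The idea is simply to point JATTACH_PATH at the directory where the JVM actually created its .java_pid socket.

    ```shell
    # Hypothetical wrapper: run profiler.sh with JATTACH_PATH pointing at the
    # sandbox's tmp directory (the one that holds .java_pid<pid>).
    # PROFILER is an assumed override for the script location, not an
    # async-profiler setting.
    profile_with_sandbox_tmp() {
        pid=$1
        sandbox_tmp=$2
        JATTACH_PATH="$sandbox_tmp" \
            "${PROFILER:-./async-profiler/profiler.sh}" -d 10 -f out.txt "$pid"
    }
    ```

    With the layout from the question, this would presumably be called as profile_with_sandbox_tmp 495647 "$(cd ../tmp && pwd)".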

  • weixin_40005542 (4 months ago)

    Thanks! This looks great. I'm going to give it a try next time we update our async-profiler binaries.

  • weixin_40005542 (4 months ago)

    Here is the strace output from a dedicated host (where jattach connects successfully). Note: on the dedicated host, the .java_pid file seems to be created under the root /tmp:

    
    sandbox $ ls -la /tmp
    total 6008
    drwxrwxrwt  5 root       root              4096 Feb  8 21:45 .
    dr-xr-xr-x 19 root       root              4096 Jul 28  2018 ..
    srw-------  1 csl-perf   serviceaccount       0 Feb  8 21:44 .java_pid502447
    
    
    async-profiler $ strace build/jattach 502447 load build/libasyncProfiler.so true start,event=alloc,file=t.svg,svg
    execve("build/jattach", ["build/jattach", "502447", "load", "build/libasyncProfiler.so", "true", "start,event=alloc,file=t.svg,svg"], [/* 25 vars */]) = 0
    brk(NULL)                               = 0xbdb000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0e7c93b000
    access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=58148, ...}) = 0
    mmap(NULL, 58148, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0e7c92c000
    close(3)                                = 0
    open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340$\2\0\0\0\0\0"..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=2151672, ...}) = 0
    mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0e7c34e000
    mprotect(0x7f0e7c510000, 2097152, PROT_NONE) = 0
    mmap(0x7f0e7c710000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c2000) = 0x7f0e7c710000
    mmap(0x7f0e7c716000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0e7c716000
    close(3)                                = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0e7c92b000
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0e7c929000
    arch_prctl(ARCH_SET_FS, 0x7f0e7c929740) = 0
    mprotect(0x7f0e7c710000, 16384, PROT_READ) = 0
    mprotect(0x602000, 4096, PROT_READ)     = 0
    mprotect(0x7f0e7c93c000, 4096, PROT_READ) = 0
    munmap(0x7f0e7c92c000, 58148)           = 0
    geteuid()                               = 2138
    getegid()                               = 1058
    readlink("/proc/502447/root", "/", 950) = 1
    brk(NULL)                               = 0xbdb000
    brk(0xbfc000)                           = 0xbfc000
    brk(NULL)                               = 0xbfc000
    open("/proc/502447/status", O_RDONLY)   = 3
    fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0e7c93a000
    read(3, "Name:\tjava\nUmask:\t0022\nState:\tS "..., 1024) = 1020
    read(3, "", 1024)                       = 0
    close(3)                                = 0
    munmap(0x7f0e7c93a000, 4096)            = 0
    stat("/proc/self/ns/mnt", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    stat("/proc/502447/ns/mnt", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    open("/proc/502447/ns/mnt", O_RDONLY)   = 3
    setns(3, 0)                             = -1 EPERM (Operation not permitted)
    close(3)                                = 0
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0e7c93a000
    write(1, "WARNING: couldn't enter target p"..., 53WARNING: couldn't enter target process mnt namespace
    ) = 53
    rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7f0e7c384280}, {SIG_DFL, [], 0}, 8) = 0
    stat("/tmp/.java_pid502447", {st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0
    socket(AF_LOCAL, SOCK_STREAM, 0)        = 3
    connect(3, {sa_family=AF_LOCAL, sun_path="/tmp/.java_pid502447"}, 110) = 0
    write(1, "Connected to remote JVM\n", 24Connected to remote JVM
    ) = 24
    write(3, "1\0", 2)                      = 2
    write(3, "load\0", 5)                   = 5
    write(3, "build/libasyncProfiler.so\0", 26) = 26
    write(3, "true\0", 5)                   = 5
    write(3, "start,event=alloc,file=t.svg,svg"..., 33) = 33
    write(1, "Response code = ", 16Response code = )        = 16
    read(3, "-1\n", 8191)                   = 3
    write(1, "-1\n", 3-1
    )                     = 3
    read(3, "", 8192)                       = 0
    write(1, "\n", 1
    )                       = 1
    close(3)                                = 0
    exit_group(-1)                          = ?
    +++ exited with 255 +++
    

    It turns out this run prints the same warning, yet it is able to make progress and locate the pid file under the root /tmp.

  • weixin_39671467 (4 months ago)

    Hi!

    It seems the Java process runs in a different mount namespace. To find files in that namespace, the other process (jattach) needs to join it via the setns call. However, this syscall requires the CAP_SYS_ADMIN capability, which usually only root has.
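
    A quick way to check whether this applies is to compare the mount-namespace links under /proc, the same links that appear in the ls output in the question. A minimal sketch, with mnt_ns_of and same_mnt_ns as hypothetical helper names:

    ```shell
    # Print the mount-namespace identifier of a process, e.g. "mnt:[4026533548]".
    # $1 is a PID or the literal word "self".
    mnt_ns_of() {
        readlink "/proc/$1/ns/mnt"
    }

    # Succeed when the given PID shares the caller's mount namespace.
    same_mnt_ns() {
        [ "$(mnt_ns_of self)" = "$(mnt_ns_of "$1")" ]
    }
    ```

    For example, same_mnt_ns 495647 || echo "different mount namespace" would report the situation described here. Reading another user's namespace link may itself be refused, in which case the comparison is not meaningful.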

    Can you run jattach under sudo?

    What does cat /proc/<javapid>/mountinfo say?

  • weixin_40005542 (4 months ago)

    Thanks for the quick response, Andrei!

    I don't even have to run as root; all I need to do is create a symlink to the .java_pid file in my sandbox's tmp folder:

    
    ln -s /var/lib/mesos/slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/latest/tmp/.java_pid32579 /tmp/.java_pid32579
    

    After this, I can run async-profiler w/o any problem.

    Here is the mountinfo if that helps:

    
    sandbox $ cat /proc/32579/mountinfo
    358 346 8:1 / / rw,noatime master:1 - xfs /dev/sda1 rw,attr2,nobarrier,inode64,noquota
    441 358 0:6 / /dev rw,nosuid master:2 - devtmpfs devtmpfs rw,size=131870252k,nr_inodes=32967563,mode=755
    463 441 0:19 / /dev/shm rw master:3 - tmpfs tmpfs rw
    592 441 0:20 / /dev/pts rw,relatime master:4 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
    603 441 0:36 / /dev/hugepages rw,relatime master:24 - hugetlbfs hugetlbfs rw,pagesize=2M
    612 441 0:17 / /dev/mqueue rw,relatime master:25 - mqueue mqueue rw
    614 358 0:4 / /proc rw,relatime master:5 - proc proc rw
    615 614 0:35 / /proc/sys/fs/binfmt_misc rw,relatime master:22 - autofs systemd-1 rw,fd=24,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=22650
    616 358 0:18 / /sys rw,relatime master:6 - sysfs sysfs rw
    617 616 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime master:7 - securityfs securityfs rw
    618 616 0:22 / /sys/fs/cgroup ro,nosuid,nodev,noexec master:8 - tmpfs tmpfs ro,mode=755
    619 618 0:23 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
    620 618 0:25 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,devices
    623 618 0:26 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,net_cls,net_prio
    624 618 0:27 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,freezer
    625 618 0:28 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,cpu,cpuacct
    626 618 0:29 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,perf_event
    628 618 0:30 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,memory
    629 618 0:31 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,pids
    630 618 0:32 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,blkio
    633 618 0:33 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,cpuset
    636 618 0:34 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,hugetlb
    637 616 0:24 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime master:20 - pstore pstore rw
    639 616 0:8 / /sys/kernel/debug rw,relatime master:23 - debugfs debugfs rw
    641 639 0:11 / /sys/kernel/debug/tracing rw,relatime master:62 - tracefs tracefs rw
    642 358 0:21 / /run rw,nosuid,nodev master:21 - tmpfs tmpfs rw,mode=755
    643 642 0:21 /netns /run/netns rw,nosuid,nodev master:100 - tmpfs tmpfs rw,mode=755
    644 643 0:3 net:[4026533379] /run/netns/315042 rw master:134 - nsfs nsfs rw
    645 643 0:3 net:[4026533159] /run/netns/55250 rw master:149 - nsfs nsfs rw
    646 643 0:3 net:[4026533718] /run/netns/421199 rw master:133 - nsfs nsfs rw
    648 643 0:3 net:[4026533661] /run/netns/375139 rw master:137 - nsfs nsfs rw
    649 643 0:3 net:[4026533606] /run/netns/516393 rw master:146 - nsfs nsfs rw
    659 643 0:3 net:[4026534006] /run/netns/209030 rw master:150 - nsfs nsfs rw
    662 643 0:3 net:[4026533725] /run/netns/129932 rw master:135 - nsfs nsfs rw
    669 643 0:3 net:[4026533841] /run/netns/339292 rw master:143 - nsfs nsfs rw
    672 643 0:3 net:[4026533493] /run/netns/12711 rw master:102 - nsfs nsfs rw
    675 643 0:3 net:[4026533551] /run/netns/12715 rw master:141 - nsfs nsfs rw
    677 643 0:3 net:[4026534003] /run/netns/6660 rw master:152 - nsfs nsfs rw
    690 643 0:3 net:[4026533781] /run/netns/458489 rw master:136 - nsfs nsfs rw
    699 643 0:3 net:[4026533437] /run/netns/490519 rw master:101 - nsfs nsfs rw
    703 358 253:1 / /var/lib/mesos_containers rw,noatime master:26 - ext3 /dev/mapper/vg_mesos-lv_mesos_containers rw,nobarrier,errors=remount-ro,data=ordered
    705 358 253:2 / /var/lib/mesos rw,noatime master:27 - xfs /dev/mapper/vg_mesos-lv_mesos rw,attr2,nobarrier,inode64,prjquota
    707 358 253:0 / /var/tmp rw,noatime master:28 - ext3 /dev/mapper/vg_mesos-lv_var_tmp rw,nobarrier,errors=remount-ro,data=ordered
    718 358 0:37 / /var/lib/tss/keys rw,nosuid,nodev,relatime master:96 - fuse tssfs rw,user_id=994,group_id=992,allow_other
    725 643 0:3 net:[4026533778] /run/netns/29808 rw master:142 - nsfs nsfs rw
    766 358 253:2 /slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/3afe8a0a-b7f4-48b6-86fc-bdcdcaab985e/tmp /tmp rw,noatime master:27 - xfs /dev/mapper/vg_mesos-lv_mesos rw,attr2,nobarrier,inode64,prjquota
    768 705 253:0 /aurora /var/lib/mesos/slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/3afe8a0a-b7f4-48b6-86fc-bdcdcaab985e/var/tmp/aurora rw,noatime master:28 - ext3 /dev/mapper/vg_mesos-lv_var_tmp rw,nobarrier,errors=remount-ro,data=ordered
    769 707 253:2 /slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/3afe8a0a-b7f4-48b6-86fc-bdcdcaab985e/var/tmp /var/tmp rw,noatime master:27 - xfs /dev/mapper/vg_mesos-lv_mesos rw,attr2,nobarrier,inode64,prjquota
    772 769 253:0 /aurora /var/tmp/aurora rw,noatime master:28 - ext3 /dev/mapper/vg_mesos-lv_var_tmp rw,nobarrier,errors=remount-ro,data=ordered
    792 642 0:38 / /run/user/2138 rw,nosuid,nodev,relatime master:144 - tmpfs tmpfs rw,size=26376296k,mode=700,uid=2138,gid=1058
    921 642 0:39 / /run/user/0 rw,nosuid,nodev,relatime master:153 - tmpfs tmpfs rw,size=26376296k,mode=700
    

    Looks like it should be possible to derive tmp folder location from there?

    
    ls -la /var/lib/mesos/slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/3afe8a0a-b7f4-48b6-86fc-bdcdcaab985e/tmp
    
    drwxrwxrwt 3 root     root              4096 Feb  8 22:06 .
    drwxr-xr-x 7 root     root              4096 Feb  8 21:56 ..
    -rw-r--r-- 1 csl-perf serviceaccount     151 Feb  8 22:04 async-profiler.72105.32579
    -rw-r--r-- 1 csl-perf serviceaccount  716789 Feb  8 22:05 async-profiler.72780.32579
    -rw-r--r-- 1 csl-perf serviceaccount  693569 Feb  8 22:06 async-profiler.73529.32579
    drwxr-xr-x 2 csl-perf serviceaccount      18 Feb  8 21:56 hsperfdata_csl-perf
    srw------- 1 csl-perf serviceaccount       0 Feb  8 21:58 .java_pid32579
    -rw-r--r-- 1 csl-perf serviceaccount 1191588 Feb  8 22:09 perf-32579.map
    

    UPDATE: Looks like I can just wrap profiler.sh with a simple check for the .java_pid file that also creates the symlink. That shouldn't be too much trouble. That said, I'm happy for you to close this ticket, unless you think locating .java_pid is something async-profiler should be able to do by itself.
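
    A rough sketch of what such a wrapper step could look like, assuming the sandbox tmp is passed as an absolute path. link_java_pid and its optional third argument are hypothetical names introduced for illustration only.

    ```shell
    # Hypothetical wrapper step: expose the sandbox's attach socket under /tmp.
    # $1 = JVM pid, $2 = absolute path of the sandbox tmp, $3 = link dir (assumed default /tmp)
    link_java_pid() {
        pid=$1
        sandbox_tmp=$2
        host_tmp=${3:-/tmp}
        socket="$sandbox_tmp/.java_pid$pid"
        link="$host_tmp/.java_pid$pid"
        # Nothing to link until the JVM has created its socket
        [ -e "$socket" ] || return 1
        # Create or refresh the symlink, but never replace a real file or socket
        if [ -L "$link" ] || [ ! -e "$link" ]; then
            ln -sf "$socket" "$link"
        fi
    }
    ```

    Since the socket only appears after a first attach attempt, a wrapper would probably run profiler.sh once, call link_java_pid, and then run profiler.sh again.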

  • weixin_39671467 (4 months ago)

    The following line in /proc/pid/mountinfo suggests the real location of the /tmp mount:

    
    766 358 253:2 /slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/3afe8a0a-b7f4-48b6-86fc-bdcdcaab985e/tmp /tmp rw,noatime master:27 - xfs /dev/mapper/vg_mesos-lv_mesos rw,attr2,nobarrier,inode64,prjquota
    

    So is the directory /slaves/7f559923-c588-4928-b967-58f186201480-S807/frameworks/201205082337-0000000003-0000/executors/foobarbaz/runs/3afe8a0a-b7f4-48b6-86fc-bdcdcaab985e/tmp accessible from your shell (I mean by this exact path)?
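
    For reference, that line can be picked out mechanically: in /proc/<pid>/mountinfo the third field is the device (major:minor), the fourth is the root of the mount within its filesystem, and the fifth is the mount point. A minimal sketch, with tmp_mount_source as a hypothetical helper name:

    ```shell
    # Read mountinfo text on stdin and print "major:minor root" for the mount
    # whose mount point is /tmp. If several entries match, the last one wins,
    # on the assumption that a later mount shadows an earlier one.
    tmp_mount_source() {
        awk '
            $5 == "/tmp" { dev = $3; root = $4 }
            END { if (dev != "") print dev, root }
        '
    }
    ```

    It would be used as tmp_mount_source < /proc/32579/mountinfo. The root it prints is relative to the filesystem on that device, so it is not necessarily a valid path on the host.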

  • weixin_40005542 (4 months ago)

    Yea, I looked it up too. It's not, but if you prefix it with /var/lib/mesos it's accessible - not exactly sure why it works this way.

  • weixin_40005542 (4 months ago)

    It seems the root of that filesystem is mounted at /var/lib/mesos (see a couple of lines above in the mountinfo). Maybe that's where the prefix is coming from?

    I checked again, there is no /slaves/... folder (in the root) on my host.

  • weixin_39671467 (4 months ago)

    I'm afraid things are even more complicated. It's not always enough to parse just /proc/pid/mountinfo, but it is also necessary to look into /proc/self/mountinfo and then try to match one with another. And I can imagine a situation when this won't work either.
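
    To illustrate the matching step (and why it is fragile): given the device and root of the target's /tmp mount, one can look in the caller's own mountinfo for a mount of the same device whose root is a parent of that root, and then join the paths. The sketch below is only an approximation under those assumptions. host_path_for is a hypothetical name, and escaped characters in mountinfo paths (such as \040 for a space) are not handled.

    ```shell
    # usage: host_path_for DEV ROOT < /proc/self/mountinfo
    # DEV is "major:minor" and ROOT is the root field of the target's mount.
    # Prints the path under which ROOT should be visible to the caller,
    # or exits with status 1 when no mount of that device covers it.
    host_path_for() {
        awk -v dev="$1" -v root="$2" '
            BEGIN { if (root == "/") root = "" }
            $3 == dev {
                r = $4; mp = $5
                # Treat "/" as an empty prefix so concatenation stays clean
                if (r == "/") r = ""
                if (mp == "/") mp = ""
                # The entry covers the target if its root equals the target
                # root or is a parent directory of it
                if (root == r || index(root, r "/") == 1) {
                    # Prefer the entry with the longest matching root
                    if (length(r) >= best_len) {
                        best_len = length(r)
                        best = mp substr(root, length(r) + 1)
                    }
                    found = 1
                }
            }
            END {
                if (!found) exit 1
                print (best == "" ? "/" : best)
            }
        '
    }
    ```

    This relies on the caller's namespace having a mount of the same device at all, which is exactly the kind of assumption that may not hold in other setups.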

    Probably we can fall back to your original suggestion to override /tmp path with some jattach argument (or better with environment variable)?

  • weixin_40005542 (4 months ago)

    Oh yeah, I like it - it's very simple. This could be a very reasonable path forward. Either an env variable or just a plain process argument would totally work for us!

