Why does initramfs need to overmount rootfs with the new root?

Question

I read the Linux documentation on initramfs and the source code of switch_root.

The documentation says:

When switching another root device, initrd would pivot_root and then umount the ramdisk. But initramfs is rootfs: you can neither pivot_root rootfs, nor unmount it. Instead delete everything out of rootfs to free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs with the new root (cd /newmount; mount --move . /; chroot .), attach stdin/stdout/stderr to the new /dev/console and exec the new init.

And switch_root really does that:

    if (chdir(newroot)) {
        warn(_("failed to change directory to %s"), newroot);
        return -1;
    }
    ...
    if (mount(newroot, "/", NULL, MS_MOVE, NULL) < 0) {
        close(cfd);
        warn(_("failed to mount moving %s to /"), newroot);
        return -1;
    }
    ...
    if (chroot(".")) {
        close(cfd);
        warn(_("failed to change root"));
        return -1;
    }

Why is it necessary to move the mount point to /?
Why not simply chroot into new_root?

Idan Yadgar · Accepted Answer

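Edit: updated thanks to @timothy-baldwin.

Mounting new_root on / changes the root directory of the mount namespace. If you chroot without overmounting /, the system ends up in a chroot environment (the root directory does not match the root directory of the mount namespace).

This causes several problems, for example:

1. Creating user namespaces is not permitted inside a chroot.

According to man 2 unshare, unsharing the user namespace in a chroot environment fails with EPERM: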

EPERM (since Linux 3.9) CLONE_NEWUSER was specified in flags and the caller is in a chroot environment (i.e., the caller's root directory does not match the root directory of the mount namespace in which it resides).

    $ unshare -U
    unshare: unshare failed: Operation not permitted
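The same check can be made from C. The sketch below calls unshare(CLONE_NEWUSER) in a forked child, so the caller's own namespaces stay untouched, and hands the resulting errno back to the parent. The helper name probe_user_namespace is made up for illustration. Keep in mind that EPERM can have other causes too, such as a seccomp filter, so an EPERM result only suggests a chroot environment and does not prove one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): try to create a user
 * namespace in a child process.
 * Returns 0 if unshare(CLONE_NEWUSER) succeeded, the child's errno value
 * (for example EPERM) if it failed, or -1 if the probe could not be run. */
int probe_user_namespace(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: report the outcome through the exit status. */
        if (unshare(CLONE_NEWUSER) == 0)
            _exit(0);
        _exit(errno & 0xff);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Comparing the return value against EPERM from <errno.h> separates the case quoted from the man page from plain success (0).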

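2. Entering a mount namespace sets the root directory to the namespace's root directory.

Entering a mount namespace sets the process's root directory to the root directory of that mount namespace, so calling setns on the current mount namespace sets the root directory to the rootfs directory.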

    $ nsenter -m/proc/self/ns/mnt /bin/sh
    $ ls -ld /new_root new_root

You can see the new_root directory, which lies outside the chroot.

Mounting over `/` does not actually prevent chroot escapes

The root user can unmount this directory and re-enter the same mount namespace (setns) to see rootfs.

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mount.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>

    int main() {
        /* Keep a handle on the current mount namespace. */
        int ns = open("/proc/self/ns/mnt", O_RDONLY);
        if (ns == -1) {
            perror("open");
            goto out;
        }

        /* Lazily detach whatever is mounted on "/". */
        if (umount2("/", MNT_DETACH)) {
            perror("umount2");
            goto out;
        }

        /* Re-entering the namespace resets the root directory to the
           namespace's root, which is rootfs. */
        if (setns(ns, CLONE_NEWNS)) {
            perror("setns");
            goto out;
        }

        char *a[] = { "/bin/sh", NULL };
        char *e[] = { NULL };
        execve(a[0], a, e);
        perror("execve");
    out:
        return 1;
    }

    $ gcc -o main main.c
    $ unshare -m ./main
    / # ls -d new_root
    new_root
    / # mount -t proc proc /proc
    / # cat /proc/mounts
    none / rootfs rw 0 0
    proc /proc proc rw,relatime 0 0
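The /proc/mounts check at the end of that session can also be done programmatically. The following is a minimal sketch using the glibc getmntent interface; the helper name list_root_mounts is an assumption made for illustration. It prints every entry whose mount point is "/" and returns how many it found.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every entry of a mounts table whose mount
 * point is "/" and return the number of such entries, or -1 if the table
 * cannot be opened. */
int list_root_mounts(const char *table)
{
    FILE *fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_dir, "/") == 0) {
            printf("%s on / type %s\n", m->mnt_fsname, m->mnt_type);
            count++;
        }
    }

    endmntent(fp);
    return count;
}
```

Calling list_root_mounts("/proc/self/mounts") after the escape should report the rootfs entry if the table looks like the one shown in the session.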

~~To prevent chroot escapes, new_root must be mounted on /.~~

~~I created a minimal initramfs and replaced the switch_root binary with this shell script to get a shell:~~

    #!/bin/sh
    exec /bin/sh

~~I also changed /bin/sh inside the initramfs to a statically linked busybox.~~

~~I compiled and statically linked the following code:~~

    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main() {
        int fd = open(".", O_RDONLY | O_CLOEXEC);
        if (fd < 0) {
            perror("open");
            goto out0;
        }

        if (chroot("tmp")) {
            perror("chroot");
            goto out1;
        }

        if (fchdir(fd)) {
            perror("fchdir");
            goto out1;
        }

        if (chdir("..")) {
            perror("chdir");
            goto out1;
        }

        char *const argvp[] = { "sh", NULL };
        char *const envp[] = { NULL };
        execve("bin/sh", argvp, envp);
        perror("execve");
    out1:
        close(fd);
    out0:
        return 1;
    }

~~I placed it in the root directory of the real root filesystem as /escape.~~

~~I rebooted and got a shell right before switch_root would have run.~~

Without overmounting the root:

    $ mount --move proc new_root/proc
    $ mount --move dev new_root/dev
    $ mount --move sys new_root/sys
    $ mount --move run new_root/run
    $ exec chroot new_root
    $ ./escape
    $ ls -d new_root
    new_root

~~I escaped the chroot.~~

With overmounting the root:

    $ mount --move proc new_root/proc
    $ mount --move dev new_root/dev
    $ mount --move sys new_root/sys
    $ mount --move run new_root/run
    $ cd new_root
    $ mount --move . /
    $ exec chroot .
    $ ./escape
    $ ls -d new_root
    ls: cannot access 'new_root': No such file or directory

~~I could not escape the chroot.~~

Timothy Baldwin · Answer

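Without overmounting rootfs, user and mount namespaces break.

The setns system call sets the caller's root directory to the root directory of the mount namespace, undoing the chroot.
Creating a user namespace from an unprivileged process is forbidden if the process's root directory is not the root directory of its mount namespace.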