17:02:18 #startmeeting 2016-04-16 discussion
17:02:18 Meeting started Wed Apr 27 17:02:18 2016 UTC. The chair is vbatts. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:18 Useful Commands: #action #agreed #help #info #idea #link #topic.
17:02:18 The meeting name has been set to '2016_04_16_discussion'
17:02:25 #chair philips
17:02:25 Current chairs: philips vbatts
17:02:33 * vbatts did it right this time
17:02:36 boooom
17:03:08 https://github.com/opencontainers/runtime-spec/pull/395
17:03:41 #topic split start/stop hooks
17:03:41 ya know… if we just accept 384 we can ignore 395 :-)
17:03:46 #link https://github.com/opencontainers/runtime-spec/pull/395
17:04:22 #link https://github.com/opencontainers/runtime-spec/pull/384
17:05:38 crosbymichael: before we change the spec, let's get the semantics right and test it
17:06:51 we do have the split PR out there - it just doesn’t do the special sleep/process logic
17:07:04 (PR for runc)
17:08:42 julz_: be explicit on what happens when pid1 exits
17:11:08 FWIW, it may be useful to have a write-up of current/required operations and prototyped/optional/experimental operations
17:14:10 Thanks for that explanation Michael
17:15:30 * vbatts forgot to note
17:15:48 duglin: what are the concerns to solve, to get this merged into runc?
17:17:08 blocking in create?
17:17:39 does "create" do a bind mount?
17:17:52 mrunalp: what is the problem being solved?
17:17:54 Mrunal: What is the problem we're trying to solve?
17:18:06 julz_: it removes the need for hooks
17:18:18 mrunalp: but at least pre-start is still needed
17:18:31 #chair RobDolinMS
17:18:31 Current chairs: RobDolinMS philips vbatts
17:18:50 mrunalp: delete, checkpoint, restore, what happens?
17:19:20 julz_: you can achieve some of this with hooks, but ...
17:19:48 (vbatts would you please #chair RobDolinMS_ )
17:19:54 #chair RobDolinMS_
17:19:54 Current chairs: RobDolinMS RobDolinMS_ philips vbatts
17:20:35 julz_: allowing for create, start steps. not being bound to hooks
17:21:04 julz_: lets the orchestrator have much more control on handling actions between steps
17:22:45 mrunalp: can you state your use case again? the one for pre-start hooks
17:26:12 mrunal: So you don't need to pass ~10 arguments
17:26:40 Vish: It would help if you could send an email explaining what needs to be satisfied
17:26:58 Mrunal: I can send a pointer to a GitHub repo
17:27:08 #topic Post Stop Hook
17:27:49 Vish: want to be able to get some info from the container before it is destroyed
17:28:07 Mrunal: When the container process exits, everything goes away
17:28:24 Doug: Should we do a split b/w Stop and Delete?
17:30:23 I feel like there are a lot of moving parts here - really wish we were all in the same room :-(
17:30:30 So are we looking at: Create (postcreate) ... (prestart) Start (poststart) ... (prestop) Stop (poststop) ... (predelete) Delete
17:32:51 I'm back, should I start taking minutes?
17:33:07 @wking - That would be great.
17:33:07 RobDolinMS_: Error: "wking" is not a valid command.
17:33:14 mrunalp: you can pass ocitools a template, and have it modify something (e.g. a namespace path)
17:33:21 wking - that would be great. My audio is a bit flaky
17:33:44 julz_: you can write tooling to make cross-platform, exec-like stuff simple
17:34:15 julz_: it's nice to keep a single container/process idea, instead of attempting to distinguish processes from sandboxes
17:34:27 #chair wking
17:34:27 Current chairs: RobDolinMS RobDolinMS_ philips vbatts wking
17:36:30 julz_: I think we went down a bad path with 'exec'. It almost handled what we need, so we didn't get the infrastructure we need to easily do this sort of thing with containers
17:37:15 julz_: one nice feature of Linux namespaces is that you can enter a subset without joining them all
17:37:30 julz_: we already have to support that for the current Docker UI
17:37:39 vishh: but how does that relate to this conversation?
17:38:15 vishh: should we have one workflow for all use-cases, or can we split the spec to have distinct phases to let the programmers represent whatever they want to achieve in a portable manner?
17:38:37 julz_: I agree with the goal, and just think it's unachievable
17:39:21 duglin: are you saying you want a 'create' that just sets up the namespaces and a 'delete' that tears them down. But while that container is running, you can attach to the container, run what you want, and die without killing the parent?
17:39:49 I can’t hear anyone - gonna redial
17:39:58 julz_: I'm saying "everything is a container", and Docker containers today have to be able to join a subset of a current container's namespaces
17:40:14 I’m back
17:40:23 sorry - don’t know what happened
17:40:23 julz_: For me, there's no such thing as 'exec', there are just containers, and those containers can enter a subset of another container's namespaces
17:40:43 vishh: So julz_ is saying, "we don't need the separation, and users can just deal with it"
17:40:55 vishh: I feel like this discussion isn't going anywhere, maybe for a good reason
17:41:11 vishh: If we focus on use cases, right now the post-stop use-case isn't satisfied
17:41:31 mrunalp: post-stop is satisfied. You can use a pre-start bind-mount or add a flag to runC to do the bind mount for us
17:41:45 vishh: but we don't want to have bind mounts. It's possible now without needing hooks
17:41:50 runc createC --name=test
17:41:51 runc createC --attach=test --ns=all --process=bash
17:41:51 runc deleteD --name=test
17:41:52 ??
17:41:59 s/deleteD/deleteC/
17:42:08 vishh: we don't want to require callers to understand what's going on under the hood
17:42:34 julz_: I think we need to make the spec and tooling good enough to handle subcontainers clearly without requiring under-the-hood knowledge
17:42:57 julz_: Like crosbymichael was saying, we want to stick with the things we know are working today
17:43:01 julz_ ^^
17:43:07 * vbatts back
17:43:26 julz_: we should standardize what we have and build things to make that simpler
17:43:45 julz_: I'm not against a bind-mount option, but I am nervous about making it a first-class flow, because it's not what we're useful
17:44:19 #link https://github.com/opencontainers/runtime-spec/pull/391
17:44:27 ^ notes on supporting 'exec' with the current spec
17:44:44 vishh: Docker uses runC now, so I don't think it needs anything we don't already have
17:45:50 julz_: https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d
17:45:53 vishh: you can handle post-stop hooks before the mount namespace is gone, you can do that by calling post-stop hooks before the runtime waits on the container process
17:46:09 ^ I'm not sure if I got that right, but there was some discussion of wait timing
17:47:23 mrunalp: crosbymichael and I are both happy with #395, as long as it only has to be a best-effort to run post-stop hooks while the mount namespace is still there
17:48:39 mrunalp: I think #395 needs bind mounts, and I'm ok with that
17:49:06 crosbymichael: does this make any sense to you? https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d
17:49:20 preserving namespaces after container-process death sounds like a big shift to me
17:49:37 vishh: I don't see why we're afraid of pinning namespaces if it's not exposed to the user
17:49:50 mrunalp: it makes cleanup awkward. Who cleans up the pinned namespaces if the host process dies
17:50:12 julz_: for example, I don't want an IP address from a dead container persisting
17:50:29 vishh: the host process will clean up the pins
17:50:37 mrunalp: but what if the host process was killed before that?
17:50:54 vishh: today there's a cleanup daemon.
17:51:29 vishh: this gets into the kill/stop distinction
17:51:42 mrunalp: but the current spec is simpler without the daemon requirement
17:51:51 vishh: this is the same problem as cgroup cleanup
17:52:21 mrunalp: that's fair, we're just doubling down on the cgroup cleanup issue
17:52:48 julz_: but this is extending our exposure (e.g. pinning IP addresses in the network namespace)
17:53:16 vishh: if a user cares a lot about garbage collection, they should run their own garbage collector
17:54:46 julz_: I don't think pinning is the right answer. Just use sub-containers
17:54:59 julz_: we have to support that anyway, so why require pinning separately?
17:55:49 vishh: there's less technical overhead for config authors if we pin past post-start hooks
17:56:11 sorry have to run, byes
17:56:29 julz_: the spec is supposed to be a low-level, generic thing. I agree we should make this easier, but we can do that in high level tools
17:56:52 julz_: the problem with building lots of features into the bottom is that it's harder to decouple later
17:57:01 julz_: we can always push things down if they're super useful
17:57:44 julz_: there will be at least one layer of tools between the user's keyboard and the runtime-spec config
17:58:03 vishh: but it makes the spec less useful if you require tooling
17:58:12 vishh: then what's the point of the spec?
17:58:42 julz_: I want us to spend energy on the tooling to make 'exec' an easy flow, and we've already decided not to put that in the spec
17:58:55 julz_: and the post-stop pinning seems like a more peripheral case than 'exec'
17:59:11 vishh: the problem with 'exec' is that there wasn't agreement about the concept
17:59:43 vishh: so I don't know if the 'exec' example applies here (we all agree on what post-stop pinning means)
18:00:03 julz_: my point is that we pulled out 'exec' because you could reimplement it in higher-level tooling
18:00:25 julz_: you can also build tooling that makes pinning easy without having to bake it into the spec
18:00:51 mrunalp: going back to cleanup, there's no way to get the kernel to automatically reap cgroups, but if we avoid pinning we can have the kernel clean up namespaces
18:00:59 mrunalp: we could make it optional ;)
18:02:09 duglin: is my earlier Gist (https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d) what julz_ is proposing?
18:02:09 Need to drop for 11am. Have a great day everyone :)
18:02:38 julz_: that's the create/start split, and we've shifted over to talk more about the post-stop pinning vs. sub-containers
18:03:04 duglin: that's what I'm doing in the Gist (https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d#file-gistfile1-txt-L9 <-- sub-container)
18:03:52 vishh: we can't use 'docker (re?)start' because we need to collect container content before the mount namespace is destroyed
18:04:39 julz_: what's the problem with the sub-container approach and having the parent hold the mount-namespace open?
18:04:59 vishh: that's fine, and you could do that in a tooling API, but then the tooling API becomes a spec, effectively
18:05:14 julz_: yeah, and this is effectively a pod
18:05:25 vishh: I'm not talking about pods
18:06:23 julz_: to me, once you say the lifecycle of the process is different from the lifecycle of the sandbox, I think you're getting into the pod landscape. I think we should just embrace it
18:07:12 julz_: I see what you mean, it's not much of a stretch from a single container, but I think it *is* past the single container stage
18:07:27 vishh: for sub-namespaces you need a dummy process in the parent, and it could die
18:08:04 julz_: you could have tooling that handled this post-stop pinning via bind mounts
18:08:21 vishh: this just pushes all the interesting stuff up into tooling
18:08:49 ^ I have no problem pushing interesting stuff up into tooling ;)
18:09:21 julz_: I think it's too late to experiment on the standard, instead of testing ideas in tooling and then pushing them down into the standard if they have legs
18:10:05 vishh: how do we get past the uncertainty stage?
18:10:20 vishh: it's too hard to have a conversation without a concrete process
18:10:41 mrunalp: how about an experimental branch of runC for testing these ideas, and then we can bless them after they're proven
18:10:44 vishh: what is "proven"
18:10:53 mrunalp: we've tested all the edge cases
18:11:14 vishh: it should be written down what an author needs to do to convince the community that their change is proven
18:11:43 julz_: in this case, we can build a tool and judge by popularity. If lots of users are using the tools, then merge them down
18:12:11 vishh: the bind-mount approach means we can't use 'docker restart', because you need an explicit phase to handle the bind-mounts
18:12:32 julz_: given you've bind-mounted in the pre-start, you can rely on the post-stop hooks having the namespaces pinned
18:12:48 julz_: if you want pinning, you should bind-mount in a pre-start hook for now
18:12:59 duglin: is the user doing this, or is the runtime doing it for them
18:13:12 julz_: you should opt in with an argument
18:13:30 mrunalp: we can give folks sample hooks, etc. as well
18:14:56 wking, Thanks for the notes :)
18:15:19 #endmeeting
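
Illustrative note (not part of the meeting): the phase ordering floated at 17:30:30, "Create (postcreate) ... (prestart) Start (poststart) ... (prestop) Stop (poststop) ... (predelete) Delete", can be sketched as a small Go program. This is only a sketch under assumptions: the Step and Hooks types and the proposedLifecycle and run functions are hypothetical names invented here, not runC or runtime-spec APIs, and the hook points are the ones named in the log, not ones the spec necessarily adopted.

```go
package main

import "fmt"

// Step is one entry in the ordering discussed in the meeting: either a
// lifecycle operation (create, start, stop, delete) or a hook point.
// Hypothetical type, for illustration only.
type Step struct {
	Name   string
	IsHook bool
}

// proposedLifecycle returns the ordering written out at 17:30:30.
func proposedLifecycle() []Step {
	return []Step{
		{"create", false},
		{"postcreate", true},
		{"prestart", true},
		{"start", false},
		{"poststart", true},
		{"prestop", true},
		{"stop", false},
		{"poststop", true},
		{"predelete", true},
		{"delete", false},
	}
}

// Hooks maps a hook-point name to the callbacks registered for it.
type Hooks map[string][]func() error

// run walks the steps in order, calling any callbacks registered for each
// hook point. It returns the names of the steps it reached and stops at
// the first callback error.
func run(steps []Step, hooks Hooks) ([]string, error) {
	var trace []string
	for _, s := range steps {
		trace = append(trace, s.Name)
		if !s.IsHook {
			continue // a real runtime would perform the operation here
		}
		for _, h := range hooks[s.Name] {
			if err := h(); err != nil {
				return trace, fmt.Errorf("hook %s: %w", s.Name, err)
			}
		}
	}
	return trace, nil
}

func main() {
	hooks := Hooks{
		// The post-stop use case from the log: collect data from the
		// container after it stops but before it is deleted.
		"poststop": {func() error {
			fmt.Println("collecting container data")
			return nil
		}},
	}
	trace, err := run(proposedLifecycle(), hooks)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(trace)
}
```

Placing poststop between stop and delete is what lets a caller gather container state before teardown, which is the use case vishh raises. The sketch does not model how namespaces stay alive in that window (pinning via bind mounts versus sub-containers), which is the open question in the log.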