17:02:18 <vbatts> #startmeeting 2016-04-16 discussion
17:02:18 <collabot`> Meeting started Wed Apr 27 17:02:18 2016 UTC. The chair is vbatts. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:18 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:02:18 <collabot`> The meeting name has been set to '2016_04_16_discussion'
17:02:25 <vbatts> #chair philips
17:02:25 <collabot`> Current chairs: philips vbatts
17:02:33 * vbatts did it right this time
17:02:36 <philips> boooom
17:03:08 <duglin> https://github.com/opencontainers/runtime-spec/pull/395
17:03:41 <vbatts> #topic split start/stop hooks
17:03:41 <duglin> ya know… if we just accept 384 we can ignore 395 :-)
17:03:46 <vbatts> #link https://github.com/opencontainers/runtime-spec/pull/395
17:04:22 <vbatts> #link https://github.com/opencontainers/runtime-spec/pull/384
17:05:38 <vbatts> crosbymichael: before we change the spec, let's get the semantics right and test it
17:06:51 <duglin> we do have the split PR out there - it just doesn’t do the special sleep/process logic
17:07:04 <duglin> (PR for runc)
17:08:42 <vbatts> julz_: be explicit on what happens when pid1 exits
17:11:08 <RobDolinMS_> FWIW, it may be useful to have a write-up of current/required operations and prototyped/optional/experimental operations
17:14:10 <RobDolinMS_> Thanks for that explanation Michael
17:15:30 * vbatts forgot to note
17:15:48 <vbatts> duglin: what are the concerns to solve, to get this merged into runc?
17:17:08 <vbatts> blocking in create?
17:17:39 <vbatts> does "create" do a bind mount?
17:17:52 <vbatts> mrunalp: what is the problem being solved?
17:17:54 <RobDolinMS> Mrunal: What is the problem we're trying to solve?
17:18:06 <vbatts> julz_: it removes the need for hooks
17:18:18 <vbatts> mrunalp: but at least pre-start is still needed
17:18:31 <vbatts> #chair RobDolinMS
17:18:31 <collabot`> Current chairs: RobDolinMS philips vbatts
17:18:50 <vbatts> mrunalp: delete,checkpoint,restore, what happens?
17:19:20 <vbatts> julz_: you can achieve some of this with hooks, but ...
17:19:48 <RobDolinMS_> (vbatts would you please #chair RobDolinMS_ )
17:19:54 <vbatts> #chair RobDolinMS_
17:19:54 <collabot`> Current chairs: RobDolinMS RobDolinMS_ philips vbatts
17:20:35 <vbatts> julz_: allowing for create, start steps. not being bound to hooks
17:21:04 <vbatts> julz_: lets the orchestrator have much more control on handling actions between steps
17:22:45 <duglin> mrunalp: can you state your usecase again? the one for pre-start hooks
17:26:12 <RobDolinMS_> mrunal: So you don't need to pass ~10 arguments
17:26:40 <RobDolinMS_> Vish: It would help if you could send an email explaining what needs to be satisfied
17:26:58 <RobDolinMS_> Mrunal: I can send a pointer to a GitHub repo
17:27:08 <RobDolinMS_> #topic Post Stop Hook
17:27:49 <RobDolinMS_> Vish: want to be able to get some info from the container before it is destroyed
17:28:07 <RobDolinMS_> Mrunal: When the container process exits, everything goes away
17:28:24 <RobDolinMS_> Doug: Should we do a split b/w Stop and Delete ?
17:30:23 <duglin> I feel like there are a lot of moving parts here - really wish we were all in the same room :-(
17:30:30 <RobDolinMS_> So are we looking at: Create (postcreate) ... (prestart) Start (poststart) ... (prestop) Stop (poststop) ... (predelete) Delete
17:32:51 <wking> I'm back, should I start taking minutes?
17:33:07 <RobDolinMS_> @wking - That would be great.
17:33:07 <collabot`> RobDolinMS_: Error: "wking" is not a valid command.
17:33:14 <wking> mrunalp: you can pass ocitools a template, and have it modify something (e.g. a namespace path)
17:33:21 <RobDolinMS_> wking - that would be great. My audio is a bit flaky
17:33:44 <wking> julz_: you can write tooling to make cross-platform, exec-like stuff simple
17:34:15 <wking> julz_: it's nice to keep a single container/process idea, instead of attempting to distinguish processes from sandboxes
17:34:27 <vbatts> #chair wking
17:34:27 <collabot`> Current chairs: RobDolinMS RobDolinMS_ philips vbatts wking
17:36:30 <wking> julz_: I think we went down a bad path with 'exec'. It almost handled what we need, so we didn't get the infrastructure we need to easily do this sort of thing with containers
17:37:15 <wking> julz_: one nice feature of Linux namespaces is that you can enter a subset without joining them all
17:37:30 <wking> julz_: we already have to support that for the current Docker UI
17:37:39 <wking> vishh: but how does that relate to this conversation?
17:38:15 <wking> vishh: should we have one workflow for all use-cases, or can we split the spec to have distinct phases to let the programmers represent whatever they want to achieve in a portable manner?
17:38:37 <wking> julz_: I agree with the goal, and just think it's unachievable
17:39:21 <wking> duglin: are you saying you want a 'create' that just sets up the namespaces and a 'delete' that tears them down. But while that container is running, you can attach to the container, run what you want, and die without killing the parent?
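The operation and hook ordering floated at 17:30:30 can be sketched as a small table. This is only an illustration: the names come from that message, the ordering was raised as a question rather than agreed, and `LIFECYCLE` and `event_order` are hypothetical names, not part of the runtime-spec or runC.

```python
# Hypothetical sketch: each entry is (operation, hooks before it, hooks after it).
# The names mirror the 17:30:30 message; nothing here is an agreed spec.
LIFECYCLE = [
    ("create", [], ["postcreate"]),
    ("start", ["prestart"], ["poststart"]),
    ("stop", ["prestop"], ["poststop"]),
    ("delete", ["predelete"], []),
]


def event_order(lifecycle=LIFECYCLE):
    """Flatten the lifecycle into the order in which events would fire."""
    events = []
    for operation, pre_hooks, post_hooks in lifecycle:
        events.extend(pre_hooks)
        events.append(operation)
        events.extend(post_hooks)
    return events


if __name__ == "__main__":
    print(" -> ".join(event_order()))
```

The gaps marked "..." in the original message are where an orchestrator would act between operations, which is the extra control julz_ describes at 17:21:04.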
17:39:49 <duglin> I can’t hear anyone - gonna redial
17:39:58 <wking> julz_: I'm saying "everything is a container", and Docker containers today have to be able to join a subset of a current container's namespaces
17:40:14 <duglin> I’m back
17:40:23 <duglin> sorry - don’t know what happened
17:40:23 <wking> julz_: For me, there's no such thing as 'exec', there are just containers, and those containers can enter a subset of another container's namespaces
17:40:43 <wking> vishh: So julz_ is saying, "we don't need the separation, and users can just deal with it"
17:40:55 <wking> vishh: I feel like this discussion isn't going anywhere, maybe for a good reason
17:41:11 <wking> vishh: If we focus on use cases, right now the post-stop use-case isn't satisfied
17:41:31 <wking> mrunalp: post-stop is satisfied. You can use a pre-start bind-mount or add a flag to runC to do the bind mount for us
17:41:45 <wking> vishh: but we don't want to have bind mounts. It's possible now without needing hooks
17:41:50 <duglin> runc createC --name=test
17:41:51 <duglin> runc createC --attach=test --ns=all --process=bash
17:41:51 <duglin> runc deleteD --name=test
17:41:52 <duglin> ??
17:41:59 <duglin> s/deleteD/deleteC/
17:42:08 <wking> vishh: we don't want to require callers to understand what's going on under the hood
17:42:34 <wking> julz_: I think we need to make the spec and tooling good enough to handle subcontainers clearly without requiring under-the-hood knowledge
17:42:57 <wking> julz_: Like crosbymichael was saying, we want to stick with the things we know are working today
17:43:01 <duglin> julz_ ^^
17:43:07 * vbatts back
17:43:26 <wking> julz_: we should standardize what we have and build things to make that simpler
17:43:45 <wking> julz_: I'm not against a bind-mount option, but I am nervous about making it a first-class flow, because it's not what we're useful
17:44:19 <wking> #link https://github.com/opencontainers/runtime-spec/pull/391
17:44:27 <wking> ^ notes on supporting 'exec' with the current spec
17:44:44 <wking> vishh: Docker uses runC now, so I don't think it needs anything we don't already have
17:45:50 <duglin> julz_: https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d
17:45:53 <wking> vishh: you can handle post-stop hooks before the mount namespace is gone, you can do that by calling post-stop hooks before the runtime waiting on the container process
17:46:09 <wking> ^ I'm not sure if I got that right, but there was some discussion of wait timing
17:47:23 <wking> mrunalp: crosbymichael and I are both happy with #395, as long as it only has to be a best-effort to run post-stop hooks while the mount namespace is still there
17:48:39 <wking> mrunalp: I think #395 needs bind mounts, and I'm ok with that
17:49:06 <duglin> crosbymichael: does this make any sense to you? https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d
17:49:20 <wking> preserving namespaces after container-process death sounds like a big shift to me
17:49:37 <wking> vishh: I don't see why we're afraid of pinning namespaces if it's not exposed to the user
17:49:50 <wking> mrunalp: it makes cleanup awkward. Who cleans up the pinned namespaces if the host process dies
17:50:12 <wking> julz_: for example, I don't want an IP address from a dead container persisting
17:50:29 <wking> vishh: the host process will cleanup the pins
17:50:37 <wking> mrunalp: but what if the host process was killed before that?
17:50:54 <wking> vishh: today there's a cleanup daemon.
17:51:29 <wking> vishh: this gets into the kill/stop distinction
17:51:42 <wking> mrunalp: but the current spec is simpler without the daemon requirement
17:51:51 <wking> vishh: this is the same problem as cgroup cleanup
17:52:21 <wking> mrunalp: that's fair, we're just doubling down on the cgroup cleanup issue
17:52:48 <wking> julz_: but this is extending our exposure (e.g. pinning IP addresses in the network namespace)
17:53:16 <wking> vishh: if a user cares a lot about garbage collection, they should run their own garbage collector
17:54:46 <wking> julz_: I don't think pinning is the right answer. Just use sub-containers
17:54:59 <wking> julz_: we have to support that anyway, so why require pinning separately?
17:55:49 <wking> vishh: there's less technical overhead for config authors if we pin past post-start hooks
17:56:11 <philips> sorry have to run, byes
17:56:29 <wking> julz_: the spec is supposed to be a low-level, generic thing. I agree we should make this easier, but we can do that in high level tools
17:56:52 <wking> julz_: the problem with building lots of features into the bottom is that it's harder to decouple later
17:57:01 <wking> julz_: we can always push things down if they're super useful
17:57:44 <wking> julz_: there will be at least one layer of tools between the user's keyboard and the runtime-spec config
17:58:03 <wking> vishh: but it makes the spec less useful if you require tooling
17:58:12 <wking> vishh: then what's the point of the spec?
17:58:42 <wking> julz_: I want us to spend energy on the tooling to make 'exec' an easy flow, and we've already decided not to put that in the spec
17:58:55 <wking> julz_: and the post-stop pinning seems like a more peripheral case than 'exec'
17:59:11 <wking> vishh: the problem with 'exec' is that there wasn't agreement about the concept
17:59:43 <wking> vishh: so I don't know if the 'exec' example applies here (we all agree on what post-stop pinning means)
18:00:03 <wking> julz_: my point is that we pulled out 'exec' because you could reimplement it in higher-level tooling
18:00:25 <wking> julz_: you can also build tooling that makes pinning easy without having to bake it into the spec
18:00:51 <wking> mrunalp: going back to cleanup, there's no way to get the kernel to automatically reap cgroups, but if we avoid pinning we can have the kernel clean up namespaces
18:00:59 <wking> mrunalp: we could make it optional ;)
18:02:09 <wking> duglin: is my earlier Gist (https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d) what julz_ is proposing?
18:02:09 <RobDolinMS_> Need to drop for 11am. Have a great day everyone :)
18:02:38 <wking> julz_: that's the create/start split, and we've shifted over to talk more about the post-stop pinning vs. sub-containers
18:03:04 <wking> duglin: that's what I'm doing in the Gist (https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d#file-gistfile1-txt-L9 <-- sub-container)
18:03:52 <wking> vishh: we can't use 'docker (re?)start' because we need to collect container content before the mount namespace is destroyed
18:04:39 <wking> julz_: what's the problem with the sub-container approach and having the parent hold the mount-namespace open?
18:04:59 <wking> vishh: that's fine, and you could do that in a tooling API, but then the tooling API becomes a spec, effectively
18:05:14 <wking> julz_: yeah, and this is effectively a pod
18:05:25 <wking> vishh: I'm not talking about pods
18:06:23 <wking> julz_: to me, once you say the lifecycle of the process is different from the lifecycle of the sandbox, I think you're getting into the pod landscape. I think we should just embrace it
18:07:12 <wking> julz_: I see what you mean, it's not much of a stretch from a single container, but I think it *is* past the single container stage
18:07:27 <wking> vishh: for sub-namespaces you need a dummy process in the parent, and it could die
18:08:04 <wking> julz_: you could have tooling that handled this post-stop pinning via bind mounts
18:08:21 <wking> vishh: this just pushes all the interesting stuff up into tooling
18:08:49 <wking> ^ I have no problem pushing interesting stuff up into tooling ;)
18:09:21 <wking> julz_: I think it's too late to experiment on the standard, instead of testing ideas in tooling and then pushing them down into the standard if they have legs
18:10:05 <wking> vishh: how do we get past the uncertainty stage?
18:10:20 <wking> vishh: it's too hard to have a conversation without a concrete process
18:10:41 <wking> mrunalp: how about an experimental branch of runC for testing these ideas, and then we can bless them after they're proven
18:10:44 <wking> vishh: what is "proven"
18:10:53 <wking> mrunalp: we've tested all the edge cases
18:11:14 <wking> vishh: it should be written down what an author needs to do to convince the community that their change is proven
18:11:43 <wking> julz_: in this case, we can build a tool and judge by popularity. If lots of users are using the tools, then merge them down
18:12:11 <wking> vishh: the bind-mount approach means we can't use 'docker restart', because you need an explicit phase to handle the bind-mounts
18:12:32 <wking> julz_: given you've bind-mounted in the pre-start, you can rely on the post-stop hooks having the namespaces pinned
18:12:48 <wking> julz_: if you want pinning, you should bind-mount in a pre-start hook for now
18:12:59 <wking> duglin: is the user doing this, or is the runtime doing it for them
18:13:12 <wking> julz_: you should opt-in with an argument
18:13:30 <wking> mrunalp: we can give folks sample hooks, etc. as well
18:14:56 <mrunalp> wking, Thanks for the notes :)
18:15:19 <mrunalp> #endmeeting
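
The closing suggestion (bind-mount in a pre-start hook so that post-stop hooks still find the namespaces pinned) might look roughly like the sketch below. It rests on several assumptions: that the runtime hands the hook a state JSON on stdin carrying `id` and `pid`, that only the network namespace is pinned, and that pins live under a made-up directory `/run/ns-pins`. The function names are hypothetical, and the code only builds the commands rather than running them.

```python
import json

# Hypothetical directory for the pins; neither the spec nor runC defines it.
PIN_DIR = "/run/ns-pins"


def pin_commands(state_json, namespaces=("net",), pin_dir=PIN_DIR):
    """Return the commands a pre-start hook could run to pin namespaces.

    state_json is assumed to be the container state passed to the hook on
    stdin; only its "id" and "pid" fields are read.
    """
    state = json.loads(state_json)
    container_id, pid = state["id"], int(state["pid"])
    commands = []
    for ns in namespaces:
        source = f"/proc/{pid}/ns/{ns}"
        target = f"{pin_dir}/{container_id}-{ns}"
        # A bind-mount target must already exist, so create an empty file.
        commands.append(["touch", target])
        commands.append(["mount", "--bind", source, target])
    return commands


def unpin_commands(container_id, namespaces=("net",), pin_dir=PIN_DIR):
    """Return the commands a post-stop hook or cleanup tool could run."""
    commands = []
    for ns in namespaces:
        target = f"{pin_dir}/{container_id}-{ns}"
        commands.append(["umount", target])
        commands.append(["rm", "-f", target])
    return commands


if __name__ == "__main__":
    example_state = json.dumps({"id": "test", "pid": 1234})
    for cmd in pin_commands(example_state) + unpin_commands("test"):
        print(" ".join(cmd))
```

If nothing ever runs the unpin step, for example because the host process was killed first, the bind mount keeps the namespace alive, which is the cleanup concern mrunalp and julz_ raise around 17:50.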