17:02:18 <vbatts> #startmeeting 2016-04-16 discussion
17:02:18 <collabot`> Meeting started Wed Apr 27 17:02:18 2016 UTC.  The chair is vbatts. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:18 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:02:18 <collabot`> The meeting name has been set to '2016_04_16_discussion'
17:02:25 <vbatts> #chair philips
17:02:25 <collabot`> Current chairs: philips vbatts
17:02:33 * vbatts did it right this time
17:02:36 <philips> boooom
17:03:08 <duglin> https://github.com/opencontainers/runtime-spec/pull/395
17:03:41 <vbatts> #topic split start/stop hooks
17:03:41 <duglin> ya know… if we just accept 384 we can ignore 395  :-)
17:03:46 <vbatts> #link https://github.com/opencontainers/runtime-spec/pull/395
17:04:22 <vbatts> #link https://github.com/opencontainers/runtime-spec/pull/384
17:05:38 <vbatts> crosbymichael: before we change the spec, let's get the semantics right and test it
17:06:51 <duglin> we do have the split PR out there - it just doesn’t do the special sleep/process logic
17:07:04 <duglin> (PR for runc)
17:08:42 <vbatts> julz_: be explicit on what happens when pid1 exits
17:11:08 <RobDolinMS_> FWIW, it may be useful to have a write-up of current/required operations and prototyped/optional/experimental operations
17:14:10 <RobDolinMS_> Thanks for that explanation Michael
17:15:30 * vbatts forgot to note
17:15:48 <vbatts> duglin: what are the concerns to solve, to get this merged into runc?
17:17:08 <vbatts> blocking in create?
17:17:39 <vbatts> does "create" do a bind mount?
17:17:52 <vbatts> mrunalp: what is the problem being solved?
17:17:54 <RobDolinMS> Mrunal: What is the problem we're trying to solve?
17:18:06 <vbatts> julz_: it removes the need for hooks
17:18:18 <vbatts> mrunalp: but at least pre-start is still needed
17:18:31 <vbatts> #chair RobDolinMS
17:18:31 <collabot`> Current chairs: RobDolinMS philips vbatts
17:18:50 <vbatts> mrunalp: delete,checkpoint,restore, what happens?
17:19:20 <vbatts> julz_: you can achieve some of this with hooks, but ...
17:19:48 <RobDolinMS_> (vbatts would you please #chair RobDolinMS_ )
17:19:54 <vbatts> #chair RobDolinMS_
17:19:54 <collabot`> Current chairs: RobDolinMS RobDolinMS_ philips vbatts
17:20:35 <vbatts> julz_: allowing for create, start steps. not being bound to hooks
17:21:04 <vbatts> julz_: lets the orchestrator have much more control on handling actions between steps
17:22:45 <duglin> mrunalp: can you state your usecase again? the one for pre-start hooks
17:26:12 <RobDolinMS_> Mrunal: So you don't need to pass ~10 arguments
17:26:40 <RobDolinMS_> Vish: It would help if you could send an email explaining what needs to be satisfied
17:26:58 <RobDolinMS_> Mrunal: I can send a pointer to a GitHub repo
17:27:08 <RobDolinMS_> #topic Post Stop Hook
17:27:49 <RobDolinMS_> Vish: want to be able to get some info from the container before it is destroyed
17:28:07 <RobDolinMS_> Mrunal: When the container process exits, everything goes away
17:28:24 <RobDolinMS_> Doug: Should we do a split b/w Stop and Delete ?
17:30:23 <duglin> I feel like there are a lot of moving parts here - really wish we were all in the same room :-(
17:30:30 <RobDolinMS_> So are we looking at: Create (postcreate) ... (prestart) Start (poststart) ... (prestop) Stop (poststop) ... (predelete) Delete
17:32:51 <wking> I'm back, should I start taking minutes?
17:33:07 <RobDolinMS_> @wking - That would be great.
17:33:07 <collabot`> RobDolinMS_: Error: "wking" is not a valid command.
17:33:14 <wking> mrunalp: you can pass ocitools a template, and have it modify something (e.g. a namespace path)
17:33:21 <RobDolinMS_> wking - that would be great.  My audio is a bit flaky
17:33:44 <wking> julz_: you can write tooling to make cross-platform, exec-like stuff simple
17:34:15 <wking> julz_: it's nice to keep a single container/process idea, instead of attempting to distinguish processes from sandboxes
17:34:27 <vbatts> #chair wking
17:34:27 <collabot`> Current chairs: RobDolinMS RobDolinMS_ philips vbatts wking
17:36:30 <wking> julz_: I think we went down a bad path with 'exec'.  It almost handled what we need, so we didn't get the infrastructure we need to easily do this sort of thing with containers
17:37:15 <wking> julz_: one nice feature of Linux namespaces is that you can enter a subset without joining them all
17:37:30 <wking> julz_: we already have to support that for the current Docker UI
17:37:39 <wking> vishh: but how does that relate to this conversation?
17:38:15 <wking> vishh: should we have one workflow for all use-cases, or can we split the spec to have distinct phases to let the programmers represent whatever they want to achieve in a portable manner?
17:38:37 <wking> julz_: I agree with the goal, and just think it's unachievable
17:39:21 <wking> duglin: are you saying you want a 'create' that just sets up the namespaces and a 'delete' that tears them down.  But while that container is running, you can attach to the container, run what you want, and die without killing the parent?
17:39:49 <duglin> I can’t hear anyone - gonna redial
17:39:58 <wking> julz_: I'm saying "everything is a container", and Docker containers today have to be able to join a subset of a current container's namespaces
17:40:14 <duglin> I’m back
17:40:23 <duglin> sorry - don’t know what happened
17:40:23 <wking> julz_: For me, there's no such thing as 'exec', there are just containers, and those containers can enter a subset of another container's namespaces
17:40:43 <wking> vishh: So julz_ is saying, "we don't need the separation, and users can just deal with it"
17:40:55 <wking> vishh: I feel like this discussion isn't going anywhere, maybe for a good reason
17:41:11 <wking> vishh: If we focus on use cases, right now the post-stop use-case isn't satisfied
17:41:31 <wking> mrunalp: post-stop is satisfied.  You can use a pre-start bind-mount or add a flag to runC to do the bind mount for us
17:41:45 <wking> vishh: but we don't want to have bind mounts.  It's possible now without needing hooks
17:41:50 <duglin> runc createC --name=test
17:41:51 <duglin> runc createC --attach=test --ns=all --process=bash
17:41:51 <duglin> runc deleteD --name=test
17:41:52 <duglin> ??
17:41:59 <duglin> s/deleteD/deleteC/
17:42:08 <wking> vishh: we don't want to require callers to understand what's going on under the hood
17:42:34 <wking> julz_: I think we need to make the spec and tooling good enough to handle subcontainers clearly without requiring under-the-hood knowledge
17:42:57 <wking> julz_: Like crosbymichael was saying, we want to stick with the things we know are working today
17:43:01 <duglin> julz_ ^^
17:43:07 * vbatts back
17:43:26 <wking> julz_: we should standardize what we have and build things to make that simpler
17:43:45 <wking> julz_: I'm not against a bind-mount option, but I am nervous about making it a first-class flow, because it's not what we're using
17:44:19 <wking> #link https://github.com/opencontainers/runtime-spec/pull/391
17:44:27 <wking> ^ notes on supporting 'exec' with the current spec
17:44:44 <wking> vishh: Docker uses runC now, so I don't think it needs anything we don't already have
17:45:50 <duglin> julz_:  https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d
17:45:53 <wking> vishh: you can handle post-stop hooks before the mount namespace is gone; you can do that by calling post-stop hooks before the runtime waits on the container process
17:46:09 <wking> ^ I'm not sure if I got that right, but there was some discussion of wait timing
17:47:23 <wking> mrunalp: crosbymichael and I are both happy with #395, as long as it only has to be a best-effort to run post-stop hooks while the mount namespace is still there
17:48:39 <wking> mrunalp: I think #395 needs bind mounts, and I'm ok with that
17:49:06 <duglin> crosbymichael: does this make any sense to you?   https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d
17:49:20 <wking> preserving namespaces after container-process death sounds like a big shift to me
17:49:37 <wking> vishh: I don't see why we're afraid of pinning namespaces if it's not exposed to the user
17:49:50 <wking> mrunalp: it makes cleanup awkward.  Who cleans up the pinned namespaces if the host process dies?
17:50:12 <wking> julz_: for example, I don't want an IP address from a dead container persisting
17:50:29 <wking> vishh: the host process will clean up the pins
17:50:37 <wking> mrunalp: but what if the host process was killed before that?
17:50:54 <wking> vishh: today there's a cleanup daemon.
17:51:29 <wking> vishh: this gets into the kill/stop distinction
17:51:42 <wking> mrunalp: but the current spec is simpler without the daemon requirement
17:51:51 <wking> vishh: this is the same problem as cgroup cleanup
17:52:21 <wking> mrunalp: that's fair, we're just doubling down on the cgroup cleanup issue
17:52:48 <wking> julz_: but this is extending our exposure (e.g. pinning IP addresses in the network namespace)
17:53:16 <wking> vishh: if a user cares a lot about garbage collection, they should run their own garbage collector
17:54:46 <wking> julz_: I don't think pinning is the right answer.  Just use sub-containers
17:54:59 <wking> julz_: we have to support that anyway, so why require pinning separately?
17:55:49 <wking> vishh: there's less technical overhead for config authors if we pin past post-start hooks
17:56:11 <philips> sorry have to run, byes
17:56:29 <wking> julz_: the spec is supposed to be a low-level, generic thing.  I agree we should make this easier, but we can do that in high level tools
17:56:52 <wking> julz_: the problem with building lots of features into the bottom is that it's harder to decouple later
17:57:01 <wking> julz_: we can always push things down if they're super useful
17:57:44 <wking> julz_: there will be at least one layer of tools between the user's keyboard and the runtime-spec config
17:58:03 <wking> vishh: but it makes the spec less useful if you require tooling
17:58:12 <wking> vishh: then what's the point of the spec?
17:58:42 <wking> julz_: I want us to spend energy on the tooling to make 'exec' an easy flow, and we've already decided not to put that in the spec
17:58:55 <wking> julz_: and the post-stop pinning seems like a more peripheral case than 'exec'
17:59:11 <wking> vishh: the problem with 'exec' is that there wasn't agreement about the concept
17:59:43 <wking> vishh: so I don't know if the 'exec' example applies here (we all agree on what post-stop pinning means)
18:00:03 <wking> julz_: my point is that we pulled out 'exec' because you could reimplement it in higher-level tooling
18:00:25 <wking> julz_: you can also build tooling that makes pinning easy without having to bake it into the spec
18:00:51 <wking> mrunalp: going back to cleanup, there's no way to get the kernel to automatically reap cgroups, but if we avoid pinning we can have the kernel clean up namespaces
18:00:59 <wking> mrunalp: we could make it optional ;)
18:02:09 <wking> duglin: is my earlier Gist (https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d) what julz_ is proposing?
18:02:09 <RobDolinMS_> Need to drop for 11am.  Have a great day everyone :)
18:02:38 <wking> julz_: that's the create/start split, and we've shifted over to talk more about the post-stop pinning vs. sub-containers
18:03:04 <wking> duglin: that's what I'm doing in the Gist (https://gist.github.com/duglin/cc799056126143d93bd9c2e0c8a88a5d#file-gistfile1-txt-L9 <-- sub-container)
18:03:52 <wking> vishh: we can't use 'docker (re?)start' because we need to collect container content before the mount namespace is destroyed
18:04:39 <wking> julz_: what's the problem with the sub-container approach and having the parent hold the mount-namespace open?
18:04:59 <wking> vishh: that's fine, and you could do that in a tooling API, but then the tooling API becomes a spec, effectively
18:05:14 <wking> julz_: yeah, and this is effectively a pod
18:05:25 <wking> vishh: I'm not talking about pods
18:06:23 <wking> julz_: to me, once you say the lifecycle of the process is different from the lifecycle of the sandbox, I think you're getting into the pod landscape.  I think we should just embrace it
18:07:12 <wking> julz_: I see what you mean, it's not much of a stretch from a single container, but I think it *is* past the single container stage
18:07:27 <wking> vishh: for sub-namespaces you need a dummy process in the parent, and it could die
18:08:04 <wking> julz_: you could have tooling that handled this post-stop pinning via bind mounts
18:08:21 <wking> vishh: this just pushes all the interesting stuff up into tooling
18:08:49 <wking> ^ I have no problem pushing interesting stuff up into tooling ;)
18:09:21 <wking> julz_: I think it's too late to experiment on the standard, instead of testing ideas in tooling and then pushing them down into the standard if they have legs
18:10:05 <wking> vishh: how do we get past the uncertainty stage?
18:10:20 <wking> vishh: it's too hard to have a conversation without a concrete process
18:10:41 <wking> mrunalp: how about an experimental branch of runC for testing these ideas, and then we can bless them after they're proven
18:10:44 <wking> vishh: what is "proven"?
18:10:53 <wking> mrunalp: we've tested all the edge cases
18:11:14 <wking> vishh: it should be written down what an author needs to do to convince the community that their change is proven
18:11:43 <wking> julz_: in this case, we can build a tool and judge by popularity.  If lots of users are using the tools, then merge them down
18:12:11 <wking> vishh: the bind-mount approach means we can't use 'docker restart', because you need an explicit phase to handle the bind-mounts
18:12:32 <wking> julz_: given you've bind-mounted in the pre-start, you can rely on the post-stop hooks having the namespaces pinned
18:12:48 <wking> julz_: if you want pinning, you should bind-mount in a pre-start hook for now
18:12:59 <wking> duglin: is the user doing this, or is the runtime doing it for them?
18:13:12 <wking> julz_: you should opt-in with an argument
18:13:30 <wking> mrunalp: we can give folks sample hooks, etc. as well
18:14:56 <mrunalp> wking, Thanks for the notes :)
18:15:19 <mrunalp> #endmeeting