17:01:45 <mrunalp> #startmeeting OCI 3/30
17:01:45 <collabot`> Meeting started Wed Mar 30 17:01:45 2016 UTC. The chair is mrunalp. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:45 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:01:45 <collabot`> The meeting name has been set to 'oci_3_30'
17:01:57 <duglin> RobDolinMS: priorities!!
17:01:57 <philips> I am at Linux Collab summit so #conferencewifi
17:02:04 <mrunalp> no worries :)
17:03:01 <duglin> wking: do you have a url to the start/create split email?
17:03:18 <mrunalp> #topic https://github.com/opencontainers/specs/issues/357
17:03:23 <wking> duglin: I'll look it up
17:04:18 <RobDolinMS> On the question of renaming the spec, I don't have a strong preference as to the exact name, but good to rename "spec" to foo-spec
17:04:35 <mrunalp> #action Brandon to email LF to go ahead with the rename
17:04:55 <mrunalp> #topic Create/start split
17:05:02 <wking> #link https://groups.google.com/a/opencontainers.org/forum/#!msg/dev/qWHoKs8Fsrk/k55FQrBzBgAJ
17:07:35 <wking> duglin: what if the initial PID 1 dies, can you run another start?
17:07:46 <wking> mrunalp: no, you'd be starting another container with another container ID
17:08:15 <wking> julz_: one thing that's nice about proposal 3 is that the PID / container relationship is clear
17:08:48 <wking> duglin: does having a separate stop make sense? Or separate stop/delete?
It sounds like the container process dies, and everything gets stopped and deleted
17:09:12 <wking> julz_: if you want to keep the namespaces around, you could (via bind mounts or whatever), but there wouldn't be an explicit stop/delete
17:10:17 <wking> the process namespace is going away when its PID 1 dies, regardless of anything we do in OCI
17:10:45 <wking> mrunalp: if you start a new container reusing the mount namespace, there may be leftover files, etc., from the previous container
17:11:18 <wking> julz_: if you're just creating a container to preserve the namespaces, you don't have to ever start it, just leave the idling init and launch sub-containers
17:11:31 <wking> mrunalp: that's supported now via namespace paths
17:11:49 <wking> julz_: yes, and that's what we all want for exec implementations.
17:12:46 <wking> duglin: if we go back to a single stop/delete, do we need a hook? Or push all cleanup to after the namespaces are gone
17:12:59 <wking> julz_: we already run post-stop hooks after the container process is dead
17:13:23 <wking> crosbymichael: this seems super complex
17:13:26 <vbatts|work> here
17:13:51 <JakeWarner|Work> (is anyone able to participate in these meetings?)
17:14:02 <mrunalp> yep
17:14:13 <mrunalp> uberconference.com/ssaul
17:14:48 <wking> julz_: this is less complicated than proposal 2 (from the list), because we don't need any bind mounting of namespaces, etc
17:14:56 <vishh> Flexibility is running hooks is the main use case..
17:15:04 <vishh> *in running
17:15:32 <wking> julz_: and if you want a combined start you can skip the named socket (https://groups.google.com/a/opencontainers.org/d/msg/dev/qWHoKs8Fsrk/ug4PNraWBwAJ)
17:15:44 <wking> crosbymichael: proposal 3 talks about a Unix socket...
17:16:06 <wking> julz_: that's so a later 'start' call can tell the init process (launched from 'create') what user-code it should execute
17:16:50 <wking> duglin: this is basically what we're doing now (with a pipe between the runtime and container process)
17:17:16 <wking> julz_: yeah, we already have this dance going on, and if you know at 'create' time what you want to run, this devolves to the current behaviour
17:17:35 <wking> mrunalp: and after 'start' is called, we follow the current lifecycle?
17:17:37 <wking> julz_: yes
17:17:52 <wking> vishh (I think): what about preserving the current namespaces?
17:18:12 <wking> julz_: with this proposal we don't have to distinguish between the container"
17:18:21 <wking> * "the sandbox" and "the container process"
17:18:45 <wking> julz_: which means all namespaces, including the process namespace, are consistently alive or dead
17:19:14 <wking> vishh: but in reality you set up the sandbox, and then invoke the user process, and after that process dies you want to do stuff with the sandbox
17:20:08 <wking> duglin: vishh, what does "sandbox" mean to you? Does it include process namespaces?
17:20:19 <wking> vishh: no, it doesn't include the process namespace
17:21:07 <wking> julz_: if you want to preserve other namespaces, create a container to hold the namespaces open, and then use sub-containers inside that persistent wrapping container
17:21:24 <wking> vishh: we want simple concepts that multiple users can bend to their wishes
17:22:11 <wking> julz_: I think proposal 3 is that simple concept, because the lifecycle of the sandbox (whatever you want it to be) is tied to a single container process
17:22:51 <wking> julz_: it's more complicated if you start special-casing sandboxes.
Sometimes it's fine without a process namespace in the sandbox, and sometimes you need the process namespace to be part of the sandbox
17:23:11 <wking> mrunalp: the problem I see with putting this into runC is that it only works in a particular way, but runC works fine right now
17:23:32 <wking> mrunalp: this would force runC to implement the lifecycle of namespaces in a particular way
17:23:59 <wking> julz_: right now runC doesn't support a distinction between "sandbox" and "container process", and I'm not sure we want it to
17:24:30 <wking> julz_: with proposal 3, runC looks almost identical to its current state. The only cost is idling the init process
17:24:56 <wking> julz_: but that seems like a small cost, especially compared to bind-mounting namespaces and the associated cleanup handling
17:25:19 <wking> mrunalp: we want a separate "sandbox" concept that's not tied to a container ID, then you can start a container in that sandbox
17:25:34 <wking> mrunalp: that way you can separate the lifecycle of the container and the lifecycle of the sandbox
17:26:07 <wking> julz_: that's my problem, I *don't* think we want to separate the sandbox from the container process, because we can't agree on what "the sandbox" means (e.g. if it includes the process namespace)
17:26:39 <wking> julz_: But proposal 3 seems to cover all of our use cases, and proposal 2 does not
17:27:11 <wking> vishh: so the main issue is the complication in runC?
17:27:21 <wking> mrunalp: yes, but there are also some unclear corner cases
17:27:44 <wking> julz_: we know the edge cases, because this is the same as what we do now, just with an idling pause in the init process
17:28:22 <wking> mrunalp: vishh might disagree with you on that, because he cares about the bind mounts
17:28:45 <wking> vishh: the split between create and start is awesome, but I want a split between the death of the container process and the sandbox cleanup
17:29:07 <wking> julz_: you can do that with sub-containers or bind mounts; either will keep the namespaces alive
17:29:53 <wking> vishh: create would create all the namespaces, bind mount them, and keep PID 1 running. I haven't seen a case for preserving process namespaces, but I do want to preserve the mount namespaces
17:30:24 <wking> julz_: you could set up bind mounts after 'create' finishes, before running 'start' if you wanted
17:31:19 <wking> julz_: crosbymichael is already concerned about complexity, so we don't need to build in help for bind-mounting namespaces
17:31:34 <wking> julz_: but you could build that into the spec if you wanted to
17:32:15 <wking> duglin: vishh, so you want two separate steps for 'stop' and 'delete', preserving all namespaces except for the process namespace until 'delete'
17:32:34 <wking> julz_: and you can still do that with sub-containers and not kill the wrapping container. No need for bind mounts
17:32:44 <wking> julz_: killing the wrapping container's PID 1 is like 'delete'
17:33:18 <wking> vishh: you can do all of this in many ways. This seems too invasive/low-level for going into the spec
17:34:06 <wking> julz_: we've talked about punting 'exec' to a higher level, and it's not hard to build 'exec' on top of this
17:34:27 <wking> vishh: I'm not disagreeing. This is a step in the right direction.
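
The sub-container approach julz_ describes relies on the namespace paths mrunalp mentioned earlier. A minimal sketch of what a sub-container's namespace list might look like, assuming the wrapping container's idling init has a known PID. The helper name `join_namespaces`, the default choice of namespaces to share, and the decision to give the sub-container a fresh PID namespace are all illustrative assumptions, not something agreed in the meeting:

```python
# Map OCI namespace type names to the entry names under /proc/<pid>/ns.
NS_PROC_NAMES = {
    "mount": "mnt",
    "network": "net",
    "ipc": "ipc",
    "uts": "uts",
}


def join_namespaces(holder_pid, keep=("mount", "network", "ipc", "uts")):
    """Build a linux.namespaces list that joins the holder's namespaces.

    holder_pid is the PID of the wrapping container's init process
    (hypothetical input; a caller would read it from the runtime's state).
    """
    entries = [
        {"type": ns, "path": f"/proc/{holder_pid}/ns/{NS_PROC_NAMES[ns]}"}
        for ns in keep
    ]
    # No "path" here: the runtime creates a new PID namespace, so the
    # sub-container's process namespace dies with its own PID 1.
    entries.append({"type": "pid"})
    return entries
```

The returned list would go under `linux.namespaces` in the sub-container's config. As long as the wrapping container's init stays alive, the shared namespaces survive the death of any sub-container process.
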
17:34:48 <wking> vishh: we should also consider more separation between the sandbox and the process at a later stage
17:35:11 <wking> duglin: vishh, would you be ok with proposal 3 with a split stop/delete?
17:35:32 <wking> vishh: I'm ok. I'd rather not expose cgroups and namespaces, but if you have to leak them out, I'm ok with that for now
17:35:43 <wking> duglin: I'm also worried about requiring consumers to bind-mount things
17:35:58 <wking> julz_ keeps pointing out that users don't need to bind-mount anything, and should use sub-containers ;)
17:37:39 <duglin> we have something else besides Linux? :-)
17:37:54 <wking> julz_: with option 3 we can express all of these use-cases. You can always write higher-level wrappers. With option 2, there are some workflows you can't express, and that's harder to work around
17:38:04 <wking> vishh: how to express this on other operating systems?
17:38:19 <vbatts|work> duglin: heh
17:38:55 <wking> julz_: it's hard to predict what's hard/easy on other OSes. I suspect creating everything except the user process seems easier with VMs
17:39:40 <wking> vishh: I just want to avoid making life overly difficult for other OSes
17:40:06 <wking> julz_: maybe make the create/start split optional, so OSes where that's a problem can expose only a unified create/start command
17:40:22 <wking> vishh: I'd rather keep the spec more consistent across OSes
17:40:23 <wking> julz_: agreed
17:40:48 <wking> vishh: I want an explicit create/delete for the sandbox and explicit start/stop for the container process. I don't care about the process namespace
17:41:13 <wking> vishh: you'd still have to bind-mount to separate stop from delete, but I'm not sure how this translates to other OSes
17:41:25 <JakeWarner|Work> What's the definition of sandbox that we're going with here?
17:41:45 <wking> julz_: so there's a create/start split and a stop/delete split. I think we both agree that create/start is a good split with all the namespaces.
17:41:55 <duglin> mainly all NSs - whether it includes PID varies
17:42:02 <duglin> JakeWarner|Work: ^^
17:42:14 <wking> julz_: it may also be useful to split stop/delete and preserve namespaces after the container process dies, and you can do that with bind-mounts, but that seems orthogonal
17:42:26 <JakeWarner|Work> Got it.
17:42:29 <JakeWarner|Work> Thanks
17:42:49 <wking> philips: isn't the goal to have some pre-start and post-stop things for a pod? It could be an init-system, or bind-mounts. Why be specific for the spec?
17:43:30 <wking> julz_: I agree. These implementation differences don't matter, but the bind-mount approach has a user-visible difference (no process namespace or container PID)
17:44:16 <wking> vishh: the spec is currently 1:1 with runC, so we don't consider spec changes until we see if something is implementable in runC
17:44:25 <wking> vishh: the question is "do we want to implement this in runC"
17:44:59 <wking> philips: if I was implementing this myself, I would use an init system to spawn processes. There are options that don't require bind mounts or application processes running as PID 1
17:45:13 <wking> julz_: I suspect we should define this in a way that you can implement things like that
17:46:01 <wking> vishh: for hooks, you can have multiple ways of running the hooks, but it should be tied to the container lifecycle clearly enough to make the changes you need
17:46:42 <wking> mrunalp: there is an argument about breaking down runC into smaller steps, so you don't have to build everything in. But there are also arguments for building things in.
17:47:06 <philips> thanks for taking notes wking
17:47:11 <wking> np
17:47:24 <wking> duglin: but if you make it easy to have a runC create join an existing namespace, you can do that without exposing all the lower-level details
17:47:31 <wking> mrunalp: so what do we do next?
17:47:39 <wking> crosbymichael: I dunno
17:47:49 <wking> mrunalp: maybe a concrete proposal that covers everything?
17:48:00 <wking> vishh: we should PR a spec change and argue there
17:48:14 <duglin> +1 to a concrete spec change PR so we can see the exact proposal
17:48:24 <wking> vishh: this has stalled because there wasn't an implementation in runC, but now we have a runC implementation.
17:48:44 <wking> vishh: a concrete spec change would give us something more concrete to argue about
17:48:51 <JakeWarner|Work> +1 to spec as well
17:49:01 <anush> +1 to spec
17:49:25 <wking> #link https://github.com/opencontainers/specs/issues/299
17:49:25 <tianon> I'm always +1 to a concrete PR-form proposal
17:49:39 <wking> ^ existing create/start proposal (proposal 2 in my summary email)
17:49:56 <philips> sorry, need to run off folks!
17:50:24 <mrunalp> #action julz/duglin to create a new PR
17:50:29 <wking> duglin: I'll be off for the next two weeks, but I can work on a PR with julz_
17:50:41 <wking> mrunalp: other topics?
17:50:41 <duglin> s/off/traveling/ not vacation :-)
17:50:47 <wking> duglin: ah well ;)
17:50:59 <mrunalp> #endmeeting
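
For readers following the create/start discussion, here is a rough sketch of the handshake proposal 3 describes: 'create' launches an init process that idles on a named Unix socket, and a later 'start' call tells it which user code to execute. This is not runC's implementation. The function names `create` and `start`, the JSON-over-socket message format, and the exit code 127 are assumptions made for illustration, and all namespace and cgroup setup is omitted.

```python
import json
import os
import socket


def create(sock_path):
    """Fork an idle stand-in 'init' that waits on a named Unix socket."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(sock_path)
    listener.listen(1)  # bound before fork, so 'start' can connect at any time
    pid = os.fork()
    if pid == 0:
        # Child: a real runtime would already be inside its namespaces here.
        try:
            conn, _ = listener.accept()  # idle until 'start' connects
            with conn, conn.makefile() as reader:
                argv = json.loads(reader.readline())
            listener.close()
            os.execvp(argv[0], argv)  # same PID now runs the user process
        finally:
            os._exit(127)  # only reached if the handshake or exec failed
    listener.close()  # the parent only needs the PID
    return pid


def start(sock_path, argv):
    """Tell the idling init which command to execute."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall((json.dumps(argv) + "\n").encode())
```

Calling `pid = create(path)` followed later by `start(path, ["echo", "hello"])` runs the command under the PID returned by `create`, which is the clear PID / container relationship julz_ points to. If the command is already known at 'create' time, the socket step can be skipped and this collapses to the current single-step behaviour.
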