08:02:08 <joehuang> #startmeeting multisite
08:02:08 <collabot> Meeting started Thu Sep 3 08:02:08 2015 UTC. The chair is joehuang. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:02:08 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
08:02:08 <collabot> The meeting name has been set to 'multisite'
08:02:22 <sorantis> joehuang, could you update your commit for keystone?
08:02:23 <joehuang> #topic rollcall
08:02:33 <sorantis> #info Dimitri
08:02:36 <joehuang> ok
08:02:38 <colintd> #info colintd
08:02:43 <joehuang> #info joehuang
08:02:43 <Malla> #info Malla
08:02:54 <Tapio_T> #info Tapio Tallgren
08:03:12 <sorantis> I tried to review it, but scrolling was unmanageable
08:04:09 <joehuang> ok, i will update the commit with break for the sentence longer than 80 characters
08:04:35 <sorantis> thanks!
08:04:43 <joehuang> for the .rst, it looks good
08:05:04 <joehuang> but for gerrit, it will not autowrap
08:05:09 <sorantis> i meant the rst
08:05:33 <joehuang> sure, I will update it
08:05:51 <joehuang> #action update the keystone commit, joehuang
08:06:11 <joehuang> #topic VNF Geo site redundancy
08:06:44 <joehuang> #link https://etherpad.opnfv.org/p/VNF_Geo_site_redundancy
08:07:18 <joehuang> #info three proposals were presented in the etherpad for VNF geo site redundancy
08:07:38 <sorantis> yes
08:07:51 <sorantis> I’ve got a question on nova quiescing
08:07:57 <joehuang> pls
08:08:20 <sorantis> in the solution proposal 1 you mentioned that “Need Nova to expose the quiesce / unquiesce, fortunately it's already there in Nova-compute, just to add API layer to expose the functionality.
08:08:21 <sorantis> ”
08:08:58 <joehuang> no Nova API for quiesce / unquiesce
08:09:35 <sorantis> what about this? https://blueprints.launchpad.net/nova/+spec/quiesced-image-snapshots-with-qemu-guest-agent
08:09:37 <joehuang> but the functionality has been used by Nova snapshot, and implemented in the nova-compute
08:09:51 <joehuang> it's this one
08:09:59 <joehuang> but no API exposed
08:10:12 <joehuang> just create VM snapshot
08:10:13 <sorantis> then the related interface is exposed in glance
08:10:15 <sorantis> os_require_quiesce=yes
08:10:50 <joehuang> the image metadata should be os_require_quiesce=yes
08:11:00 <sorantis> that’s right
08:11:34 <joehuang> I mean no Nova api to quiesce/unquiesce VM directly
08:12:22 <sorantis> ok, so the intention is to enable fs quiescing on a running VM
08:12:38 <joehuang> you can boot a VM with image attribute with os_require_quiesce=yes ( the image support the guest agent to quiesce )
08:12:42 <joehuang> yes
08:13:29 <joehuang> the intention is to ask Nova to expose explicit api to quiesce/unquiesce running VM
08:13:37 <sorantis> right the point of having this metadata attribute is to install the necessary agents
08:14:12 <joehuang> the image is built in with the necessary agent that make quiesce working
08:15:08 <sorantis> ok
08:15:19 <sorantis> I’ve stumbled upon this post http://www.sebastien-han.fr/blog/2015/02/09/openstack-perform-consistent-snapshots-with-qemu-guest-agent/
08:15:37 <joehuang> #info the purpose to expose the quiesce/unquiesce API directly is to make transactional snapshot of a group of VMs possible
08:15:38 <sorantis> could be used as an interim solution
08:16:09 <joehuang> that way you have to manually log on to the VM and freeze the VM by yourself
08:16:10 <sorantis> #info …in Nova
08:16:25 <sorantis> not necessarily
08:16:33 <sorantis> one could use virsh
08:16:46 <sorantis> sudo virsh qemu-agent-command instance-00000008 '{"execute":"guest-fsfreeze-freeze"}'
08:17:21 <joehuang> sure , you can use command line on that physical server
08:17:34 <joehuang> sorry I can not open the link you shared
08:17:58 <sorantis> # info Possible interim solution http://www.sebastien-han.fr/blog/2015/02/09/openstack-perform-consistent-snapshots-with-qemu-guest-agent/
08:18:18 <sorantis> I can share the text in an email
08:19:00 <joehuang> that's the current API implementation. I just open the page
08:19:54 <sorantis> that’s right. just thought to share this in case nova refuses to expose this function as an api ;)
08:20:37 <joehuang> I saw the Nova PTL comment in one BP raised by Cinder
08:21:26 <joehuang> can restore the topic if there is engineer willing to work on it
08:21:47 <sorantis> can you share the bp?
08:21:51 <joehuang> thanks. Sorantis, helpful
08:21:54 <joehuang> yes
08:22:18 <joehuang> wait moment
08:23:00 <colintd> I know I've missed some of chunk of discussion over the last week or two, but could we step back a moment and discuss whether snapshotting all the attached volumes to the current set of VMs for a VNF is what is needed?
08:24:44 <joehuang> sorry take time to search the bp
08:24:47 <joehuang> share later
08:24:58 <joehuang> to colin, one second
08:25:59 <joehuang> if the site where the VNF failed, the VNF can restore in another site
08:26:30 <joehuang> especially catastrophic failures (flood, earthquake, propagating software fault),
08:27:14 <joehuang> so how to restore all regarding VNF in another site, but usually not running
08:27:15 <colintd> I agree the desire is to restore function, my concerns is that trying to do that simply by snapshotting all the volumes is quite a coarse approach, with lots of challenges both in terms of consistency, but also interplay with the VNF Manager and orchestrator.
08:27:54 <joehuang> we are talking about the consistency way, that's the proposal 1 and 2 involved
08:28:20 <colintd> Consider a highly elastic system, are we going to be replicating and deleting volumes each time a VM is added or removed? Also what about the need for tight coordination between the replication code and the VNFM so the right volume is used at restart?
08:29:31 <colintd> My expectation was that we were going to focus on helping applications with replication of a _select_ set of data between sites which they would use when restarted
08:29:33 <joehuang> the snapshot first and then backup to third party storage, and restore it as needed in another site, the backup may be seldom or not frequently
08:30:38 <joehuang> no way of replication can keep consistency without quiesce and snapshot, the data changed now and then
08:31:01 <colintd> The data to replicate might be in cinder volumes, but it could also be some part of a Swift KV store.
08:31:13 <joehuang> the 3rd proposal is way of replication
08:32:17 <colintd> Option 3 was much closer to what I had in mind when I raised the original use case, and at a practical level to me it seems much easier to use
08:33:05 <joehuang> but no consistency guarantee
08:33:29 <joehuang> especially for a group of VMs
08:33:33 <colintd> Agreed, but I'm not sure that the consistency guarantee actually helps you
08:34:18 <joehuang> the 1st and 2nd will help, for freeze and flush
08:35:22 <zhipeng> hello hafe
08:35:24 <colintd> But is it really viable to stop the entirety of your running system to take a consistent backup? Also does it help you if the VNFM in the new site starts a different number of VMs due to different scaling metrics
08:36:24 <joehuang> for those vms which can't be quiesce,
08:36:43 <colintd> What I'm trying to say is that my original vision for the use case was to provide help in replicating a subset of the data which would be used by the newly started VNF to restore state, rather than snapshot and replicate all storage
08:37:29 <joehuang> that can only be done based on VM
08:37:53 <joehuang> only application level knowledge know which part should be replicated
08:38:05 <joehuang> which part shouldn't
08:39:19 <colintd> I guess the key question relates to application awareness of what is going on. Options 1 & 2 seem to be more about replicating/backing up any VNF, whereas option 3 is about providing a service to a replication aware application.
08:39:22 <joehuang> for those vms which can't be quiesce, the consistency can only be ensured by the data writing policy of the application
08:40:51 <joehuang> yes, for the 3rd option, the app should know which VM has replication capability, and write regarding data to this volume, and guarantee consistency by the app itself
08:41:02 <colintd> Agreed.
08:41:13 <joehuang> sorry wrong typing
08:41:21 <joehuang> yes, for the 3rd option, the app should know which volume has replication capability, and write regarding data to this volume, and guarantee consistency by the app itself
08:41:59 <joehuang> so the option 1/2/3 are all useful, but for different scenario
08:42:09 <colintd> So I think there are really two different things we could be attempting here: 1) Provide a facility for snapshotting and replicating any running VNF 2) Provide some replication facilities for use by aware VNFs
08:42:35 <joehuang> colin, same idea
08:43:13 <joehuang> others opinion?
08:44:40 <zhipeng> I think we should assume for any running VNFs
08:44:43 <joehuang> sorantis, malla, zhipeng, your idea?
08:44:45 <colintd> The challenge with case 1 is that to get abstract snapshot/replication to work requires lots of cooperation from VNFM (to ensure exactly the same set of VNFs in backup), that you can afford to regularly pause your live system to take snapshots, and that all the data you replicate (including things like IP addresses) are just as relevant in your target system as the source
08:45:45 <colintd> I think that to make case 1 work for any VNF is a very difficult, perhaps impossible, problem
08:45:52 <sorantis> I think there’s a use case for each of the three options
08:46:09 <sorantis> first one requires Nova to expose a pair of API calls
08:46:13 <joehuang> for option 1, after restore, some reconfiguration is needed to some extent
08:46:20 <sorantis> the third one can be achieved already, right
08:46:50 <joehuang> the 3rd one require cinder with minor update
08:46:56 <sorantis> so basically we can describe those alternatives, and it’s up to an operator to select the one best fit his use case
08:47:19 <joehuang> to record the reference and could be retrieved by upper layer software
08:47:30 <joehuang> to sorantis: quite agree
08:48:07 <joehuang> the second one needs no update, but only on single VM level
08:48:42 <sorantis> even the first one could be automated by using virsh
08:49:14 <sorantis> there was no mention of changes to cinder in option 3)
08:49:21 <zhipeng> so I think for requirements perspective we could recommend all three options
08:49:50 <joehuang> I'll update the option 3 for the requirement
08:50:02 <colintd> Agree that it is an operator choice, but I still hold that to make option 1 work involves a lot more than just replicating the volumes.
08:50:03 <Malla> Yes, I agree with Zhipeng, from requirement point of view we can update all 3
08:50:40 <joehuang> #action update requirements to openstack for 3 options, joehuang
08:51:10 <sorantis> one thing that could be really beneficial to the user is to have example use cases for each option
08:52:11 <Malla> it's a nice idea Dimitri
08:52:14 <joehuang> could you also update the etherpad with example
08:53:09 <joehuang> #agree requirements perspective we could recommend all three options, that it is an operator choice
08:53:59 <joehuang> so may I ask colin to add example for option 3, and sorantis to add example for option 1,2?
08:54:07 <zhipeng> congrats we've settled another use case :)
08:54:22 <colintd> Happy to update #3
08:54:30 <sorantis> I’ll update #1
08:54:49 <joehuang> ok, I will update #2
08:55:32 <joehuang> #action add example, sorantis #1, joehuang #2, colin #3
08:55:40 <joehuang> great, we have a very efficient meeting today
08:56:06 <joehuang> after the example has been updated, i will sum up it in .rst for review and approve
08:56:44 <joehuang> and please register in OPNFV gerrit/jira system, we can track these issues
08:56:53 <zhipeng> thx joehuang
08:56:55 <joehuang> we have use case 1 for review and approve
08:57:25 <joehuang> and colin to update the description for the use case 2 ( which is one action item in july )
08:57:44 <colintd> Will do
08:57:50 <joehuang> thanks a lot
08:58:19 <joehuang> please keep eyes on the jira/gerrit, and add yourself to the reviewer
08:58:32 <joehuang> or assigned to you voluntarily
08:59:02 <colintd> Till next week....
08:59:03 <joehuang> thank you all, see you next meeting, we can start use case 4 in next meeting
08:59:16 <joehuang> bye
08:59:24 <sorantis> bye
08:59:26 <Malla> bye
08:59:28 <joehuang> #endmeeting
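
Note on the virsh interim approach discussed for option 1: the idea of freezing a group of VMs, snapshotting them, and unfreezing them can be sketched roughly as below. This is only an illustration under stated assumptions, not what Nova does internally. The function names (`agent_command`, `snapshot_group`, `run_local`) and the `take_snapshot` callback are hypothetical; the only pieces taken from the discussion are `virsh qemu-agent-command` and the guest agent command `guest-fsfreeze-freeze`, paired here with its counterpart `guest-fsfreeze-thaw`. As noted in the meeting, virsh has to run on the compute host where the instance lives, and the image must contain the guest agent.

```python
import json
import subprocess
from typing import Callable, List, Sequence


def agent_command(domain: str, command: str) -> List[str]:
    """Build the virsh argv that sends one QEMU guest agent command."""
    return ["virsh", "qemu-agent-command", domain,
            json.dumps({"execute": command})]


def run_local(argv: Sequence[str]) -> None:
    """Hypothetical runner: execute virsh on this host, fail on non-zero exit."""
    subprocess.run(list(argv), check=True)


def snapshot_group(domains: Sequence[str],
                   take_snapshot: Callable[[str], None],
                   run: Callable[[Sequence[str]], None] = run_local) -> List[str]:
    """Freeze every domain, snapshot every domain, then thaw what was frozen.

    Returns the domains whose thaw failed. An error during freeze or
    snapshot still triggers the thaw pass before it propagates.
    """
    frozen: List[str] = []
    thaw_failed: List[str] = []
    try:
        for domain in domains:
            run(agent_command(domain, "guest-fsfreeze-freeze"))
            frozen.append(domain)
        for domain in domains:
            # take_snapshot is supplied by the caller (assumption), e.g. a
            # wrapper around whatever snapshot mechanism the operator uses.
            take_snapshot(domain)
    finally:
        # Thaw in reverse order; keep going even if one thaw fails.
        for domain in reversed(frozen):
            try:
                run(agent_command(domain, "guest-fsfreeze-thaw"))
            except Exception:
                thaw_failed.append(domain)
    return thaw_failed
```

The runner is passed in so the sequence can be exercised without libvirt; with the default `run_local` it would call virsh directly, typically needing root as in the `sudo virsh ...` line quoted above. Keeping the freeze window short matters, since every VM in the group stays frozen until the last snapshot call returns.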