08:02:08 <joehuang> #startmeeting multisite
08:02:08 <collabot> Meeting started Thu Sep  3 08:02:08 2015 UTC.  The chair is joehuang. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:02:08 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
08:02:08 <collabot> The meeting name has been set to 'multisite'
08:02:22 <sorantis> joehuang, could you update your commit for keystone?
08:02:23 <joehuang> #topic rollcall
08:02:33 <sorantis> #info Dimitri
08:02:36 <joehuang> ok
08:02:38 <colintd> #info colintd
08:02:43 <joehuang> #info joehuang
08:02:43 <Malla> #info Malla
08:02:54 <Tapio_T> #info Tapio Tallgren
08:03:12 <sorantis> I tried to review it, but scrolling was unmanageable
08:04:09 <joehuang> ok, I will update the commit to break lines longer than 80 characters
08:04:35 <sorantis> thanks!
08:04:43 <joehuang> for the .rst, it looks good
08:05:04 <joehuang> but for gerrit, it will not autowrap
08:05:09 <sorantis> i meant the rst
08:05:33 <joehuang> sure, I will update it
08:05:51 <joehuang> #action update the keystone commit, joehuang
08:06:11 <joehuang> #topic VNF Geo site redundancy
08:06:44 <joehuang> #link https://etherpad.opnfv.org/p/VNF_Geo_site_redundancy
08:07:18 <joehuang> #info three proposals were presented in the etherpad for VNF geo site redundancy
08:07:38 <sorantis> yes
08:07:51 <sorantis> I’ve got a question on nova quiescing
08:07:57 <joehuang> pls
08:08:20 <sorantis> in the solution proposal 1 you mentioned that “Need Nova to expose the quiesce / unquiesce, fortunately it's already there in Nova-compute, just to add API layer to expose the functionality.”
08:08:58 <joehuang> no Nova API for quiesce / unquiesce
08:09:35 <sorantis> what about this? https://blueprints.launchpad.net/nova/+spec/quiesced-image-snapshots-with-qemu-guest-agent
08:09:37 <joehuang> but the functionality has been used by Nova snapshot, and implemented in the nova-compute
08:09:51 <joehuang> it's this one
08:09:59 <joehuang> but no API exposed
08:10:12 <joehuang> just create VM snapshot
08:10:13 <sorantis> then the related interface is exposed in glance
08:10:15 <sorantis> os_require_quiesce=yes
08:10:50 <joehuang> the image metadata should be os_require_quiesce=yes
08:11:00 <sorantis> that’s right
08:11:34 <joehuang> I mean no Nova api to quiesce/unquiesce VM directly
08:12:22 <sorantis> ok, so the intention is to enable fs quiescing on a running VM
08:12:38 <joehuang> you can boot a VM with image attribute with os_require_quiesce=yes ( the image support the guest agent to quiesce )
08:12:42 <joehuang> yes
08:13:29 <joehuang> the intention is to ask Nova to expose an explicit API to quiesce/unquiesce a running VM
08:13:37 <sorantis> right the point of having this metadata attribute is to install the necessary agents
08:14:12 <joehuang> the image is built with the necessary agent that makes quiesce work
08:15:08 <sorantis> ok
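
The image properties discussed above can be set from the command line. The sketch below only builds the command and does not run it. It assumes the `openstack image set --property` form of the CLI, and it also sets `hw_qemu_guest_agent=yes`, which is not mentioned in the discussion but is the image property that makes Nova attach the guest agent channel. The function name and the image name in the usage note are illustrative.

```python
from typing import List

# Image properties relevant to quiesced snapshots.
# os_require_quiesce comes from the discussion; hw_qemu_guest_agent is an
# assumption added here so the guest agent channel is available to Nova.
QUIESCE_PROPERTIES = {
    "hw_qemu_guest_agent": "yes",
    "os_require_quiesce": "yes",
}


def image_set_command(image: str) -> List[str]:
    """Return the CLI argument list that tags an image for quiesced snapshots."""
    cmd = ["openstack", "image", "set"]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(QUIESCE_PROPERTIES.items()):
        cmd += ["--property", f"{key}={value}"]
    cmd.append(image)
    return cmd
```

Calling `image_set_command("my-image")` gives an argument list that can be passed to `subprocess.run` on a host with the OpenStack client installed and credentials loaded.
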
08:15:19 <sorantis> I’ve stumbled upon this post http://www.sebastien-han.fr/blog/2015/02/09/openstack-perform-consistent-snapshots-with-qemu-guest-agent/
08:15:37 <joehuang> #info the purpose of exposing the quiesce/unquiesce API directly is to make a transactional snapshot of a group of VMs possible
08:15:38 <sorantis> could be used as an interim solution
08:16:09 <joehuang> that way you have to manually log on to the VM and freeze the VM yourself
08:16:10 <sorantis> #info …in Nova
08:16:25 <sorantis> not necessarily
08:16:33 <sorantis> one could use virsh
08:16:46 <sorantis> sudo virsh qemu-agent-command instance-00000008 '{"execute":"guest-fsfreeze-freeze"}'
08:17:21 <joehuang> sure, you can use the command line on that physical server
08:17:34 <joehuang> sorry I cannot open the link you shared
08:17:58 <sorantis> #info Possible interim solution http://www.sebastien-han.fr/blog/2015/02/09/openstack-perform-consistent-snapshots-with-qemu-guest-agent/
08:18:18 <sorantis> I can share the text in an email
08:19:00 <joehuang> that's the current API implementation. I just opened the page
08:19:54 <sorantis> that’s right. just thought to share this in case nova refuses to expose this function as an api ;)
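
The virsh-based interim approach can be extended to a group of VMs, which is the case the quiesce/unquiesce API request is aiming at. The following is a minimal sketch, not the method used by Nova: it freezes every domain through the QEMU guest agent, calls a caller-supplied snapshot function, and thaws the domains again even if something fails. The helper names (`agent_command`, `snapshot_group`, `run_virsh`) and the `take_snapshots` callback are hypothetical, and the code has to run on the compute host where the libvirt domains live.

```python
import json
import subprocess
from typing import Callable, Iterable, List

Runner = Callable[[List[str]], None]


def agent_command(domain: str, action: str) -> List[str]:
    """Build the virsh call that sends a guest-agent fsfreeze command."""
    if action not in ("freeze", "thaw"):
        raise ValueError(f"unsupported action: {action}")
    payload = json.dumps({"execute": f"guest-fsfreeze-{action}"})
    return ["sudo", "virsh", "qemu-agent-command", domain, payload]


def run_virsh(cmd: List[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(cmd, check=True)


def snapshot_group(
    domains: Iterable[str],
    take_snapshots: Callable[[List[str]], None],
    runner: Runner = run_virsh,
) -> None:
    """Freeze all domains, snapshot them together, then thaw them."""
    frozen: List[str] = []
    try:
        for domain in domains:
            runner(agent_command(domain, "freeze"))
            frozen.append(domain)
        # Every filesystem is frozen at this point, so the snapshots
        # taken here are consistent across the whole group.
        take_snapshots(frozen)
    finally:
        # Thaw in reverse order, including after a partial failure.
        for domain in reversed(frozen):
            runner(agent_command(domain, "thaw"))
```

The `runner` parameter exists so the sequence can be checked without libvirt, for example by passing `calls.append` and inspecting the recorded commands. A production version would also need to keep thawing the remaining domains if one thaw fails.
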
08:20:37 <joehuang> I saw the Nova PTL comment in one BP raised by Cinder
08:21:26 <joehuang> can restore the topic if there is engineer willing to work on it
08:21:47 <sorantis> can you share the bp?
08:21:51 <joehuang> thanks. Sorantis, helpful
08:21:54 <joehuang> yes
08:22:18 <joehuang> wait moment
08:23:00 <colintd> I know I've missed some chunk of the discussion over the last week or two, but could we step back a moment and discuss whether snapshotting all the attached volumes to the current set of VMs for a VNF is what is needed?
08:24:44 <joehuang> sorry take time to search the bp
08:24:47 <joehuang> share later
08:24:58 <joehuang> to colin, one second
08:25:59 <joehuang> if the site where the VNF failed, the VNF can restore in another site
08:26:30 <joehuang> especially catastrophic failures (flood, earthquake, propagating software fault),
08:27:14 <joehuang> so the question is how to restore everything related to the VNF in another site, where it is usually not running
08:27:15 <colintd> I agree the desire is to restore function, my concern is that trying to do that simply by snapshotting all the volumes is quite a coarse approach, with lots of challenges both in terms of consistency, but also interplay with the VNF Manager and orchestrator.
08:27:54 <joehuang> we are talking about the consistency way, that's the proposal 1 and 2 involved
08:28:20 <colintd> Consider a highly elastic system, are we going to be replicating and deleting volumes each time a VM is added or removed?  Also what about the need for tight coordination between the replication code and the VNFM so the right volume is used at restart?
08:29:31 <colintd> My expectation was that we were going to focus on helping applications with replication of a _select_ set of data between sites which they would use when restarted
08:29:33 <joehuang> snapshot first, then back up to third-party storage and restore as needed in another site; the backup may be infrequent
08:30:38 <joehuang> no way of replication can keep consistency without quiesce and snapshot, the data changes now and then
08:31:01 <colintd> The data to replicate might be in cinder volumes, but it could also be some part of a Swift KV store.
08:31:13 <joehuang> the 3rd proposal is way of replication
08:32:17 <colintd> Option 3 was much closer to what I had in mind when I raised the original use case, and at a practical level to me it seems much easier to use
08:33:05 <joehuang> but no consistency guarantee
08:33:29 <joehuang> especially for a group of VMs
08:33:33 <colintd> Agreed, but I'm not sure that the consistency guarantee actually helps you
08:34:18 <joehuang> the 1st and 2nd will help, for freeze and flush
08:35:22 <zhipeng> hello hafe
08:35:24 <colintd> But is it really viable to stop the entirety of your running system to take a consistent backup?  Also does it help you if the VNFM in the new site starts a different number of VMs due to different scaling metrics
08:36:24 <joehuang> for those vms which can't be quiesce,
08:36:43 <colintd> What I'm trying to say is that my original vision for the use case was to provide help in replicating a subset of the data which would be used by the newly started VNF to restore state, rather than snapshot and replicate all storage
08:37:29 <joehuang> that can only be done based on VM
08:37:53 <joehuang> only application level knowledge know which part should be replicated
08:38:05 <joehuang> which part shouldn't
08:39:19 <colintd> I guess the key question relates to application awareness of what is going on.  Options 1 & 2 seem to be more about replicating/backing up any VNF, whereas option 3 is about providing a service to a replication aware application.
08:39:22 <joehuang> for those vms which can't be quiesced, the consistency can only be ensured by the data writing policy of the application
08:40:51 <joehuang> yes, for the 3rd option, the app should know which VM has replication capability, and write regarding data to this volume, and guarantee consistency by the app itself
08:41:02 <colintd> Agreed.
08:41:13 <joehuang> sorry wrong typing
08:41:21 <joehuang> yes, for the 3rd option, the app should know which volume has replication capability, and write regarding data to this volume, and guarantee consistency by the app itself
08:41:59 <joehuang> so the option 1/2/3 are all useful, but for different scenario
08:42:09 <colintd> So I think there are really two different things we could be attempting here: 1) Provide a facility for snapshotting and replicating any running VNF 2) Provide some replication facilities for use by aware VNFs
08:42:35 <joehuang> colin, same idea
08:43:13 <joehuang> others opinion?
08:44:40 <zhipeng> I think we should assume for any running VNFs
08:44:43 <joehuang> sorantis, malla, zhipeng, your idea?
08:44:45 <colintd> The challenge with case 1 is that to get abstract snapshot/replication to work requires lots of cooperation from VNFM (to ensure exactly the same set of VNFs in backup), that you can afford to regularly pause your live system to take snapshots, and that all the data you replicate (including things like IP addresses) are just as relevant in your target system as the source
08:45:45 <colintd> I think that to make case 1 work for any VNF is a very difficult, perhaps impossible, problem
08:45:52 <sorantis> I think there’s a use case for each of the three options
08:46:09 <sorantis> first one requires Nova to expose a pair of API calls
08:46:13 <joehuang> for option 1, after restore, some reconfiguration is needed to some extent
08:46:20 <sorantis> the third one can be achieved already, right
08:46:50 <joehuang> the 3rd one require cinder with minor update
08:46:56 <sorantis> so basically we can describe those alternatives, and it’s up to an operator to select the one best fit his use case
08:47:19 <joehuang> to record the reference and could be retrieved by upper layer software
08:47:30 <joehuang> to sorantis: quite agree
08:48:07 <joehuang> the second one needs no update, but only on single VM level
08:48:42 <sorantis> even the first one could be automated by using virsh
08:49:14 <sorantis> there was no mention of changes to cinder in option 3)
08:49:21 <zhipeng> so I think for requirements perspective we could recommend all three options
08:49:50 <joehuang> I'll update the option 3 for the requirement
08:50:02 <colintd> Agree that it is an operator choice, but I still hold that to make option 1 work involves a lot more than just replicating the volumes.
08:50:03 <Malla> Yes, I agree with Zhipeng, from requirement point of view we can update all 3
08:50:40 <joehuang> #action update requirements to openstack for 3 options, joehuang
08:51:10 <sorantis> one thing that could be really beneficial to the user is to have example use cases for each option
08:52:11 <Malla> it's a nice idea Dimitri
08:52:14 <joehuang> could you also update the etherpad with example
08:53:09 <joehuang> #agree requirements perspective we could recommend all three options, that it is an operator choice
08:53:59 <joehuang> so may I ask colin to add example for option 3, and sorantis to add example for option 1,2?
08:54:07 <zhipeng> congrats we've settled another use case :)
08:54:22 <colintd> Happy to update #3
08:54:30 <sorantis> I’ll update #1
08:54:49 <joehuang> ok, I will update #2
08:55:32 <joehuang> #action add example, sorantis #1, joehuang #2, colin #3
08:55:40 <joehuang> great, we have a very efficient meeting today
08:56:06 <joehuang> after the examples have been updated, I will sum it up in .rst for review and approval
08:56:44 <joehuang> and please register in the OPNFV gerrit/jira system, so we can track these issues
08:56:53 <zhipeng> thx joehuang
08:56:55 <joehuang> we have use case 1 for review and approve
08:57:25 <joehuang> and colin to update the description for the use case 2 ( which is one action item in july )
08:57:44 <colintd> Will do
08:57:50 <joehuang> thanks a lot
08:58:19 <joehuang> please keep an eye on jira/gerrit, and add yourself as a reviewer
08:58:32 <joehuang> or assign issues to yourself voluntarily
08:59:02 <colintd> Till next week....
08:59:03 <joehuang> thank you all, see you next meeting, we can start use case 4 in next meeting
08:59:16 <joehuang> bye
08:59:24 <sorantis> bye
08:59:26 <Malla> bye
08:59:28 <joehuang> #endmeeting