08:01:48 #startmeeting multisite
08:01:48 Meeting started Thu Jul 16 08:01:48 2015 UTC. The chair is joehuang. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:01:48 Useful Commands: #action #agreed #help #info #idea #link #topic.
08:01:48 The meeting name has been set to 'multisite'
08:01:55 OK sorry, good afternoon thn.. :)
08:02:26 when will you start your vacation?
08:02:33 Greetings multisiters
08:02:45 greetings !
08:02:46 nice to see you again
08:03:22 #info rollcall
08:03:35 #info joehuang
08:03:53 #info colintd
08:04:47 joehuang, it should be #topic rollcall :P
08:04:54 #info zhipeng
08:04:55 sorry
08:05:01 #info Malla
08:05:10 colintd_ I'm reading your email now :)
08:05:43 yes, I also read the mail and slides you guys shared today
08:06:19 #topic use case 1 identity service management
08:06:31 Thanks again for the foils / diagrams. I think trying to converge on one or a set of architecture(s) is exactly what we need to do
08:06:37 More offline though....
08:06:42 I'll introduce the prototype briefly
08:07:13 The prototype is based on hafe's (Hans) docker image
08:07:28 yep, I think it is not necessary that we converge to one, but one set should be fine colintd_
08:07:40 I will leave the floor to joe now :P
08:08:09 only a few minutes about identity, then let's discuss the architecture idea
08:08:28 the cluster works
08:09:00 but the async replication between clusters has not been done because some configuration was missed
08:10:03 and I also think a multi-master cluster plus multiple read-only slaves (star mode) distributed across the sites may work for fernet tokens
08:10:25 need further prototyping
08:11:46 So before the prototype is finished, let's discuss the use case 2 requirements to OpenStack and the architecture idea.
08:11:46 so what would be the impact on Multisite if the prototype succeeds?
08:12:31 Before leaving #1, I'd suggest we should think upfront about what behaviour we want under partition. With multi-master systems, the partition behaviour almost always forces you down particular paths (CP/AP)
08:12:39 if the prototype succeeds, then we can recommend using fernet tokens with each site installed with a KeyStone service
08:13:58 yes. the management network needs to be established for replication
08:14:56 In e-commerce companies, they have deployed a master / multi-read-only-slave DB backend
08:15:36 so how does partition tolerance go for them?
08:16:04 For keystone federation, the challenge is that if you add one more role, you have to change all keystone services separately
08:16:41 the synchronization of configuration across multiple federated KeyStone services is also a challenge
08:16:41 My point is that we should be clear about what changes we want to allow if the network is partitioned, and then critically how the resulting system converges when the partition ends.
08:17:23 Many telcos have a very hard requirement that geographic sites should be able to operate as isolated entities to deal with earthquake, flood, fire, etc. knocking out interconnects.
08:17:33 to colintd, what's your idea on identity service management in the multisite scenario?
08:17:53 They normally need the ability to make changes in the isolated site to deal with changing circumstances
08:18:15 how to do that?
08:18:24 I just want to highlight this fact and make sure that we have a plan on how this is allowed, and more importantly how the reconvergence works.
08:18:47 maybe we put this as a new item ? :)
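To make the fernet discussion above concrete: fernet tokens are self-contained, so a token issued by Keystone in one site can be validated by Keystone in another site as long as both hold the same signing keys; only the small key repository (and the identity/assignment database) needs replicating, not a token table. The following is a minimal illustrative sketch using the `cryptography` package, not Keystone's actual code; the site names and payload are made up.

```python
# Minimal sketch (not Keystone code): a fernet token minted by "site A"
# validates locally at "site B" because both hold the same key, so token
# validation keeps working even if the inter-site link is partitioned.
from cryptography.fernet import Fernet

shared_key = Fernet.generate_key()   # in Keystone this lives in the fernet key
                                     # repository, distributed to every site

site_a = Fernet(shared_key)          # Keystone in site A issues the token
token = site_a.encrypt(b"user=alice;project=demo;expires=...")

site_b = Fernet(shared_key)          # Keystone in site B validates it locally
print(site_b.decrypt(token))
```

The open question the prototype still has to answer is the one colintd_ raised: replication of the identity/assignment backend (users, projects, roles) across sites, and how that data reconverges after a partition.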
08:18:52 I'm not saying the prototype doesn't support this (I must confess I haven't had time to look yet), but wanted to raise the issue.
08:19:28 I suggest we mark this problem for use case #1
08:19:37 and see how the prototype goes
08:20:03 or at least what the prototype could provide for some insight on this issue
08:20:10 that's why we want to have a keystone service in each site, so that no matter whether a site fails due to earthquake, flooding, or anything else, the other sites can still work.
08:21:30 joehuang I think colintd_'s concern is a very legit one, let's keep the experiment going, and see how this question would be answered :)
08:21:55 we could wrap this all up in the requirement doc in the end
08:22:08 ok
08:22:35 let's continue on use case #2, hope we can reach some conclusion today :P
08:23:48 if a KeyStone service is not installed in each site, we have to handle the "Escape from site level KeyStone failure" use case. I'll go on with the prototype, and let's mark colin's concerns and see how to address them
08:24:52 #info if a KeyStone service is not installed in each site, we have to handle the "Escape from site level KeyStone failure" use case. I'll go on with the prototype, and let's mark colin's concerns and see how to address them
08:26:54 for use case 2, the requirements to OpenStack are 1) Cross Neutron L3 networking, 2) Cross Neutron L2 networking
08:27:58 So for #2, I think this is a classic case of what colintd_ proposed in the email discussion, that we have both mgmt and NFVI requirements
08:28:29 for #2 a large part of the deal involves mgmt
08:28:29 to Colin, what are your concerns on this use case's requirements to OpenStack?
08:28:56 I agree that some of the control/mgmt function should be handled by OSS/BSS or MANO
08:29:08 Absolutely. The use case is about both the VNF networking and the management it requires
08:29:10 for example?
08:29:32 but I think the VIM should also be able to support the actual implementation of the upper-layer decisions
08:30:52 I'm happy with the current usecase/requirements text in the etherpad. However, I agree with zhipeng's efforts to produce a small number of coherent architecture diagrams to pull together all the usecases.
08:31:06 Hello. Can we narrow the problem space down to HA of independent clouds / VIMs on the same site, i.e. the same core router / not crossing the WAN?
08:31:30 why not crossing the WAN fzdarsky :P
08:31:31 same site, I assume
08:31:32 I think those architecture diagrams are great (thanks!) but lack the aspect of geos
08:31:44 see colintd_'s mail
08:31:54 for cross site, it's something for use case 3
08:32:01 ah got it
08:32:10 yes. different use case, potentially different solutions
08:32:18 Agreed #2 is intrasite clouds, #3 is intersite clouds
08:32:37 #agreed #2 is intrasite clouds, #3 is intersite clouds
08:32:47 as in: for intrasite, I would expect a single VNF instance
08:32:48 bot cmd is your friend lol
08:32:51 which implies a single VNFM instance
08:33:16 for cross-site, it would be independent VNF instances
08:33:35 + VNFMs
08:33:36 not necessarily a single VNFM, there can be multiple VNFMs, different VNFs can be managed by different VNFMs
08:33:43 Agree again. Each VNFI has a single "owning" VNFM, but that VNFM may cover multiple clouds in a site
08:34:03 It's a question of fate sharing.
08:34:09 fzdarsky could there be multiple VIMs in one site ?
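For context on the "Cross Neutron L2 networking" requirement raised above: today, stretching an L2 segment across two independent Neutron instances in the same site is a manual operator exercise, not something Neutron coordinates. A hedged sketch with openstacksdk follows; the clouds.yaml entries ("vim1", "vim2"), network name, CIDR and VXLAN segmentation ID are assumptions for illustration, and the provider attributes require admin rights.

```python
# Illustrative sketch only: what "cross-Neutron L2 networking" amounts to
# today -- the operator creates a network with a matching provider segment
# in each VIM by hand and relies on the underlay to stitch them together.
import openstack

SEGMENT_ID = 4000  # must match in both Neutron instances and on the underlay

for cloud in ("vim1", "vim2"):          # two independent Neutron services, one site
    conn = openstack.connect(cloud=cloud)
    net = conn.network.create_network(  # provider attributes are admin-only
        name="vnf-internal",
        provider_network_type="vxlan",
        provider_segmentation_id=SEGMENT_ID,
    )
    conn.network.create_subnet(
        network_id=net.id, ip_version=4, cidr="192.168.10.0/24",
    )
```

The gap the meeting identifies is precisely that this coordination (segment planning, consistent config, tenant-level visibility) has to happen outside OpenStack today.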
08:34:18 (The whole internal structure of the NFVO and VNFM is not covered by the ETSI NFV docs, but must clearly be HA and cross-cloud to hit the required availability numbers)
08:34:20 zhipeng, absolutely!
08:34:32 VNFM is the software that manages VNFs
08:34:46 Yes, a VNFM can cover multisite VNFs
08:35:06 Absolutely multiple VIMs per site. As per my previous email, this is a very common model in the data world to get availability
08:35:44 The scope of the VNFM is defined by the scope of the VNF instance.
08:35:44 so for #2 what could we settle on as requirements now?
08:36:37 guys, could we agree on single-site cross-VIM overlay L2 networking being one of the requirements of #2?
08:36:48 but this scope is not limited to a site, right
08:37:05 Malla, for use case #2 it is
08:37:20 to Malla, "VNFM can cover multisite VNFs" -> "VNFM can cover multiple/multisite VNFs"
08:37:21 I think L2 inter-cloud intra-site is a common approach, but an L3 solution within the site is a valid (if harder) option.
08:37:57 so both L2 and L3 should be addressed, agree ?
08:38:00 Agree
08:38:13 To me the biggest difference between #2 & #3 is that #2 is all about maintaining media/signalling and calls (which requires IP transfer), whilst #3 is about restoring/continuing service but most likely not calls.
08:38:14 agree (as long as only intra-site :))
08:38:41 agree with colin
08:38:43 yep maintaining calls would be a nightmare for #3 :P
08:38:45 so
08:38:49 #3 does not require special openstack networking support, but #2 does.
08:39:00 agree
08:39:32 #agreed inter-cloud intra-site L2 and L3 networking enhancement is one requirement from OPNFV Multisite to OpenStack
08:39:40 IP continuity across geos is a road we don't want to take; this is why clearly separating #2 and #3 is important.
08:39:42 Ok, can we draw the conclusion that overlay L2 networking across Neutron services intra-site is required in OpenStack?
08:40:02 I think we just voted yes on this
08:40:27 for #3, it's more about volume replication for restoration purposes
08:40:38 yes
08:40:39 I think we're agreed that #2 needs some technology for allowing transfer of IP addresses/traffic. This could be either L2 or L3.
08:41:11 colintd_ what would this translate to as a requirement for OpenStack?
08:42:06 Agree, but we need to describe that from two aspects, one is VNF communication to other VNFs, the other is VNF-internal communication for heart-beat and session replication
08:43:26 For L2 the major requirements relate to config/management of those networks. For L2 do you need to use provider networks? Exactly how do you disable anti-spoof support? etc. For L3 it might make sense to have a common neutron API for the take IP / free IP support, which can then be plumbed onto multiple underlying technologies. In fact it may even make sense to use the same API for L2, just have it trigger GARP.
08:43:59 #info For L2 the major requirements relate to config/management of those networks. For L2 do you need to use provider networks? Exactly how do you disable anti-spoof support? etc. For L3 it might make sense to have a common neutron API for the take IP / free IP support, which can then be plumbed onto multiple underlying technologies. In fact it
08:43:59 may even make sense to use the same API for L2, just have it trigger GARP.
08:44:23 colintd_ how about joe's comment on the two aspects ?
08:44:42 #info may even make sense to use the same API for L2, just have it trigger GARP.
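On the "how do you disable anti-spoof support" question above: within a single Neutron, the usual per-port answer is to whitelist the movable service IP (VIP) via allowed-address-pairs, or to disable port security entirely. A minimal sketch with openstacksdk, under the assumption of hypothetical port IDs, cloud name and VIP:

```python
# Sketch of the standard per-port workaround for Neutron anti-spoofing when
# a VNF needs to move a VIP between instances: whitelist the VIP on both
# ports via allowed-address-pairs.  Port IDs, cloud name and VIP are
# placeholders for illustration.
import openstack

conn = openstack.connect(cloud="vim1")
VIP = "192.168.10.100"

for port_id in ("port-uuid-active", "port-uuid-standby"):
    conn.network.update_port(
        port_id,
        allowed_address_pairs=[{"ip_address": VIP}],  # let either VM claim the VIP
    )
    # Heavier alternative: turn anti-spoofing off on the port altogether
    # (only permitted once all security groups are removed from it):
    # conn.network.update_port(port_id, port_security_enabled=False)
```

This works per Neutron instance; the requirement captured in the #agreed above is about making the same behaviour work across two Neutron services in one site.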
08:44:50 (missed the last piece ;))
08:45:58 Yes, worth pulling out that we have the need to route external traffic to the VNFI split across two clouds, and the somewhat separate need to set up the intercloud comms for intra-VNF traffic.
08:46:12 The latter leads onto cross-cloud tenant networks
08:46:47 agree, it's cross-cloud tenant networks for intra-VNF traffic
08:47:33 For the routing piece, the question is whether this leads to a req'ment on OpenStack or not.
08:47:49 (if we want to fail over between independent OpenStack instances).
08:47:51 does Neutron support it right now?
08:48:30 I didn't think it did, but could be wrong
08:48:49 currently you have to route through an external network (this is a provider network) or through an inter-VPN connection
08:49:07 then I think it should be a requirement for OpenStack
08:49:14 The key issue is that of scope, where you have two clouds coordinating/interacting (hence being covered by multisite).
08:49:31 Agree
08:50:15 Sorry guys, just for my information, are we planning to discuss an architecture proposal (i.e. identifying the best option in the proposal) in this meeting or the next meeting?
08:50:33 we could discuss it first via email
08:50:40 ok
08:50:42 thanks
08:50:45 and then in the meeting :)
08:50:48 we need a tenant-level L2 network across Neutron to bridge the tenant routers for L3. For L2, we need a cross-Neutron overlay L2 network (some applications use an L2 network for session replication)
08:50:50 to save time for use cases
08:52:13 ok then could we sum up a second req out of #2? that is, given intra-site inter-cloud, we need L2/L3 IP traffic transfer
08:52:39 ? or someone reword it for a better description :P
08:53:22 I'll have a go at a reword
08:53:44 thanks
08:54:04 so then we could agree upon a second req on this aspect, right?
08:56:11 Yes
08:57:06 #agreed second req out of use case #2: given intra-site inter-cloud, enhancement of L2/L3 IP traffic transfer (inter and intra VNF) is a requirement from OPNFV Multisite to OpenStack
08:57:26 #action colintd_ to reword the req to be more accurate :)
08:58:48 okay I think we had a pretty awesome session today :)
08:59:15 Agree. Lots of ground covered and some real improvements in our shared vision
08:59:30 time flies, and it takes time to reword the requirement. Let's have the architecture and use case #3 discussion, and review the reword
08:59:48 in the next meeting
09:00:07 and the arch discussion could be done on the ML
09:00:13 agree
09:00:24 thank you all for the meeting
09:00:31 thank you
09:00:37 see you next time
09:00:56 thanks!
09:01:02 #endmeeting
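For reference on the "just have it trigger GARP" remark in the L2 failover discussion: once the surviving instance owns the VIP, it announces the move with a gratuitous ARP so switches and peers refresh their tables. A hedged sketch using scapy (interface, MAC and VIP are placeholders; sending raw frames needs root):

```python
# Sketch of a gratuitous ARP announcement after VIP takeover, per the GARP
# idea discussed for the L2 case.  All addresses and the interface name are
# hypothetical; requires scapy and root privileges inside the VM.
from scapy.all import ARP, Ether, sendp

VIP = "192.168.10.100"
MAC = "fa:16:3e:00:00:01"   # MAC of the port that now owns the VIP

garp = Ether(src=MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
    op=2,                                  # ARP reply ("is-at")
    hwsrc=MAC, psrc=VIP,
    hwdst="ff:ff:ff:ff:ff:ff", pdst=VIP,   # sender IP == target IP => gratuitous
)
sendp(garp, iface="eth0")
```

Note that for this announcement to be honoured, the anti-spoofing arrangements sketched earlier (allowed-address-pairs or disabled port security) must already be in place on the port.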