08:07:14 <zhipeng> #startmeeting Multisite Weekly Meeting 2015.06.18
08:07:14 <collabot> Meeting started Thu Jun 18 08:07:14 2015 UTC.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:07:14 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
08:07:14 <collabot> The meeting name has been set to 'multisite_weekly_meeting_2015_06_18'
08:07:28 <zhipeng> #topic roll call
08:07:34 <zhipeng> #info zhipeng
08:07:38 <zhipeng> #info joehuang
08:07:41 <zhipeng> #info hafe
08:07:43 <Malla> #info Malla
08:08:04 <zhipeng> #info colintd
08:08:28 <zhipeng> #info Dimitri Mazmanov
08:08:40 <zhipeng> #info hafe is Hans Feldt
08:09:15 <Malla> Hi guys, is it possible for anyone to connect to GoToMeeting...?
08:10:10 <Malla> https://global.gotomeeting.com/join/959683557
08:10:32 <Malla> or else can we make this an IRC-only meeting?
08:11:05 <zhipeng> Malla let's proceed in parallel
08:11:14 <zhipeng> #topic Use Case Discussion
08:11:29 <Malla> ok
08:16:43 <zhipeng> #info use case 2
08:27:41 <zhipeng> colintd could you write down your point starting with #info ?
08:36:35 <sorantis> ok, let’s switch
08:36:58 <colintd> #info I was focussing on the need to be able to redirect external traffic between application instances sitting in one or more "local" clouds.
08:37:55 <colintd> Where the example was suggested (I missed who) of a VNF implementing VRRP, with one instance in each of two clouds, and the question being what config/changes are needed to allow this to work (is anti-spoof an issue?)
08:38:06 <colintd> {Just capturing what was said}
08:38:12 <sorantis> that’s great
08:38:17 <zhipeng> #info Where the example was suggested (I missed who) of a VNF implementing VRRP, with one instance in each of two clouds, and the question being what config/changes are needed to allow this to work (is anti-spoof an issue?)
08:38:20 <joehuang> Hi Colin, I did not clearly hear your opinion on the SDN controller used for VNF HA
08:38:27 <zhipeng> capture it with #info :P
08:38:30 <sorantis> you had a nice statement regarding VRRP
08:38:35 <sorantis> can you tag that here too?
08:38:43 <colintd> #info I was focussing on the need to be able to redirect external traffic between application instances sitting in one or more "local" clouds.
08:39:46 <colintd> #info We also talked about SDN controllers, and how in many telco deployments these control much broader end-to-end traffic than simply intra-cloud.  They might however be implemented using multiple redundant control nodes, say one per cloud, but providing a global function.
08:40:17 <colintd> #info In this case neutron may be used to "connect" to those networks, but isn't the major control interface for the whole system, just a "joining" interface
08:40:36 <fzdarsky> colintd, all, sorry for joining the discussion late:
08:40:51 <joehuang> the SDN controller to control VNF
08:41:20 <fzdarsky> Why move the IP to a different cloud, rather than reconnect to a different endpoint (the classical solution)?
08:41:36 <fzdarsky> Moving IP / VRRP across clouds needs fixing of routing
08:41:47 <fzdarsky> (in BGP)
08:41:59 <fzdarsky> and is not performant or scalable...?
08:42:03 <colintd> #info Finally, returning to traffic failover, we talked about how L2 failover can be triggered by apps just using GARP, but L3 requires a protocol-level (BGP/SDN) API.  We also talked about how L3 convergence times (say BGP) might be too slow by default, especially in error cases (loss of node) as opposed to managed failover.
08:42:57 <fzdarsky> Ah, ok, so there's agreement that moving IPs / VRRP across clouds is not a viable option?
08:43:38 <colintd> Moving the IP address is required when you have a core node serving lots of remote endpoints in the external network (e.g. voip phones with RTP streams). On failover it is too slow to resignal all of those, so you need to redirect traffic to the new node.  Given routing is based on IP address, this needs to be moved
08:43:54 <colintd> No, I don't agree with that ;-)
08:44:11 <colintd> In many ways it is currently the most practical solution, and the one a number of customers are trialling.
08:44:26 <colintd> L3 is in some ways more elegant, just much harder to make work
08:44:40 <fzdarsky> So is this meant as temporary solution, until you've resignalled the clients?
08:45:24 <colintd> No.  It is permanent / transparent to the clients.  Resignalling when there is "local" equipment failure of the core network elements is not expected, and does not occur in today's bare metal solutions.
08:45:41 <fzdarsky> When you say L2/GARP, do you mean between 2 AZs of the same region or across regions?
08:46:08 <Malla_> What about the session connectivity (VOIP), are there any service interruptions if we move the IP?
08:46:34 <colintd> Same region.  Expectation is that L2 (or perhaps even L3) solution is only viable "locally".  Georedundant events will lead to dropped calls, but transfer of service end point to a different IP and resignalling.
08:46:49 <joehuang> to malla, therefore, session replication is needed
08:47:25 <colintd> At present you get a "glitch" on the media, maybe a few hundred ms of interruption, when the failure is detected, IP transferred, but then call continues.
08:47:26 <fzdarsky> colintd, ok, thanks for that important clarification.
08:48:00 <colintd> On "site failure" (as opposed to "cloud failure") new service IP and call drops, but new calls can be established promptly.
08:48:21 <fzdarsky> Though need to rate control due to signaling storm
08:49:15 <joehuang> For use case 2, we focus on cloud failure impact on VNF
08:49:53 <colintd> Indeed it is a worry.  Normally "solved" by non-calling clients not immediately spotting the change of IP, so their standard slow polls spot the change and retry.  Major issue is for "incoming" calls (i.e. towards registered endpoint) and providing notification.  Registration replication helps, but firewall/NATing might be different to new central site.
08:50:43 <colintd> Agree, use case 2 is all about the cloud impacts, though the VRRP example is a good one to add as it is "non-telco"
08:51:40 <fzdarsky> Would you agree there is a general trend towards L3 down to the hosts, i.e. minimize L2 broadcast domains?
08:51:45 <sorantis> FYI http://blog.aaronorosen.com/implementing-high-availability-instances-with-neutron-using-vrrp/ HA with VRRP
08:51:47 <fzdarsky> So is L2 a priority?
08:52:16 <fzdarsky> s/down to the hosts/to the access switch level/
08:53:08 <fzdarsky> (sorry for adding noise to the discussion)
08:53:23 <colintd> We have carriers trying to get L3 working, but none who have currently got it working sufficiently fast.  It will also tend to be much more dependent on external tech selected for network.  So I'd say getting L2 working nicely (so no disabling of anti-spoof) today is key, with L3 being the next goal
08:55:12 <colintd> "HA with VRRP" link very interesting / relevant, and what we are talking about is the same function but with the two halves of the "allowed address pair" being in different clouds
08:55:52 <joehuang> Agree
08:55:54 <sorantis> correct. So perhaps looking at this approach we can clearly see what exactly needs to be done
08:56:36 <sorantis> by*
08:56:38 <colintd> Excellent guide.  I will investigate before the next meeting
08:57:06 <joehuang> Good
08:57:17 <fzdarsky> +1
08:57:36 <sorantis> #info HA with VRRP http://blog.aaronorosen.com/implementing-high-availability-instances-with-neutron-using-vrrp/
08:57:38 <joehuang> The link is for functionality inside one cloud.
08:58:18 <colintd> #info We are looking at similar function between clouds, for which this may or may not need to be extended.  May also be interplay with provider networks.
08:58:27 <colintd> #info I will investigate before the next meeting.
08:59:04 <joehuang> good summary, colintd. :)
08:59:26 <sorantis> It seems that we can’t really cover more than one use case in a meeting. so let’s keep our agendas short
08:59:26 <joehuang> Let's continue the talk next time.
08:59:30 <zhipeng> okay let's conclude the meeting :)
08:59:43 <zhipeng> sorantis good point
08:59:50 <zhipeng> #endmeeting
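
Editor's note on the 08:42:03 #info (L2 failover triggered by apps using GARP): the sketch below is illustrative only and was not discussed in the meeting. It builds the raw bytes of a gratuitous ARP announcement, which is the frame a newly active instance would broadcast to claim the shared service IP. The function name build_garp_frame and the example MAC and IP values are hypothetical. Actually sending the frame needs a raw socket and, in a Neutron-managed network, a port that is allowed to use the shared address (for instance via the allowed address pair approach linked at 08:57:36).

```python
import socket
import struct


def build_garp_frame(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request.

    mac: sender MAC as "aa:bb:cc:dd:ee:ff" (hypothetical example input)
    ip:  the IPv4 service address being claimed
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Gratuitous ARP: sender IP and target IP are the same address;
    # the target hardware address is left as zeros.
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes

    return eth_header + arp


if __name__ == "__main__":
    frame = build_garp_frame("fa:16:3e:00:00:01", "10.0.0.100")
    print(len(frame), frame.hex())
```

Switches and neighbours that see this broadcast update their tables so that traffic for the service IP flows to the sender's MAC. That is why L2 failover can be driven by the application itself, whereas L3 failover needs a routing-level API as noted in the log.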