15:59:08 <moizer_> #startmeeting clustering_hackers
15:59:08 <odl_meetbot> Meeting started Tue Feb 10 15:59:08 2015 UTC. The chair is moizer_. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:59:08 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:08 <odl_meetbot> The meeting name has been set to 'clustering_hackers'
16:06:55 <moizer_> #info Gary Wu presenting information on Unified Secure Channel
16:07:26 <moizer_> #info wants to support call home, like NETCONF call home
16:07:39 <moizer_> #info device needs to make an inbound call to the controller
16:08:23 <moizer_> #info device creates a call home connection
16:08:57 <moizer_> #info this allows the controller to talk to the device
16:11:11 <moizer_> #info assumption is that any node in the cluster should be able to respond to a request instead of bouncing it around
16:11:33 <tbachman> gwu: do you have a link for the slides, by chance?
16:11:39 * tbachman apologizes if this has been asked already
16:11:58 <moizer_> #info important that the RPC request be routed to the node with the connection
16:12:29 <tbachman> #info Other considerations: scalability; should call home devices be “multi-homed” to multiple controller nodes
16:12:35 <moizer_> we can share it on the wiki
16:12:41 <tbachman> moizer_: thx!
16:13:08 <tbachman> #info moizer_ asks gwu if the idea is that the request to the controller be bounced — is that so you don’t get a redirect?
16:13:12 <tbachman> #info gwu says yes
16:14:18 <tbachman> #info moizer_ says that the routed RPC mechanism should support this
16:15:48 <tbachman> #info uchau asks, in the clustering model, what happens to an OF switch when that node goes down; needs a device ownership model so that the device can work with another node in the controller
16:16:07 <tbachman> #info gwu says when a node goes down, the device needs to reconnect with one of the other nodes
16:16:26 <tbachman> #info uchau asks if USC was going to have openflow also go through the secure channel
16:16:28 <tbachman> #info gwu says yes
16:16:58 <tbachman> #info uchau is interested in developing a device ownership concept, which helps provide failover direction
16:17:32 <tbachman> #info uchau says in this case, whether OF connects directly or through the secure channel, the ownership model is the same
16:17:43 <tbachman> #info gwu asks how openflow deals with multihoming/mastership?
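For context on the call-home discussion above, the sketch below illustrates the connection pattern gwu describes: the controller only listens, the managed device initiates the TCP connection, and the controller then issues its requests over that device-initiated channel. This is not USC or NETCONF code; the port number, message strings, and class name are illustrative placeholders only.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal call-home sketch: the controller listens, the device dials in,
// and the controller talks to the device over the device-initiated channel.
public class CallHomeListenerSketch {

    public static void main(String[] args) throws Exception {
        // Port chosen for illustration only; USC's actual transport and framing are not modeled.
        try (ServerSocket listener = new ServerSocket(4334)) {
            while (true) {
                // The device "calls home": it opens the inbound connection to the controller.
                Socket deviceConnection = listener.accept();
                handleDevice(deviceConnection);
            }
        }
    }

    private static void handleDevice(Socket deviceConnection) {
        new Thread(() -> {
            try (Socket conn = deviceConnection;
                 PrintWriter toDevice = new PrintWriter(conn.getOutputStream(), true);
                 BufferedReader fromDevice = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()))) {
                // Request direction is reversed relative to the TCP connection:
                // the controller sends its request over the channel the device opened.
                toDevice.println("hello");
                System.out.println("device replied: " + fromDevice.readLine());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }).start();
    }
}
```

This also motivates the routed-RPC point at 16:11:58: only the node holding a given device-initiated connection can actually reach that device, so an RPC targeting the device must be routed to that node rather than answered by an arbitrary cluster member.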
16:17:58 <tbachman> #info uchau says the openflow team is implementing a message that allows the controller to assert the role
16:18:16 <tbachman> #info uchau says that it can look at the device ownership when a device connects, and assert the role
16:19:33 <tbachman> #info Helen says that clustering already has a supernode concept — asks if this is related
16:20:53 <tbachman> #info moizer_ says for data, there is a concept of leaders and followers, but that does not mean you can go to another node to access inventory
16:26:00 <tbachman> #info Helen asks whether, without a load balancer, it is possible for clustering to solve this problem
16:26:18 <tbachman> #info moizer_ recommends using virtual IPs for the controller
16:27:13 <tbachman> #info uchau says one option is to have the device connect to all the controllers in a team, which is similar to the openflow model
16:29:59 <moizer_> I guess a problem with virtual ip and load balancing is how do you do keep-alives
16:30:28 <tbachman> #info moizer_ says one problem with using a virtual IP and load balancing is how to do keep-alives
16:30:34 <tbachman> :)
16:30:58 <tbachman> #info gwu asks what the scalability is of that model — how many connections can a node handle
16:31:20 <tbachman> #info uchau says that jmedved was maybe targeting 5k, but wasn
16:31:22 <tbachman> #undo
16:31:31 <tbachman> ack
16:31:33 <tbachman> no chair ;)
16:31:41 <tbachman> #info uchau says that jmedved was maybe targeting 5k, but wasn’t sure whether that was per-node or per-cluster
16:31:51 <tbachman> #info uchau says that jmedved was maybe targeting 5k, but wasn’t sure whether that was per-node or per-cluster
16:32:04 <tbachman> #info Helen says that their requirement is for 1 million devices
16:32:25 * tbachman pictures Dr. Evil
16:33:27 <tbachman> #info moizer_ says with clustering, we can only store what we can fit into memory (i.e. storage can’t exceed the amount of memory available)
16:33:31 <tbachman> #info moizer_ says that’s a lot of operational data
16:33:42 <tbachman> #info Helen says all the other data is stateless
16:34:02 <tbachman> #info moizer_ says 1 million devices, and suspects that’s a lot of data in memory
16:34:19 <markmozolewski> Devices could maintain 1 Master / 1-2 Slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3.
16:34:49 <tbachman> #info Fabiel Zuniga says that the persistence service may be able to help here
16:35:04 <tbachman> #info markmozolewski says devices could maintain 1 Master / 1-2 Slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3.
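The device-ownership idea uchau raises at 16:16:58–16:18:16 can be made concrete with a small sketch. It is not ODL code: the ownership table, the `sendRoleRequest()` helper, and the member/device names are hypothetical placeholders for whatever the clustering and openflowplugin projects actually provide. The controller roles themselves (EQUAL/MASTER/SLAVE) are the roles carried in OpenFlow 1.3 role-request messages.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of device ownership driving role assertion: when a device
// connects, the local node checks whether it owns the device and asserts the
// corresponding role; on owner failure another node takes ownership and asserts MASTER.
public class OwnershipRoleSketch {

    /** Controller roles as defined for OpenFlow 1.3 role requests. */
    enum ControllerRole { EQUAL, MASTER, SLAVE }

    // Hypothetical ownership table: device id -> owning cluster member.
    // In a real cluster this would be a shared/replicated registry, not a local map.
    private final Map<String, String> owners = new ConcurrentHashMap<>();
    private final String localMemberName;

    OwnershipRoleSketch(String localMemberName) {
        this.localMemberName = localMemberName;
    }

    /** Called when a device connects, whether directly or through the secure channel. */
    void onDeviceConnected(String deviceId) {
        // Claim ownership if nobody holds it yet; otherwise respect the existing owner.
        String owner = owners.computeIfAbsent(deviceId, id -> localMemberName);
        ControllerRole role =
                owner.equals(localMemberName) ? ControllerRole.MASTER : ControllerRole.SLAVE;
        sendRoleRequest(deviceId, role);
    }

    /** Called when the owning member is detected as down; this member takes over. */
    void onOwnerFailed(String deviceId) {
        owners.put(deviceId, localMemberName);
        sendRoleRequest(deviceId, ControllerRole.MASTER);
    }

    private void sendRoleRequest(String deviceId, ControllerRole role) {
        // Placeholder: a real controller would emit an OpenFlow role request
        // (or the USC equivalent) on its connection to the device.
        System.out.printf("asserting %s on %s from %s%n", role, deviceId, localMemberName);
    }
}
```

As noted at 16:17:32, the same ownership model applies whether the device connects directly or through the secure channel; only the transport underneath the role assertion differs.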
16:36:09 <tbachman> #info moizer_ recommends connecting a bunch of devices and seeing how things perform
16:37:32 <tbachman> #info uchau asks if Helen wants the controller to support the load balancing, or to use external load balancers
16:37:57 <tbachman> #info uchau guesses that the 1 million nodes is to be supported by the cluster, not by a single node in the cluster
16:39:18 <tbachman> #info moizer_ says with 64 switches in openflow, it takes about 4.5 MB in the data store
16:39:38 <moizer_> #info I need to talk about bugs/patches for 10 mins
16:40:08 <tbachman> #info catohornet asks about timeouts in the cluster — sees an issue with many nodes, and where they’re configured topologically
16:40:39 <tbachman> #info moizer_ says you don’t need to have every node fully replicated; as an example, with routing logic and 5 cluster nodes, you might choose to do replication on only 3 of the nodes
16:40:39 <markmozolewski> Controlling replication factors will be key for that scale.
16:42:16 <tbachman> #info gwu asks if the proposal is workable
16:42:18 <tbachman> #info moizer_ says yes
16:42:50 <tbachman> #info gwu was thinking of presenting statistics to the MD-SAL (e.g. bytes transferred); asks about this (e.g. effects on data store as things scale)
16:43:26 <tbachman> #info moizer_ says if the stats collection interval isn’t too low, then it should be okay (e.g. no client will be reading stats every 3 seconds)
16:44:33 <tbachman> #topic bug review
16:44:36 <tbachman> darn
16:44:43 <tbachman> have to be chair for that one, too
17:11:16 <tbachman> moizer_: fyi — 10 past the hr
17:15:33 <tbachman> moizer_: 15 past the hr and I need to run - don’t forget to do the #endmeeting
17:15:46 <moizer_> thx tbachman
17:15:53 <tbachman> np!
17:31:31 <markmozolewski> we'll create the bug for Akka upgrade and Vamsi will take the lead investigating not just stability with the latest version but also performance of the new version vs. the old version (using Jan's current test scripts in a cluster)
17:58:15 <moizer_> #endmeeting
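One concrete place where the "don't replicate everything to every node" point at 16:40:39 shows up in the current clustering implementation is the shard replica list in module-shards.conf. The snippet below is a hedged example, not taken from the meeting or from any shipped distribution: in a hypothetical five-member cluster (member-1 through member-5), the topology shard is replicated on only three members. Member names must match the roles configured in each node's akka.conf, and shard/module names and file locations can vary by release.

```
module-shards = [
    {
        name = "topology"
        shards = [
            {
                name = "topology"
                replicas = [
                    "member-1",
                    "member-2",
                    "member-3"
                ]
            }
        ]
    }
]
```

Keeping the replica count below the cluster size, as markmozolewski notes at 16:40:39, is one of the main levers for holding memory and replication traffic down at the device counts discussed earlier in the meeting.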