15:59:08 <moizer_> #startmeeting clustering_hackers
15:59:08 <odl_meetbot> Meeting started Tue Feb 10 15:59:08 2015 UTC.  The chair is moizer_. Information about MeetBot at http://ci.openstack.org/meetbot.html.
15:59:08 <odl_meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:08 <odl_meetbot> The meeting name has been set to 'clustering_hackers'
16:06:55 <moizer_> #info Gary Wu presenting information on Unified Secure Channel
16:07:26 <moizer_> #info wants to support call home like netconf call home
16:07:39 <moizer_> #info device needs to make an inbound call to controller
16:08:23 <moizer_> #info device creates a call home connection
16:08:57 <moizer_> #info this allows controller to talk to device
16:11:11 <moizer_> #info assumptions that any node in cluster should be able to respond to a request instead of bouncing it around
16:11:33 <tbachman> gwu: do you have a link for the slides, by chance?
16:11:39 * tbachman apologizes if this has been asked already
16:11:58 <moizer_> #info important that rpc request needs to be routed to the node with the connection
16:12:29 <tbachman> #info Other considerations: scalability; should call home devices be “multi-homed” to multiple controller nodes
16:12:35 <moizer_> we can share it on the wiki
16:12:41 <tbachman> moizer_: thx!
16:13:08 <tbachman> #info moizer_ asks gwu if the idea is that the request to controller be bounced — is that so you don’t get a redirect?
16:13:12 <tbachman> #info gwu says yes
16:14:18 <tbachman> #info moizer_ says that the routed RPC mechanism should support this
16:15:48 <tbachman> #info uchau asks in the clustering model, what happens to an OF switch when that node goes down; needs device ownership model so that the device can work with another node in the controller
16:16:07 <tbachman> #info gwu says when a node goes down, the device needs to reconnect with one of the other nodes
16:16:26 <tbachman> #info uchau asks if USC was going to have openflow also go through the secure channel
16:16:28 <tbachman> #info gwu says yes
16:16:58 <tbachman> #info uchau is interested in developing a device ownership concept, which helps provide failover direction
16:17:32 <tbachman> #info uchau says in this case, whether openflow connects directly or through the secure channel, the ownership model is the same
16:17:43 <tbachman> #info gwu asks how openflow deals with multihoming/mastership?
16:17:58 <tbachman> #info uchau says the openflow team is implementing a message that allows the controller to assert the role
16:18:16 <tbachman> #info uchau says that it can look at the device ownership when a device connects, and assert the role
16:19:33 <tbachman> #info Helen says that clustering already has a supernode concept — asks if this is related
16:20:53 <tbachman> #info moizer_ says for data, there is a concept of leaders and followers, but that does not mean you can go to another node to access inventory
16:26:00 <tbachman> #info Helen asks whether, w/o a load balancer, it is possible for clustering to solve this problem
16:26:18 <tbachman> #info moizer_ recommends using virtual IPs for the controller
16:27:13 <tbachman> #info uchau says one option is to have the device connect to all the controllers in a team, which is similar to the openflow model
16:29:59 <moizer_> I guess a problem with virtual ip and load balancing is how do you do keep-alives
16:30:28 <tbachman> #info moizer_ says one problem with using a virtual IP and load balancing is how to do keep-alives
16:30:34 <tbachman> :)
16:30:58 <tbachman> #info gwu asks what the scalability is of that model — how many connections can a node handle
16:31:20 <tbachman> #info uchau says that jmedved was maybe targeting 5k, but wasn
16:31:22 <tbachman> #undo
16:31:31 <tbachman> ack
16:31:33 <tbachman> no chair ;)
16:31:51 <tbachman> #info uchau says that jmedved was maybe targeting 5k, but wasn’t sure whether that was per-node or per-cluster
16:32:04 <tbachman> #info Helen says that their requirement is for 1 million devices
16:32:25 * tbachman pictures Dr. Evil
16:33:27 <tbachman> #info moizer_ says with clustering, we can only store what we can fit into memory (i.e. storage can’t exceed the amount of memory available)
16:33:31 <tbachman> #info moizer_ says that’s a lot of operational data
16:33:42 <tbachman> #info Helen says all the other data is stateless
16:34:02 <tbachman> #info moizer_ says 1 million devices, and suspects that’s a lot of data in memory
16:34:19 <markmozolewski> Devices could maintain 1 Master / 1-2 Slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3.
16:34:49 <tbachman> #info Fabiel Zuniga says that the persistence service may be able to help here
16:35:04 <tbachman> #info markmozolewski says devices could maintain 1 Master / 1-2 Slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3.
16:36:09 <tbachman> #info moizer_ recommends connecting a bunch of devices and seeing how things perform
16:37:32 <tbachman> #info uchau asks if Helen wants the controller to support the load balancing, or using external load balancers
16:37:57 <tbachman> #info uchau guesses that the 1 million nodes is to be supported by the cluster, not by a single node in the cluster
16:39:18 <tbachman> #info moizer_ says with 64 switches in openflow, it takes about 4-1/2 MB in the data store
16:39:38 <moizer_> #info I need to talk about bugs/patches for 10 mins
16:40:08 <tbachman> #info catohornet asks about timeouts in the cluster; sees issues with many nodes, and with where they’re configured topologically
16:40:39 <tbachman> #info moizer_ says you don’t need to have every node fully replicated; as an example, with routing logic and 5 cluster nodes, you might choose to do replication on only 3 of the nodes
16:40:39 <markmozolewski> Controlling replication factors will be key for that scale.
16:42:16 <tbachman> #info gwu asks if the proposal is workable
16:42:18 <tbachman> #info moizer_ says yes
16:42:50 <tbachman> #info gwu was thinking of presenting statistics to the MD-SAL (e.g. bytes transferred); asks about this (e.g. effects on data store as things scale)
16:43:26 <tbachman> #info moizer_ says if stats collection interval isn’t too low, then it should be okay (e.g. no client will be reading stats every 3 seconds)
16:44:33 <tbachman> #topic bug review
16:44:36 <tbachman> darn
16:44:43 <tbachman> have to be chair for that one, too
17:11:16 <tbachman> moizer_: fyi — 10 past the hr
17:15:33 <tbachman> moizer_: 15 past the hr and I need to run - don’t forget to do the #endmeeting
17:15:46 <moizer_> thx tbachman
17:15:53 <tbachman> np!
17:31:31 <markmozolewski> we'll create the bug for Akka upgrade and Vamsi will take the lead investigating not just stability with latest version but also performance of new version vs. old version (using Jan's current test scripts in a cluster)
17:58:15 <moizer_> #endmeeting