#opendaylight-clustering: clustering_hackers

Meeting started by moizer_ at 15:59:08 UTC.

Meeting summary

    1. Gary Wu presenting information on Unified Secure Channel (USC) (moizer_, 16:06:55)
    2. the goal is to support call home, similar to netconf call home (moizer_, 16:07:26)
    3. the device needs to make an inbound call to the controller (moizer_, 16:07:39)
    4. the device creates a call home connection (moizer_, 16:08:23)
    5. this allows the controller to talk to the device (moizer_, 16:08:57)
    6. assumption: any node in the cluster should be able to respond to a request instead of bouncing it around (moizer_, 16:11:11)
    7. it is important that an RPC request be routed to the node that has the connection (moizer_, 16:11:58)
    8. Other considerations: scalability; should call home devices be “multi-homed” to multiple controller nodes (tbachman, 16:12:29)
    9. moizer_ asks gwu whether the idea is that a request to the controller gets bounced to the right node, so that you don’t get a redirect (tbachman, 16:13:08)
    10. gwu says yes (tbachman, 16:13:12)
    11. moizer_ says that the routed RPC mechanism should support this (tbachman, 16:14:18)
    12. uchau asks what happens, in the clustering model, to an OF switch when that node goes down; a device ownership model is needed so that the device can work with another node in the controller (tbachman, 16:15:48)
    13. gwu says when a node goes down, the device needs to reconnect with one of the other nodes (tbachman, 16:16:07)
    14. uchau asks whether USC is going to have openflow also go through the secure channel (tbachman, 16:16:26)
    15. gwu says yes (tbachman, 16:16:28)
    16. uchau is interested in developing a device ownership concept, which helps provide failover direction (tbachman, 16:16:58)
    17. uchau says that in this case, whether OF connects directly or through the secure channel, the ownership model is the same (tbachman, 16:17:32)
    18. gwu asks how openflow deals with multihoming/mastership (tbachman, 16:17:43)
    19. uchau says the openflow team is implementing a message that allows the controller to assert the role (tbachman, 16:17:58)
    20. uchau says that the controller can look at the device ownership when a device connects and assert the role (tbachman, 16:18:16)
    21. Helen says that clustering already has a supernode concept and asks if this is related (tbachman, 16:19:33)
    22. moizer_ says for data, there is a concept of leaders and followers, but that does not mean you can go to another node to access inventory (tbachman, 16:20:53)
    23. Helen asks whether, without a load balancer, clustering can solve this problem (tbachman, 16:26:00)
    24. moizer_ recommends using virtual IPs for the controller (tbachman, 16:26:18)
    25. uchau says one option is to have the device connect to all the controllers in a team, which is similar to the openflow model (tbachman, 16:27:13)
    26. moizer_ says one problem with using a virtual IP and load balancing is how to do keep-alives (tbachman, 16:30:28)
    27. gwu asks about the scalability of that model: how many connections can a node handle? (tbachman, 16:30:58)
    28. uchau says that jmedved was maybe targeting 5k, but wasn’t sure whether that was per-node or per-cluster (tbachman, 16:31:51)
    29. Helen says that their requirement is for 1 million devices (tbachman, 16:32:04)
    30. moizer_ says that with clustering, we can only store what fits into memory (i.e. storage can’t exceed the amount of memory available) (tbachman, 16:33:27)
    31. moizer_ says that’s a lot of operational data (tbachman, 16:33:31)
    32. Helen says all the other data is stateless (tbachman, 16:33:42)
    33. moizer_ says he suspects that 1 million devices means a lot of data in memory (tbachman, 16:34:02)
    34. Fabiel Zuniga says that the persistence service may be able to help here (tbachman, 16:34:49)
    35. markmozolewski says devices could maintain 1 master and 1-2 slave (backup) connections and establish new slave connections as failover occurs (vs. maintaining connections to all slaves), for cluster sizes >> 3 (tbachman, 16:35:04)
    36. moizer_ recommends connecting a bunch of devices and seeing how things perform (tbachman, 16:36:09)
    37. uchau asks whether Helen wants the controller to provide the load balancing, or to use external load balancers (tbachman, 16:37:32)
    38. uchau guesses that the 1 million devices are to be supported by the cluster, not by a single node in the cluster (tbachman, 16:37:57)
    39. moizer_ says that with 64 switches in openflow, the data store holds about 4.5 MB (tbachman, 16:39:18)
    40. moizer_ needs 10 minutes to talk about bugs/patches (moizer_, 16:39:38)
    41. catohornet asks about timeouts in the cluster; sees an issue with many nodes, and with where they’re configured topologically (tbachman, 16:40:08)
    42. moizer_ says you don’t need to have every node fully replicated; for example, with routing logic and 5 cluster nodes, you might choose to replicate on only 3 of the nodes (tbachman, 16:40:39)
    43. gwu asks if the proposal is workable (tbachman, 16:42:16)
    44. moizer_ says yes (tbachman, 16:42:18)
    45. gwu was thinking of presenting statistics (e.g. bytes transferred) to the MD-SAL and asks about the effects on the data store as things scale (tbachman, 16:42:50)
    46. moizer_ says that if the stats collection interval isn’t too short, it should be okay (e.g. no client will be reading stats every 3 seconds) (tbachman, 16:43:26)
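A rough back-of-envelope check of the memory concern, combining the figure quoted above (about 4.5 MB in the data store for 64 openflow switches) with the 1 million device requirement. This is only a sketch under assumptions that were not measured in the meeting: the per-device footprint is constant and scales linearly, a USC device costs about as much as an openflow switch, and units are decimal (1 GB = 1000 MB). The replica count mirrors the "replicate on only 3 of 5 nodes" example.

```python
# Back-of-envelope extrapolation of data-store memory.
# Measured figure quoted in the meeting: 64 openflow switches ~ 4.5 MB.
# ASSUMPTIONS (not from the meeting): linear scaling, a USC device costs
# about the same as an openflow switch, decimal units (1 GB = 1000 MB).

MEASURED_SWITCHES = 64
MEASURED_MB = 4.5


def per_device_mb() -> float:
    """Average data-store footprint per switch, in MB."""
    return MEASURED_MB / MEASURED_SWITCHES


def estimated_mb(devices: int, replicas: int = 1) -> float:
    """Estimated footprint for `devices`, summed over `replicas` copies, in MB."""
    return per_device_mb() * devices * replicas


if __name__ == "__main__":
    print(f"per device: {per_device_mb() * 1000:.1f} KB")                  # 70.3 KB
    print(f"1M devices, one copy: {estimated_mb(1_000_000) / 1000:.1f} GB")  # 70.3 GB
    print(f"1M devices, 3 replicas: {estimated_mb(1_000_000, 3) / 1000:.1f} GB")  # 210.9 GB
```

Under these assumptions, each node holding a full replica would need roughly 70 GB for the data store alone, which is why spreading data across nodes or offloading it (e.g. to a persistence service) comes up.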


Meeting ended at 17:58:15 UTC.

Action items

  1. (none)


People present (lines said)

  1. tbachman (54)
  2. moizer_ (13)
  3. odl_meetbot (3)
  4. markmozolewski (3)


Generated by MeetBot 0.1.4.