#opendaylight-meeting: md-sal interest

Meeting started by colindixon at 17:02:06 UTC.

Meeting summary

  1. agenda bashing (colindixon, 17:02:12)
    1. https://wiki.opendaylight.org/view/MD-SAL_Weekly_Call#Agenda the agenda (colindixon, 17:03:34)
    2. Mouli will share some performance evaluation results and solicit feedback from the community. (colindixon, 17:03:50)

  2. Helium MD-SAL Data Store & OF Plugin Performance Analysis (colindixon, 17:06:29)
    1. slides will be posted after the meeting (there is a webex recording) (colindixon, 17:06:55)
    2. the goal is to characterize the performance of the MD-SAL DOM Data Store and OF plugin (as well as scalability and reliability) (colindixon, 17:08:41)
    3. used a switch simulator to measure flow rate (flows/second) and flow scaling (max flows) with ~20 switches (colindixon, 17:09:28)
    4. used a Dell server with 64GB RAM, 8 cores and NWSim (colindixon, 17:09:58)
    5. used modified OpenFlow Plugin test provider (multi-threaded, batching flows, etc.) (colindixon, 17:10:52)
    6. tested DOM datastore alone (no OF plugin), DOM datastore alone (no OF plugin, no notifications), OF drop test (skip DOM datastore), DOM datastore + OF plugin, and DOM datastore (no notifications) + OF plugin (colindixon, 17:13:04)
    7. this was informed by finding bottlenecks, e.g., they found that notifications were expensive (colindixon, 17:13:58)

  3. observations so far (colindixon, 17:15:15)
    1. originally were seeing <100 flows/second from the MD-SAL data store + OpenFlow plugin (colindixon, 17:15:42)
    2. potential bottlenecks here: (colindixon, 17:16:18)
    3. * data change notifications on WriteTxCommit processing (colindixon, 17:16:56)
    4. * single-threaded commit processing compounds this problem (colindixon, 17:17:10)
    5. * the merge operation processing overheads were much higher than for put (colindixon, 17:17:29)
    6. removing the use of data change notifications in the process resulted in ~5000 flows/second (colindixon, 17:20:01)
    7. notifications are two operations (first creating listener trees and then difference computation), the overhead seems to be in the second (colindixon, 17:20:45)
    8. this ~5000 flows/second was with the OFplugin, but with no clustering (at least not yet) (colindixon, 17:21:48)
    9. they do note that the existing MD-SAL microbenchmark on a laptop-grade system hit tens of thousands of operations per second (colindixon, 17:24:48)
    10. there were three key differences: (1) using normalized nodes instead of binding aware interfaces, (2) using notifications, (3) using a very simple model instead of the more complex flows (colindixon, 17:25:44)
    11. rovarga seems to think that (1) would cause a lot of overhead (colindixon, 17:26:08)

  4. possible fixes (colindixon, 17:29:06)
    1. tony says one issue could be that there is a translation from MD-SAL internal data change events to the one defined in the API (colindixon, 17:30:10)
    2. rovarga thinks the right approach is to migrate the difference computation to the client, not the data store (colindixon, 17:34:51)
    3. the logic is that this would allow the client to do just the computation it needs, not all the way down to the leaves (colindixon, 17:37:37)
    4. rovarga argues that getting the subscription API implemented in a way that allows for this kind of optimization would be difficult (colindixon, 17:40:32)
    5. colindixon says he thought that the triggering scope did this, e.g., subtree vs. current node vs. current node + children (colindixon, 17:41:05)
    6. rovarga says that this is *just* triggering scope, not the scope of the changes that are provided (colindixon, 17:41:35)
    7. the problem is that in order to know whether the scope has triggered, we need to perform a full comparison. there is no way to say 'this is granular enough' (rovarga, 17:42:28)
    8. and once we know it triggered, we need to also calculate all the nodes which changed, as we do not know what the user will ask for (rovarga, 17:46:15)
    9. a navigable tree of what has changed would solve this -- the app can navigate, find it out, and ask for DTOs which it is interested in (rovarga, 17:47:12)
    10. Uyen says that the current apps seem to be mostly based on data changes, and it appears (from this discussion) that this might not scale well (colindixon, 17:51:02)
    11. given that, Uyen asks what the performance guidelines for using the MD-SAL are (colindixon, 17:51:37)
    12. rovarga and muthu answer that “it depends on your application” (colindixon, 17:52:59)
    13. colindixon restates the question “given that you want to have complex models with high update rates, is the answer don’t use data change notifications?” (colindixon, 17:53:38)
    14. rovarga says yes, but there may be APIs that allow for better scoped notifications and thus less pain here (colindixon, 17:54:37)
    15. in general: batch as much as possible, listen to the minimal set you need, perform put() [which pushes you to single-writer-per-subtree] (rovarga, 17:55:18)
    16. try to match what the producers put() and what the consumers trigger on (rovarga, 17:57:51)

  5. wrap-up (colindixon, 17:57:51)
    1. ACTION: muthu will send out data (profiler and performance numbers) (colindixon, 17:58:15)
    2. ACTION: Mouli will send mail about possible missing/dangling flows (colindixon, 17:58:41)
    3. ACTION: Muthu/Mouli to post slides to wiki (colindixon, 17:58:55)
    4. ACTION: the community needs to understand the right patterns to use the MD-SAL to get decent performance (colindixon, 18:00:36)
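
Editor's sketch for topic 4, items 2, 3 and 7 to 9: the difference between a store-side diff that enumerates every changed node and a navigable tree of changes that the client walks only as deep as it needs. This is a toy model, not MD-SAL code. The names ModNode, ChangeTree, changed_children and eager_changed_paths are all hypothetical, as are the example paths.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


@dataclass
class ModNode:
    """One node in a hypothetical modification tree (not an MD-SAL class)."""
    name: str
    written: bool = False
    children: Dict[str, "ModNode"] = field(default_factory=dict)

    def child(self, name: str) -> "ModNode":
        # Create the child on first touch, reuse it afterwards.
        return self.children.setdefault(name, ModNode(name))


class ChangeTree:
    """Records which paths one transaction touched; does no diffing itself."""

    def __init__(self) -> None:
        self.root = ModNode("root")

    def record_put(self, path: Path) -> None:
        node = self.root
        for step in path:
            node = node.child(step)
        node.written = True

    def find(self, path: Path) -> Optional[ModNode]:
        node: Optional[ModNode] = self.root
        for step in path:
            node = node.children.get(step)
            if node is None:
                return None
        return node


def changed_children(tree: ChangeTree, path: Path) -> List[str]:
    """Client-side walk: which direct children of `path` contain a change?

    The walk stops one level below `path`; nothing deeper is visited.
    """
    node = tree.find(path)
    return sorted(node.children) if node else []


def eager_changed_paths(tree: ChangeTree) -> List[Path]:
    """Store-side style: enumerate every written path, down to the leaves."""
    out: List[Path] = []

    def walk(node: ModNode, prefix: Path) -> None:
        if node.written:
            out.append(prefix)
        for name in sorted(node.children):
            walk(node.children[name], prefix + (name,))

    walk(tree.root, ())
    return out


# Example: 2 switches x 3 flows written in one transaction (made-up names).
tree = ChangeTree()
for sw in ("sw1", "sw2"):
    for flow in ("f1", "f2", "f3"):
        tree.record_put(("nodes", sw, "table0", flow))

print(changed_children(tree, ("nodes",)))   # ['sw1', 'sw2']
print(len(eager_changed_paths(tree)))       # 6
```

A client that only wants to know which switches changed reads two names here, while the eager enumeration visits all six flow paths whether or not anyone asks for them.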

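Editor's sketch for topic 3, item 5 and topic 4, item 15: the semantic difference between put (replace the subtree at a path) and merge (combine new data with what is already stored). This is a toy model over nested dicts, not the MD-SAL transaction API, and it does not by itself explain the overheads measured in the meeting. The function names and flow fields are assumptions made for illustration.

```python
import copy


def put(store, path, data):
    """Replace whatever is stored at `path` with `data`; old contents are not inspected."""
    parent = store
    for step in path[:-1]:
        parent = parent.setdefault(step, {})
    parent[path[-1]] = copy.deepcopy(data)


def merge(store, path, data):
    """Recursively combine `data` with the existing subtree at `path`.

    Every dict node in `data` is compared against the stored node, so the
    work grows with the size of the data being merged.
    """
    parent = store
    for step in path[:-1]:
        parent = parent.setdefault(step, {})
    existing = parent.get(path[-1])
    if isinstance(existing, dict) and isinstance(data, dict):
        for key, value in data.items():
            merge(existing, (key,), value)
    else:
        parent[path[-1]] = copy.deepcopy(data)


store = {}
flow_path = ("table0", "flow1")  # made-up path

put(store, flow_path, {"priority": 10, "match": {"in_port": 1}})
merge(store, flow_path, {"match": {"eth_type": 2048}})
after_merge = copy.deepcopy(store["table0"]["flow1"])
# after_merge keeps priority and in_port and gains eth_type

put(store, flow_path, {"priority": 20})
after_put = copy.deepcopy(store["table0"]["flow1"])
# after_put holds only what the last put wrote
```

Because put overwrites the whole subtree, two writers putting to the same path would clobber each other, which is why the advice in item 15 pairs put() with a single writer per subtree.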

Meeting ended at 18:02:00 UTC.

Action items

  1. muthu will send out data (profiler and performance numbers)
  2. Mouli will send mail about possible missing/dangling flows
  3. Muthu/Mouli to post slides to wiki
  4. the community needs to understand the right patterns to use the MD-SAL to get decent performance


People present (lines said)

  1. colindixon (46)
  2. odl_meetbot (5)
  3. rovarga (5)
  4. abhijitkumbhare (0)


Generated by MeetBot 0.1.4.