#opnfv-cperf: cperf

Meeting started by dfarrell07 at 18:15:03 UTC (full logs).

Meeting summary

    1. mattw4 reports that they have reached some nice stability with setup and teardown of the scale deployment (dfarrell07, 18:18:33)
    2. very exciting that they have this deployment working (dfarrell07, 18:18:54)
    3. have scaled up to 100 compute nodes so far (dfarrell07, 18:19:00)
    4. creating lots of instances at the same time results in half of them failing (dfarrell07, 18:19:17)
    5. this is similar to what Nikos saw with nstat, they had to add in batches (dfarrell07, 18:19:38)
    6. many test dimensions in this matrix, need to make prios (dfarrell07, 18:20:08)
    7. next prio is getting this pushed to the upstream stuff they have started (dfarrell07, 18:20:29)
    8. Jamo suggests some initial tests, single host on each compute node, make sure everyone can talk to everyone (dfarrell07, 18:20:48)
    9. jamo talks about bugs they see in regular netvirt with some flows not getting installed, some instances not being able to talk to each other (dfarrell07, 18:21:13)
    10. Raghu confirms they are looking at that test early (dfarrell07, 18:21:37)
    11. mattw4 talks about timing between instance creation being important, waiting some number of seconds, or better yet waiting for a ping smoke test to pass before creating more instances (dfarrell07, 18:23:33)
    12. jamo talks about how openstack may have some issues like this, has heard something like 13 at a time (dfarrell07, 18:23:52)
    13. when spinning up lots of instances, some rules don't get installed, no tunnels, openstack thinks things are good but they aren't because of missing rules (dfarrell07, 18:24:22)
    14. jamo says make sure you're using latest carbon, bugs recently (dfarrell07, 18:24:34)
    15. mattw4 is on boron sr2, which is def a problem (dfarrell07, 18:24:44)
    16. jamoluhrsen says throw away boron asap, maybe sr4 but that's still otw (dfarrell07, 18:24:59)
    17. mattw4 is using andre's netvirt scale doc as test bible for now, following that closely (dfarrell07, 18:25:20)
    18. Raghu asks if there are custom configs we do for ODL that they should be doing, jamoluhrsen says no, nothing fancy (dfarrell07, 18:26:14)
    19. mattw4 talks about deleting instances not working so well, after removing instances ODL still maintaining state, mattw4 saw some crazy stuff with like 54GB of RAM used *wow faces all around* (dfarrell07, 18:28:45)
    20. LuisGomez says we should dump this ram into a profiler and see what's going on (dfarrell07, 18:29:43)
    21. dfarrell07 restates that this is def stuff we want to bring to odl devs, they will ask for tests to reproduce so need that as well (dfarrell07, 18:30:07)
    22. mattw4 is using openstack-ansible modules to work with openstack cluster, poke into and config things, have looked at rally but maybe not what they need, looking for feedback about tools to work with such things (dfarrell07, 18:31:05)
    23. LuisGomez and mattw4 talk about deployment tooling (dfarrell07, 18:32:07)
    24. mattw4 talks about using ansible, probing openstack api (dfarrell07, 18:32:28)
    25. next week they will have test plan to share with us, see where they are going (dfarrell07, 18:32:45)
    26. LuisGomez talks about this just running in their internal lab, everyone agrees will be awesome to get running in cperf etc (dfarrell07, 18:33:09)
    27. jamoluhrsen gives updates about migration of pod (dfarrell07, 18:33:42)
    28. we have new ip addresses, but they have not been changed on the boxes for us (dfarrell07, 18:33:56)
    29. we have console access, but need to get into boxes and change them (dfarrell07, 18:34:06)
    30. jamoluhrsen doesn't have cycles to do this atm, someone else could help, he would show how (dfarrell07, 18:34:21)
    31. jamoluhrsen talks about how cperf tools container has been useful in downstream testing, maybe we should advertise it to ppl as an easy way to get robot running (dfarrell07, 18:36:56)
    32. discussion about docker support in ODL CI, we have jobs that use it jamoluhrsen says (dfarrell07, 18:37:09)
    33. LuisGomez wants more things running in containers (of course) (dfarrell07, 18:37:25)
    34. mattw4 talks about their attempt to use docker networks as underlay, they were not so great, ended up using linux bridge and veth pairs via scripts, pure linux (dfarrell07, 18:38:04)
    35. LuisGomez reports he hopes to have some time for switch scale tests in openflow cluster next week (dfarrell07, 18:39:17)
    36. LuisGomez has some perf tests already, but not scale in cluster (dfarrell07, 18:39:25)
    37. there was a bug in the scale test before, controller was getting switch connections but was not pushing table miss flows, LuisGomez has worked on test very recently, seeing more stable results (dfarrell07, 18:40:23)
    38. LuisGomez is seeing 400 switches in these non-cluster tests (dfarrell07, 18:40:38)
    39. LuisGomez says this bug has maybe been fixed very recently, new patch, need to revert test changes and check new odl patch (dfarrell07, 18:40:59)
    40. dfarrell07 highlights issues from opnfv vsperf folks we raised to ODL CI and openflow devs around switches being lost, heartbeat messages not being prio causing more problems (dfarrell07, 18:42:35)
    41. there was some recent patch to carbon+ about fixing reconnects, but deeper problem is prio of connections (dfarrell07, 18:43:41)
    42. discussion about running odl tests on opnfv infra, we're not sure of the status of that effort, dfarrell07 guesses that as we do lf collab stuff this will become easier and more of a focus (dfarrell07, 18:47:45)
    43. LuisGomez talks about sanity tests, dashboard, what is relevant/important (dfarrell07, 18:48:01)
    44. LuisGomez talks about dashboard work interns are doing, that we have some cool gui stuff he will demo next week (dfarrell07, 18:48:18)
    45. LuisGomez talks about elasticsearch, can just push json with any body, then have logic to parse that and populate db, then graphing tools on top of that to make graphs (dfarrell07, 18:49:32)
    46. can use this for other things later, infra, jenkins, whatever we can measure can go into dashboard (dfarrell07, 18:50:43)
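
Editor's sketch, not from the meeting: the batching idea in items 5, 11 and 12 (create instances in groups, and wait for a ping smoke test to pass before creating more) could look roughly like the following. The names create_in_batches, create and smoke_test are hypothetical placeholders rather than any OpenStack or Ansible API, and the default batch size of 10 is an arbitrary assumption, not a number anyone agreed on.

```python
import time
from typing import Callable, Iterable, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def create_in_batches(
    names: Sequence[str],
    create: Callable[[str], None],                # hypothetical: boots one instance
    smoke_test: Callable[[Sequence[str]], bool],  # hypothetical: e.g. a ping check
    batch_size: int = 10,                         # assumed value, tune per deployment
    retries: int = 5,
    delay_s: float = 1.0,
) -> List[str]:
    """Create instances batch by batch; return the names confirmed reachable.

    Stops at the first batch whose smoke test never passes, so a failure
    shows up before more load is added.
    """
    confirmed: List[str] = []
    for batch in batches(names, batch_size):
        for name in batch:
            create(name)
        for _ in range(retries):
            if smoke_test(batch):
                confirmed.extend(batch)
                break
            time.sleep(delay_s)
        else:
            break  # smoke test never passed; do not create more instances
    return confirmed
```

In a real run, create and smoke_test would wrap whatever tooling is in use (for example the ansible modules from item 22); the list returned shows how far the run got before a batch failed its check.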


Meeting ended at 18:51:10 UTC (full logs).

Action items

  1. (none)


People present (lines said)

  1. dfarrell07 (48)
  2. collabot (3)


Generated by MeetBot 0.1.4.