18:15:03 #startmeeting cperf
18:15:03 Meeting started Thu Jun 15 18:15:03 2017 UTC. The chair is dfarrell07. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:15:03 Useful Commands: #action #agreed #help #info #idea #link #topic.
18:15:03 The meeting name has been set to 'cperf'
18:18:33 #info mattw4 reports that they have reached some nice stability with setup and teardown of the scale deployment
18:18:54 #info very exciting that they have this deployment working
18:19:00 #info have scaled up to 100 compute nodes so far
18:19:17 #info creating lots of instances at the same time results in half of them failing
18:19:38 #info this is similar to what Nikos saw with nstat, they had to add in batches
18:20:08 #info many test dimensions in this matrix, need to prioritize
18:20:29 #info next priority is getting this pushed to the upstream stuff they have started
18:20:48 #info Jamo suggests some initial tests: single host on each compute node, make sure everyone can talk to everyone
18:21:13 #info jamo talks about bugs they see in regular netvirt with some flows not getting installed, some instances not being able to talk to each other
18:21:37 #info Raghu confirms they are looking at that test early
18:23:33 #info mattw4 talks about timing between instance creation being important, waiting some number of seconds, or better yet waiting for a ping smoke test to pass before creating more instances
18:23:52 #info jamo talks about how openstack may have some issues like this, has heard something like 13 at a time
18:24:22 #info when spinning up lots of instances, some rules don't get installed, no tunnels, openstack thinks things are good but they aren't because of missing rules
18:24:34 #info jamo says make sure you're using latest carbon, there were bugs recently
18:24:44 #info mattw4 is on boron sr2, which is definitely a problem
18:24:59 #info jamoluhrsen says throw away boron asap, maybe sr4 but that's still otw
18:25:20 #info mattw4 is using andre's netvirt scale doc as test bible for now, following that closely
18:26:14 #info Raghu asks if there are custom configs we do for ODL that they should be doing, jamoluhrsen says no, nothing fancy
18:28:45 #info mattw4 talks about deleting instances not working so well, after removing instances ODL is still maintaining state, mattw4 saw some crazy stuff with like 54GB of RAM used *wow faces all around*
18:29:43 #info LuisGomez says we should dump this RAM into a profiler and see what's going on
18:30:07 #info dfarrell07 restates that this is definitely stuff we want to bring to ODL devs, they will ask for tests to reproduce so we need that as well
18:31:05 #info mattw4 is using openstack-ansible modules to work with the openstack cluster, poke into and config things, has looked at rally but maybe not what they need, looking for feedback about tools to work with such things
18:32:07 #info LuisGomez and mattw4 talk about deployment tooling
18:32:28 #info mattw4 talks about using ansible, probing the openstack api
18:32:45 #info next week they will have a test plan to share with us, see where they are going
18:33:09 #info LuisGomez talks about this just running in their internal lab, everyone agrees it will be awesome to get running in cperf etc
18:33:42 #info jamoluhrsen gives updates about migration of pod
18:33:56 #info we have new IP addresses, but they are not changed on the boxes for us
18:34:06 #info we have console access, but need to get into boxes and change them
18:34:21 #info jamoluhrsen doesn't have cycles to do this atm, someone else could help, he would show how
18:36:56 #info jamoluhrsen talks about how the cperf tools container has been useful in downstream testing, maybe we should advertise it to people as an easy way to get robot running
18:37:09 #info discussion about docker support in ODL CI, we have jobs that use it jamoluhrsen says
18:37:25 #info LuisGomez wants more things running in containers (of course)
18:38:04 #info mattw4 talks about their attempt to use docker networks as underlay, they were not so great, ended up using linux bridge and veth pairs via scripts, pure linux
18:39:17 #info LuisGomez reports he hopes to have some time for switch scale tests in openflow cluster next week
18:39:25 #info LuisGomez has some perf tests already, but not scale in cluster
18:40:23 #info there was a bug in the scale test before, controller was getting switch connections but was not pushing table miss flows, LuisGomez has worked on the test very recently, seeing more stable results
18:40:38 #info LuisGomez is seeing 400 switches in this non-cluster test
18:40:59 #info LuisGomez says this bug has maybe been fixed very recently, new patch, need to revert test changes and check the new ODL patch
18:42:35 #info dfarrell07 highlights issues from opnfv vsperf folks we raised to ODL CI and openflow devs around switches being lost, heartbeat messages not being prioritized causing more problems
18:43:41 #info there was some recent patch to carbon+ about fixing reconnects, but the deeper problem is priority of connections
18:47:45 #info discussion about running ODL tests on opnfv infra, we're not sure of the status of that effort, dfarrell07 guesses that as we do LF collab stuff this will become easier/more of a focus
18:48:01 #info LuisGomez talks about sanity tests, dashboard, what is relevant/important
18:48:18 #info LuisGomez talks about dashboard work interns are doing, we have some cool GUI stuff he will demo next week
18:49:32 #info LuisGomez talks about elasticsearch, can just push JSON with any body, then have logic to parse that and populate db, then graphing tools on top of that to make graphs
18:50:43 #info can use this for other things later, infra, jenkins, whatever we can measure can go into the dashboard
18:51:10 #endmeeting
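
Note added to the minutes (not said in the meeting): the batching idea from the 18:19:38 and 18:23:33 items, creating instances in small batches and holding the next batch until a smoke test passes, might look roughly like the Python sketch below. Everything named here is a hypothetical placeholder: `create_instance` and `smoke_test` stand in for whatever tooling boots an instance and pings it, and the batch size, timeout and poll interval are assumed values, not ones recommended in the meeting. No real OpenStack or ODL API is called.

```python
import time
from typing import Callable, Iterable, List


def create_in_batches(
    names: Iterable[str],
    create_instance: Callable[[str], None],  # hypothetical: boots one instance
    smoke_test: Callable[[str], bool],       # hypothetical: e.g. a ping check
    batch_size: int = 10,                    # assumed value, tune per deployment
    timeout_s: float = 60.0,                 # assumed per-batch wait limit
    poll_s: float = 1.0,                     # assumed delay between smoke-test rounds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Create instances batch by batch.

    Returns the names that never passed the smoke test within the timeout.
    A failed batch does not stop later batches; the caller decides what to do.
    """
    all_names = list(names)
    failed: List[str] = []
    for start in range(0, len(all_names), batch_size):
        batch = all_names[start:start + batch_size]
        for name in batch:
            create_instance(name)
        # Gate: keep polling until every instance in this batch passes,
        # or the per-batch deadline is reached.
        pending = set(batch)
        deadline = clock() + timeout_s
        while pending:
            pending = {n for n in pending if not smoke_test(n)}
            if not pending or clock() >= deadline:
                break
            sleep(poll_s)
        failed.extend(sorted(pending))
    return failed
```

Passing the create and smoke-test steps in as callables keeps the gating logic independent of the deployment tooling (ansible, API probes, scripts), so the same loop can be tried with different batch sizes.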