#opnfv-cperf: cperf
Meeting started by dfarrell07 at 18:15:03 UTC
(full logs).
Meeting summary
- mattw4 reports they have reached some nice
stability with setup and teardown of scale deployment (dfarrell07,
18:18:33)
- very exciting that they have this deployment
working (dfarrell07,
18:18:54)
- have scaled up to 100 compute nodes so
far (dfarrell07,
18:19:00)
- creating lots of instances at the same time
results in half of them failing (dfarrell07,
18:19:17)
- this is similar to what Nikos saw with nstat,
they had to add in batches (dfarrell07,
18:19:38)
- many test dimensions in this matrix, need to
make prios (dfarrell07,
18:20:08)
- next prio is getting this pushed to the
upstream stuff they have started (dfarrell07,
18:20:29)
- Jamo suggests some initial tests, single host
on each compute node, make sure everyone can talk to everyone
(dfarrell07,
18:20:48)
- jamo talks about bugs they see in regular
netvirt with some flows not getting installed, some instances not
being able to talk to each other (dfarrell07,
18:21:13)
- Raghu confirms they are looking at that test
early (dfarrell07,
18:21:37)
- mattw4 talks about timing between instance
creation being important, waiting some number of seconds, or better
yet waiting for a ping smoke test to pass before creating more
instances (dfarrell07,
18:23:33)
- jamo talks about how openstack may have some
issues like this, has heard something like 13 at a time (dfarrell07,
18:23:52)
- when spinning up lots of instances, some rules
don't get installed, no tunnels, openstack thinks things are good but
they aren't because rules are missing (dfarrell07,
18:24:22)
- jamo says make sure you're using latest carbon,
bugs were fixed recently (dfarrell07,
18:24:34)
- mattw4 is on boron sr2, which is def a
problem (dfarrell07,
18:24:44)
- jamoluhrsen says throw away boron asap, maybe
sr4 but that's still otw (dfarrell07,
18:24:59)
- mattw4 is using andre's netvirt scale doc as
test bible for now, following that closely (dfarrell07,
18:25:20)
- Raghu asks if there are custom configs we do
for ODL that they should be doing, jamoluhrsen says no, nothing
fancy (dfarrell07,
18:26:14)
- mattw4 talks about deleting instances not
working so well, after removing instances ODL still maintaining
state, mattw4 saw some crazy stuff with like 54GB of RAM used *wow
faces all around* (dfarrell07,
18:28:45)
- LuisGomez says we should dump this ram into a
profiler and see what's going on (dfarrell07,
18:29:43)
- dfarrell07 restates that this is def stuff we
want to bring to odl devs, they will ask for tests to reproduce so
need that as well (dfarrell07,
18:30:07)
- mattw4 is using openstack-ansible modules to
work with openstack cluster, poke into and config things, have
looked at rally but maybe not what they need, looking for feedback
about tools to work with such things (dfarrell07,
18:31:05)
- LuisGomez and mattw4 talk about deployment
tooling (dfarrell07,
18:32:07)
- mattw4 talks about using ansible, probing
openstack api (dfarrell07,
18:32:28)
- next week they will have test plan to share
with us, see where they are going (dfarrell07,
18:32:45)
- LuisGomez talks about this just running in
their internal lab, everyone agrees will be awesome to get running
in cperf etc (dfarrell07,
18:33:09)
- jamoluhrsen gives updates about migration of
pod (dfarrell07,
18:33:42)
- we have new IP addresses, but they have not been
changed on the boxes for us (dfarrell07,
18:33:56)
- we have console access, but need to get into
boxes and change them (dfarrell07,
18:34:06)
- jamoluhrsen doesn't have cycles to do this atm,
someone else could help, he would show how (dfarrell07,
18:34:21)
- jamoluhrsen talks about how cperf tools
container has been useful in downstream testing, maybe we should
advertise it to people as an easy way to get robot running (dfarrell07,
18:36:56)
- discussion about docker support in ODL CI, we
have jobs that use it, jamoluhrsen says (dfarrell07,
18:37:09)
- LuisGomez wants more things running in
containers (of course) (dfarrell07,
18:37:25)
- mattw4 talks about their attempt to use docker
networks as underlay, they were not so great, ended up using linux
bridge and veth pairs via scripts, pure linux (dfarrell07,
18:38:04)
- LuisGomez reports he hopes to have some time
for switch scale tests in openflow cluster next week (dfarrell07,
18:39:17)
- LuisGomez has some perf tests already, but not
scale in cluster (dfarrell07,
18:39:25)
- there was a bug in the scale test before,
controller was getting switch connections but was not pushing table
miss flows, LuisGomez has worked on test very recently, seeing more
stable results (dfarrell07,
18:40:23)
- LuisGomez is seeing 400 switches in these
non-cluster tests (dfarrell07,
18:40:38)
- LuisGomez says this bug has maybe been fixed
very recently, new patch, need to revert test changes and check new
odl patch (dfarrell07,
18:40:59)
- dfarrell07 highlights issues from opnfv vsperf
folks we raised to ODL CI and openflow devs around switches being
lost, heartbeat messages not being prioritized, causing more problems
(dfarrell07,
18:42:35)
- there was some recent patch to carbon+ about
fixing reconnects, but deeper problem is prio of connections
(dfarrell07,
18:43:41)
- discussion about running odl tests on opnfv
infra, that we're not sure of status of that effort, dfarrell07
guesses that as we do lf collab stuff this will become easier and
more of a focus (dfarrell07,
18:47:45)
- LuisGomez talks about sanity tests, dashboard,
what is relevant/important (dfarrell07,
18:48:01)
- LuisGomez talks about dashboard work interns
are doing, that we have some cool gui stuff he will demo next
week (dfarrell07,
18:48:18)
- LuisGomez talks about elasticsearch, can just
push json with any body, then have logic to parse that and populate
db, then graphing tools on top of that to make graphs (dfarrell07,
18:49:32)
- can use this for other things later, infra,
jenkins, whatever we can measure can go into dashboard (dfarrell07,
18:50:43)
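
Sketch of the batching idea from the instance-creation discussion above (create a few instances, wait for a smoke test to pass, then create more). This is only an illustration, not the tooling mattw4 is using: create_instance and is_reachable are hypothetical callables the caller must supply (for example, wrappers around whatever OpenStack client and ping check are in use), and the batch size, timeout and poll interval are placeholder values.

```python
import time
from typing import Callable, Iterable, List


def create_in_batches(
    names: Iterable[str],
    create_instance: Callable[[str], None],   # hypothetical: boots one instance
    is_reachable: Callable[[str], bool],      # hypothetical: smoke test, e.g. ping
    batch_size: int = 10,                     # placeholder value
    timeout_s: float = 120.0,                 # placeholder value
    poll_s: float = 2.0,                      # placeholder value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Create instances batch by batch.

    The next batch is only started once every instance in the current
    batch passes the smoke test or the timeout expires. Returns the
    names that never became reachable.
    """
    pending = list(names)
    failed: List[str] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for name in batch:
            create_instance(name)

        # Poll until the whole batch is reachable or we run out of time.
        waiting = set(batch)
        deadline = clock() + timeout_s
        while waiting:
            waiting = {n for n in waiting if not is_reachable(n)}
            if not waiting or clock() >= deadline:
                break
            sleep(poll_s)

        failed.extend(sorted(waiting))
    return failed
```

The list of failed names could serve as a starting point for the reproducible test cases the ODL devs are expected to ask for. Waiting on a smoke test rather than a fixed number of seconds follows mattw4's "better yet" suggestion; whether it avoids the missing-rules problem is something the tests would have to show.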
Meeting ended at 18:51:10 UTC
(full logs).
Action items
- (none)
People present (lines said)
- dfarrell07 (48)
- collabot (3)
Generated by MeetBot 0.1.4.