16:00:50 #startmeeting FDS synch
16:00:50 Meeting started Thu Oct 20 16:00:50 2016 UTC. The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:50 The meeting name has been set to 'fds_synch'
16:01:01 #info Frank Brockners
16:01:09 could you please info in?
16:01:19 #info Juraj Linkes
16:01:20 #info Tomas Cechvala
16:01:20 #info Raymond Maika
16:01:37 info in
16:01:54 #info agenda for today: https://wiki.opnfv.org/display/meetings/FastDataStacks#FastDataStacks-Thursday,October20,2016
16:03:03 let's focus on the two key areas of issues right now: (a) HA deployment status on the CENGN POD and the Cisco FDS POD, (b) QEMU vhost-user reconnect and security groups
16:03:40 #info Marcel Sestak
16:03:53 #info Andrej Vanko
16:03:57 #info Vlado Lavor
16:04:02 jlinkes, marcello_sestak, raymondmaika - could you give an update on the HA deployments? Are we able to run functest and submit results?
16:05:02 I was unfortunately not able to repeat the deployment on the SuperMicro lab with the modified settings in the yaml file; the same is true for the FDS pod, no success
16:05:58 I am able to deploy on SuperMicro, but MySQL is having trouble staying up after the deployment. Checked with nofeature-ha last night and it didn't have the same issues. MySQL going down causes other services to fail, so functest will fail.
16:06:26 yesterday, on the environment deployed by Raymond, I ran functest; the tests failed after the healthcheck passed ok
16:07:27 raymondmaika - do we know what caused the MySQL issues?
16:07:43 trozet mentioned the mysql cluster breaking could be because of some network issues. Waiting to hear from him on next steps, since the nofeature-ha deployment didn't have the same issue.
16:07:49 raymondmaika - on the deploy - do we have connectivity across all nodes for the admin and tenant networks?
16:08:42 we do, the admin network seems to be fine, I can access all overcloud nodes and they can communicate. VMs can also ping each other over tenant networks
16:09:24 functest may be okay if we can get the services not to flap, which Tim had identified as being due to mysql cluster problems.
16:09:25 interesting ... - do we know what network issues trozet was referring to?
16:10:01 I think intermittent disconnects, but I haven't seen any signs of that happening
16:10:30 raymondmaika: that sucks :/ that means it's probably sql dying on the nodes randomly
16:10:53 raymondmaika: some side effect of hugepages I think, need to look into it
16:11:25 trozet: I will take a closer look at the mysql logs while the issues occur, once the redeploy that's going on now is finished
16:11:25 raymondmaika - is there a setup on which the behavior can be observed right now?
16:11:38 frankbrockners: it's re-deploying with that setup now.
16:11:38 ah ok
16:11:52 relatively close to completion
16:11:57 jlinkes - do we see similar behavior on the Cisco FDS POD?
16:12:22 I only had a little bit of time with the pod today, but basically yes
16:12:33 I tried to create a network after it deployed
16:12:56 network creation passed, but the subnet creation request returned a 503 and then the services started to flap
16:12:57 did you also see the mysql issues jlinkes?
16:13:14 503 means what?
16:13:26 server not available - it means the services behind haproxy are down
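
(For reference - several of the follow-ups above amount to "check whether the Galera cluster behind HAProxy is still healthy". A minimal sketch of that check is below, meant to be run on an overcloud controller; it assumes the local mysql client can authenticate on its own, e.g. via a root defaults file, which may not hold on every deployment.)

    #!/usr/bin/env python
    # Minimal Galera health probe - run on each overcloud controller.
    # Assumes "mysql" is on PATH and can authenticate without prompting
    # (e.g. via /root/.my.cnf); adjust for the actual deployment.
    import subprocess

    INTERESTING = (
        "wsrep_cluster_size",         # number of nodes this node sees in the cluster
        "wsrep_cluster_status",       # "Primary" on a healthy, quorate node
        "wsrep_local_state_comment",  # e.g. "Synced", "Donor/Desynced", ...
    )

    def galera_status():
        """Return the wsrep_* status variables as a dict, or None if mysqld is down."""
        try:
            out = subprocess.check_output(
                ["mysql", "--batch", "--skip-column-names",
                 "-e", "SHOW GLOBAL STATUS LIKE 'wsrep_%'"])
        except (OSError, subprocess.CalledProcessError):
            return None
        status = {}
        for line in out.decode().splitlines():
            name, _, value = line.partition("\t")
            status[name] = value
        return status

    if __name__ == "__main__":
        status = galera_status()
        if status is None:
            print("mysqld not reachable on this node - check the mysql logs")
        else:
            for key in INTERESTING:
                print("%s = %s" % (key, status.get(key, "<missing>")))

On a healthy three-controller cluster, wsrep_cluster_size should be 3 and wsrep_cluster_status should report "Primary"; anything else would fit the pattern of the HAProxy 503s described above, since the OpenStack services die once the SQL cluster goes down.
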
16:13:28 it looked like the same issue, I didn't investigate further
16:13:37 thanks jlinkes
16:13:45 sounds the same, yeah
16:13:49 it's because galera (the sql cluster) goes down; when that goes down, all the python openstack processes die
16:14:05 so you get a 503 because haproxy has nowhere to route the wsgi request
16:14:37 if raymondmaika and trozet could look into the issue later today that would be great. could you send a status via email - we could pick up in our morning hours
16:14:42 I'm still hoping that we can get the odl_l2-fdio-ha scenario into Colorado 2.0
16:14:50 I just don't know why it dies, and wanted to rule out the network
16:15:07 frankbrockners: will look into the sql logs and see if I can find anything useful
16:15:17 thanks raymondmaika
16:15:19 raymondmaika: also look for CRM in /var/log/messages
16:15:43 frankbrockners: I will try a virtual HA deployment on my setup today and see if I hit it
16:15:55 thanks trozet
16:16:16 let's move to the second large "problem domain" - qemu - vhost-user - secgroups
16:16:34 #info regarding the qemu issue - I tried to build qemu with Shesha's suggestion, but there are other options that need to be turned on when building qemu
16:16:34 #info I've managed to resolve some of them and I'm facing this issue: http://pastebin.com/a877bf20
16:16:34 #info Wojciech provided some useful comments which will most likely help resolve the issues
16:16:34 #info also fpan pointed me to a build which contains the vhost reconnect feature (which Damjan mentioned, and which is why we're trying all these different qemus) - 1348593 at https://cbs.centos.org/koji/buildinfo?buildID=12087
16:16:51 #info roadmap - try fpan's qemu with both role configurations; if that doesn't work, continue with building 2.7.50
16:18:36 jlinkes - thanks - sounds like a good plan - i.e. try fpan's image first - and also make sure to include Damjan's VPP patch
16:19:35 Damjan's patch is https://git.fd.io/cgit/vpp/commit/?id=10eb1ea
16:20:22 tomas_c - did you have a chance - despite the lab availability issues and the qemu issue - to test the security groups implementation?
16:20:40 frankbrockners: I synced with Juraj in the lab
16:20:49 considering what should be tested
16:20:57 I'm currently finding gaps
16:21:05 and we will continue with Juraj tomorrow in the lab
16:21:15 with the full stack, but so far it seems promising
16:21:18 the data in HC look good
16:22:14 we also discussed the qemu issue and how to deal with it
16:22:18 frankbrockners: that one's already merged, so we're fine with using the latest rpms - https://gerrit.fd.io/r/#/c/3390/
16:22:19 ok tomas_c - given the unresolved issues with qemu (resolution of which is a prerequisite for secgroups), let's prioritize HA over qemu/secgroups when it comes to lab testing
16:22:38 thanks jlinkes - just wanted to make sure we use the latest RPMs
16:22:51 sure
16:23:13 ok, will try to sync with the guys on HA
16:23:21 thanks
16:24:02 tomas_c, jlinkes: could you guys try that today, so we have more updates by tomorrow morning?
16:24:14 let's move to the nosdn-fdio-noha scenario
16:24:29 seanatcisco, fpan - any news?
16:25:01 frankbrockners: not yet, Shriram is building the service profiles right now in the new testbed
16:25:06 will keep you posted
16:25:43 seanatcisco - thanks - if there is any news, please send a brief line to fds-dev@lists.opnfv.org
16:25:54 frankbrockners: will do
16:25:56 andy_vanko: I'll try out fpan's qemu today
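
(To make the "both role configurations" item in the roadmap above concrete: the sketch below just assembles the QEMU command-line pieces for a vhost-user port in the two roles - QEMU acting as the socket server with VPP connecting to it, or QEMU acting as the client and relying on the reconnect support the 2.7-based builds are expected to bring. It is purely illustrative: the socket path, MAC address and memory sizing are made up, nova/libvirt generates the real command line, and the exact option spelling should be verified against fpan's build.)

    #!/usr/bin/env python
    # Illustration of the two vhost-user "role configurations": either QEMU
    # listens on the socket and VPP connects, or QEMU is the client and uses
    # chardev reconnect to re-attach after a VPP restart.  Values are examples.
    SOCK = "/tmp/vhu-example.sock"   # hypothetical socket path

    COMMON = [
        "-m", "1024",
        # vhost-user requires guest memory backed by shared (hugepage) memory
        "-object", "memory-backend-file,id=mem0,size=1024M,"
                   "mem-path=/dev/hugepages,share=on",
        "-numa", "node,memdev=mem0",
        "-device", "virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56",
    ]

    def qemu_args(qemu_is_server, reconnect_secs=1):
        chardev = "socket,id=char0,path=%s" % SOCK
        if qemu_is_server:
            chardev += ",server"                         # QEMU listens, VPP dials in
        else:
            chardev += ",reconnect=%d" % reconnect_secs  # QEMU re-dials after VPP restarts
        return ["-chardev", chardev,
                "-netdev", "type=vhost-user,id=net0,chardev=char0"] + COMMON

    if __name__ == "__main__":
        for qemu_is_server in (True, False):
            role = "server" if qemu_is_server else "client"
            print("qemu as %s:" % role)
            print("  qemu-system-x86_64 " + " ".join(qemu_args(qemu_is_server)))

The reconnect option only matters when QEMU is the client; in the server role it is the VPP side that has to re-establish the connection after a restart.
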
16:26:44 alright... we covered the things that I had on the agenda for today - Maros also sent updates for HC / SNAT implementation
16:26:56 is there anything else to cover - andy_vanko?
16:27:27 frankbrockners: no frank, thanks. the rest will be handled offline :)
16:27:52 ok ... still keeping fingers crossed that we can at least get the HA scenario into C2.0
16:27:56 thanks everyone!
16:28:04 ... and good luck!
16:28:07 #endmeeting