16:00:50 <frankbrockners> #startmeeting FDS synch
16:00:50 <collabot> Meeting started Thu Oct 20 16:00:50 2016 UTC. The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:50 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:50 <collabot> The meeting name has been set to 'fds_synch'
16:01:01 <frankbrockners> #info Frank Brockners
16:01:09 <frankbrockners> could you please info in?
16:01:19 <jlinkes> #info Juraj Linkes
16:01:20 <tomas_c> #info Tomas Cechvala
16:01:20 <raymondmaika> #info Raymond Maika
16:01:37 <marcello_sestak> info in
16:01:54 <frankbrockners> #info agenda for today: https://wiki.opnfv.org/display/meetings/FastDataStacks#FastDataStacks-Thursday,October20,2016
16:03:03 <frankbrockners> let's focus on the two key areas of issues right now: (a) HA deployment status on the CENGN POD and the Cisco FDS POD, (b) QEMU vhost-user reconnect and security groups
16:03:40 <marcello_sestak> #info Marcel Sestak
16:03:53 <andy_vanko> #info Andrej Vanko
16:03:57 <vlavor|alt> #info Vlado Lavor
16:04:02 <frankbrockners> jlinkes, marcello_sestak, raymondmaika - could you give an update on the HA deployments? Are we able to run functest and submit results?
16:05:02 <marcello_sestak> I was unfortunately not able to repeat the deployment on the SuperMicro lab with the modified settings in the yaml file; the same is true for the FDS pod, no success
16:05:58 <raymondmaika> I am able to deploy on SuperMicro, but MySQL is having trouble staying up after the deployment. Checked with nofeature-ha last night and it didn't have the same issues. MySQL going down causes other services to fail, so functest will fail.
16:06:26 <marcello_sestak> yesterday, on the environment Raymond deployed, I ran functest; the tests failed after the healthcheck passed OK
16:07:27 <frankbrockners> raymondmaika - do we know what caused the MySQL issues?
16:07:43 <raymondmaika> trozet mentioned the mysql cluster breaking could be because of some network issues. Waiting to hear from him on next steps, since the nofeature-ha deployment didn't have the same issue.
16:07:49 <frankbrockners> raymondmaika - on the deploy - do we have connectivity across all nodes for the admin and tenant networks?
16:08:42 <raymondmaika> we do, the admin network seems to be fine, I can access all overcloud nodes and they can communicate. VMs can also ping each other over tenant networks
16:09:24 <raymondmaika> functest may be okay if we could get the services not to flap, which Tim had identified as being due to mysql cluster problems.
16:09:25 <frankbrockners> interesting ... - do we know what network issues trozet was referring to?
16:10:01 <raymondmaika> I think intermittent disconnects, but I haven't seen any signs to indicate that happening
16:10:30 <trozet> raymondmaika: that sucks :/ that means it's probably sql dying on the nodes randomly
16:10:53 <trozet> raymondmaika: some side effect of hugepages I think, need to look into it
16:11:25 <raymondmaika> trozet: I will take a closer look at the mysql logs while the issues occur, once the redeploy that's going on now is finished
16:11:25 <frankbrockners> raymondmaika - is there a setup that the behavior can be observed on right now?
16:11:38 <raymondmaika> frankbrockners: it's re-deploying with that setup now.
16:11:38 <frankbrockners> ah ok
16:11:52 <raymondmaika> relatively close to completion
16:11:57 <frankbrockners> jlinkes - do we see similar behavior on the Cisco FDS POD?
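
A minimal triage sketch for the MySQL flapping discussed above, assuming a Pacemaker-managed Galera cluster as Apex/TripleO deploys it; credentials, unit names, and the log path are assumptions and may differ per release:

    # Pacemaker view of the galera resource and any failed actions
    sudo pcs status

    # Galera cluster health: expect wsrep_cluster_size == number of controllers
    # and wsrep_cluster_status == Primary (assumes root credentials in /root/.my.cnf)
    sudo mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';"
    sudo mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"

    # Recent MariaDB errors (log location is an assumption; check my.cnf)
    sudo tail -n 100 /var/log/mariadb/mariadb.log
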
16:12:22 <jlinkes> I only had a little bit of time with the pod today, but basically yes
16:12:33 <jlinkes> I tried to create a network after it deployed
16:12:56 <jlinkes> network creation passed, but the subnet creation request returned 503 and then the services started to flap
16:12:57 <frankbrockners> did you also see the mysql issues jlinkes?
16:13:14 <frankbrockners> 503 means what?
16:13:26 <raymondmaika> server not available. means the services behind haproxy are down
16:13:28 <jlinkes> it looked like the same issue, I didn't investigate further
16:13:37 <frankbrockners> thanks jlinkes
16:13:45 <raymondmaika> sounds the same, yeah
16:13:49 <trozet> it's because galera (the sql cluster) goes down, and when that goes down, all the python openstack processes die
16:14:05 <trozet> so you get a 503 because haproxy has nowhere to route the wsgi request
16:14:37 <frankbrockners> if raymondmaika and trozet could look into the issue later today that would be great. could you send a status via email - we could pick it up in our morning hours
16:14:42 <frankbrockners> I'm still hoping that we can get the odl_l2-fdio-ha scenario into Colorado 2.0
16:14:50 <trozet> i just don't know why it dies, and wanted to rule out the network
16:15:07 <raymondmaika> frankbrockners: will look into the sql logs and see if I can find anything useful
16:15:17 <frankbrockners> thanks raymondmaika
16:15:19 <trozet> raymondmaika: also look for CRM in /var/log/messages
16:15:43 <trozet> frankbrockners: i will try a virtual HA deployment on my setup today and see if I hit it
16:15:55 <frankbrockners> thanks trozet
16:16:16 <frankbrockners> let's move to the second large "problem domain" - qemu - vhost-user - secgroups
16:16:34 <jlinkes> #info regarding the qemu issue - I tried to build qemu with Shesha's suggestion, but there are other options that need to be turned on when building qemu
16:16:34 <jlinkes> #info I've managed to resolve some of them and I'm facing this issue: http://pastebin.com/a877bf20
16:16:34 <jlinkes> #info Wojciech provided some useful comments which will most likely help resolve the issues
16:16:34 <jlinkes> #info also fpan pointed me to a build which contains the vhost reconnect feature (which Damjan mentioned and why we're trying all these different qemus) - 1348593 at https://cbs.centos.org/koji/buildinfo?buildID=12087
16:16:51 <jlinkes> #info roadmap - try fpan's qemu with both role configurations; if that doesn't work, continue with building 2.7.50
16:18:36 <frankbrockners> jlinkes - thanks - sounds like a good plan - i.e. try fpan's image first - and also make sure to include Damjan's VPP patch
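
A rough sketch for chasing the 503s described above, assuming HAProxy exposes its admin socket at /var/lib/haproxy/stats (the path is an assumption; check haproxy.cfg) and that Pacemaker logs land in /var/log/messages as trozet suggests:

    # Which proxies/backends HAProxy currently sees as DOWN
    # ("show stat" returns CSV; field 18 is the status column)
    echo "show stat" | sudo socat stdio unix-connect:/var/lib/haproxy/stats | \
        awk -F, 'NR > 1 && $18 ~ /DOWN/ {print $1, $2, $18}'

    # Pacemaker/Galera activity around the time the services flapped
    sudo grep -iE 'crmd|pengine|galera' /var/log/messages | tail -n 50

    # Any OpenStack services that actually died on this controller
    sudo systemctl --failed
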
16:19:35 <frankbrockners> Damjan's patch is https://git.fd.io/cgit/vpp/commit/?id=10eb1ea
16:20:22 <frankbrockners> tomas_c - did you have a chance - despite the lab availability and the qemu issue - to test the security groups implementation?
16:20:40 <tomas_c> frankbrockners: i synced with Juraj in the lab
16:20:49 <tomas_c> considering what should be tested
16:20:57 <tomas_c> i'm currently finding gaps
16:21:05 <tomas_c> and we will continue with Juraj tomorrow in the lab
16:21:15 <tomas_c> with the full stack, but so far it seems promising
16:21:18 <tomas_c> data in HC look good
16:22:14 <tomas_c> we also discussed the qemu issue and how to deal with it
16:22:18 <jlinkes> frankbrockners: that one's already merged, so we're fine with using the latest rpms - https://gerrit.fd.io/r/#/c/3390/
16:22:19 <frankbrockners> ok tomas_c - given the unresolved issues with qemu (resolution of which is a prerequisite for sec groups), let's prioritize HA over qemu/secgroups when it comes to lab testing
16:22:38 <frankbrockners> thanks jlinkes - just wanted to make sure we use the latest RPMs
16:22:51 <jlinkes> sure
16:23:13 <tomas_c> ok, will try to sync with the guys on HA
16:23:21 <frankbrockners> thanks
16:24:02 <andy_vanko> tomas_c, jlinkes: could you guys try that today? so we have many more updates by tomorrow morning
16:24:14 <frankbrockners> let's move to the nosdn-fdio-noha scenario
16:24:29 <frankbrockners> seanatcisco, fpan - any news?
16:25:01 <seanatcisco> frankbrockners: not yet, Shriram is building the service profiles right now in the new testbed
16:25:06 <seanatcisco> will keep you posted
16:25:43 <frankbrockners> seanatcisco - thanks - if there is any news please send a brief line to fds-dev@lists.opnfv.org
16:25:54 <seanatcisco> frankbrockners: will do
16:25:56 <jlinkes> andy_vanko: I'll try out fpan's qemu today
16:26:44 <frankbrockners> alright... we covered the things that I had on the agenda for today - Maros also sent updates on the HC / SNAT implementation
16:26:56 <frankbrockners> is there anything else to cover - andy_vanko?
16:27:27 <andy_vanko> frankbrockners: no frank, thanks. the rest will be handled offline :)
16:27:52 <frankbrockners> ok ... still keeping fingers crossed that we can at least get the HA scenario into C2.0
16:27:56 <frankbrockners> thanks everyone!
16:28:04 <frankbrockners> ... and good luck!
16:28:07 <frankbrockners> #endmeeting
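
A hedged sketch for trying fpan's candidate qemu from the CBS build linked above on a compute node; the package file names, the qemu-kvm binary path, and the reconnect smoke test are assumptions, not a confirmed procedure:

    # After downloading the RPMs from the CBS build page
    # (package names are placeholders for whatever the build provides)
    sudo yum localinstall -y qemu-kvm-ev-*.rpm qemu-img-ev-*.rpm

    # Confirm the installed version (binary path assumes CentOS packaging)
    /usr/libexec/qemu-kvm --version

    # Smoke test that the binary accepts the 'reconnect' chardev option used by
    # vhost-user reconnect (necessary but not sufficient): a binary without the
    # option exits immediately with an unknown-parameter error, otherwise it
    # keeps retrying the missing socket until timeout kills it.
    timeout 5 /usr/libexec/qemu-kvm -nodefaults -display none -S \
        -chardev socket,id=c0,path=/tmp/vhu-test.sock,reconnect=1

    # Damjan's VPP fix (commit 10eb1ea) is already merged upstream per gerrit
    # 3390, so just confirm the deployed VPP package is current
    rpm -q vpp
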