16:00:50 <frankbrockners> #startmeeting FDS synch
16:00:50 <collabot> Meeting started Thu Oct 20 16:00:50 2016 UTC. The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:50 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:50 <collabot> The meeting name has been set to 'fds_synch'
16:01:01 <frankbrockners> #info Frank Brockners
16:01:09 <frankbrockners> could you please info in?
16:01:19 <jlinkes> #info Juraj Linkes
16:01:20 <tomas_c> #info Tomas Cechvala
16:01:20 <raymondmaika> #info Raymond Maika
16:01:37 <marcello_sestak> info in
16:01:54 <frankbrockners> #info agenda for today: https://wiki.opnfv.org/display/meetings/FastDataStacks#FastDataStacks-Thursday,October20,2016
16:03:03 <frankbrockners> let's focus on the two key areas of issues right now: (a) HA deployment status on the CENGN POD and the Cisco FDS POD, (b) QEMU vhost-user reconnect and security groups
16:03:40 <marcello_sestak> #info Marcel Sestak
16:03:53 <andy_vanko> #info Andrej Vanko
16:03:57 <vlavor|alt> #info Vlado Lavor
16:04:02 <frankbrockners> jlinkes, marcello_sestak, raymondmaika - could you give an update on the HA deployments? Are we able to run functest and submit results?
16:05:02 <marcello_sestak> I was unfortunately not able to repeat the deployment on the SuperMicro lab with the modified settings in the yaml file; the same is true for the FDS pod, no success
16:05:58 <raymondmaika> I am able to deploy on SuperMicro, but MySQL is having trouble staying up after the deployment. Checked with nofeature-ha last night and it didn't have the same issues. MySQL going down causes other services to fail, so functest will fail.
16:06:26 <marcello_sestak> yesterday, on the environment Raymond deployed, I ran functest; the tests failed after the healthcheck passed OK
16:07:27 <frankbrockners> raymondmaika - do we know what caused the MySQL issues?
16:07:43 <raymondmaika> trozet mentioned the mysql cluster breaking could be because of some network issues. Waiting to hear from him on next steps, since the nofeature-ha deployment didn't have the same issue.
16:07:49 <frankbrockners> raymondmaika - on the deploy - do we have connectivity across all nodes for the admin and tenant networks?
16:08:42 <raymondmaika> we do, the admin network seems to be fine, I can access all overcloud nodes and they can communicate. VMs can also ping each other over tenant networks
16:09:24 <raymondmaika> functest may be okay if we could get the services not to flap, which Tim had identified as being due to mysql cluster problems.
16:09:25 <frankbrockners> interesting ... - do we know what network issues trozet was referring to?
16:10:01 <raymondmaika> I think intermittent disconnects, but I haven't seen any signs to indicate that happening
16:10:30 <trozet> raymondmaika: that sucks :/ that means it's probably sql dying on the nodes randomly
16:10:53 <trozet> raymondmaika: some side effect of hugepages I think, need to look into it
16:11:25 <raymondmaika> trozet: I will take a closer look at the mysql logs while the issues occur, once the redeploy that's going on now is finished
16:11:25 <frankbrockners> raymondmaika - is there a setup that the behavior can be observed on right now?
16:11:38 <raymondmaika> frankbrockners: it's re-deploying with that setup now.
16:11:38 <frankbrockners> ah ok
16:11:52 <raymondmaika> relatively close to completion
16:11:57 <frankbrockners> jlinkes - do we see similar behavior on the Cisco FDS POD?
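
A minimal triage sketch for the MySQL flapping discussed above, assuming a Pacemaker-managed Galera cluster as Apex/TripleO deploys it; credentials, unit names, and the log path are assumptions and may differ per release:

    # Pacemaker view of the galera resource and any failed actions
    sudo pcs status

    # Galera cluster health: expect wsrep_cluster_size == number of controllers
    # and wsrep_cluster_status == Primary (assumes root credentials in /root/.my.cnf)
    sudo mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';"
    sudo mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"

    # Recent MariaDB errors (log location is an assumption; check my.cnf)
    sudo tail -n 100 /var/log/mariadb/mariadb.log
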
16:12:22 <jlinkes> I only had a little bit of time with the pod today, but basically yes
16:12:33 <jlinkes> I tried to create a network after it deployed
16:12:56 <jlinkes> network creation passed, but the subnet creation request returned 503 and then the services started to flap
16:12:57 <frankbrockners> did you also see the mysql issues jlinkes?
16:13:14 <frankbrockners> 503 means what?
16:13:26 <raymondmaika> server not available. means the services behind haproxy are down
16:13:28 <jlinkes> it looked like the same issue, I didn't investigate further
16:13:37 <frankbrockners> thanks jlinkes
16:13:45 <raymondmaika> sounds the same, yeah
16:13:49 <trozet> it's because galera (the sql cluster) goes down, and when that goes down, all the python openstack processes die
16:14:05 <trozet> so you get a 503 because haproxy has nowhere to route the wsgi request
16:14:37 <frankbrockners> if raymondmaika and trozet could look into the issue later today that would be great. could you send a status via email - we could pick it up in our morning hours
16:14:42 <frankbrockners> I'm still hoping that we can get the odl_l2-fdio-ha scenario into Colorado 2.0
16:14:50 <trozet> i just don't know why it dies, and wanted to rule out the network
16:15:07 <raymondmaika> frankbrockners: will look into the sql logs and see if I can find anything useful
16:15:17 <frankbrockners> thanks raymondmaika
16:15:19 <trozet> raymondmaika: also look for CRM in /var/log/messages
16:15:43 <trozet> frankbrockners: i will try a virtual HA deployment on my setup today and see if I hit it
16:15:55 <frankbrockners> thanks trozet
16:16:16 <frankbrockners> let's move to the second large "problem domain" - qemu - vhost-user - secgroups
16:16:34 <jlinkes> #info regarding the qemu issue - I tried to build qemu with Shesha's suggestion, but there are other options that need to be turned on when building qemu
16:16:34 <jlinkes> #info I've managed to resolve some of them and I'm facing this issue: http://pastebin.com/a877bf20
16:16:34 <jlinkes> #info Wojciech provided some useful comments which will most likely help resolve the issues
16:16:34 <jlinkes> #info also fpan pointed me to a build which contains the vhost reconnect feature (which Damjan mentioned and why we're trying all these different qemus) - 1348593 at https://cbs.centos.org/koji/buildinfo?buildID=12087
16:16:51 <jlinkes> #info roadmap - try fpan's qemu with both role configurations; if that doesn't work, continue with building 2.7.50
16:18:36 <frankbrockners> jlinkes - thanks - sounds like a good plan - i.e. try fpan's image first - and also make sure to include Damjan's VPP patch
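
A rough sketch for chasing the 503s described above, assuming HAProxy exposes its admin socket at /var/lib/haproxy/stats (the path is an assumption; check haproxy.cfg) and that Pacemaker logs land in /var/log/messages as trozet suggests:

    # Which proxies/backends HAProxy currently sees as DOWN
    # ("show stat" returns CSV; field 18 is the status column)
    echo "show stat" | sudo socat stdio unix-connect:/var/lib/haproxy/stats | \
        awk -F, 'NR > 1 && $18 ~ /DOWN/ {print $1, $2, $18}'

    # Pacemaker/Galera activity around the time the services flapped
    sudo grep -iE 'crmd|pengine|galera' /var/log/messages | tail -n 50

    # Any OpenStack services that actually died on this controller
    sudo systemctl --failed
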
16:19:35 <frankbrockners> Damjan's patch is https://git.fd.io/cgit/vpp/commit/?id=10eb1ea
16:20:22 <frankbrockners> tomas_c - did you have a chance - despite the lab availability and the qemu issue - to test the security groups implementation?
16:20:40 <tomas_c> frankbrockners: i synced with Juraj in the lab
16:20:49 <tomas_c> considering what should be tested
16:20:57 <tomas_c> i'm currently finding gaps
16:21:05 <tomas_c> and we will continue with Juraj tomorrow in the lab
16:21:15 <tomas_c> with the full stack, but so far it seems promising
16:21:18 <tomas_c> data in HC look good
16:22:14 <tomas_c> we also discussed the qemu issue and how to deal with it
16:22:18 <jlinkes> frankbrockners: that one's already merged, so we're fine with using the latest rpms - https://gerrit.fd.io/r/#/c/3390/
16:22:19 <frankbrockners> ok tomas_c - given the unresolved issues with qemu (resolution of which is a prerequisite for sec groups), let's prioritize HA over qemu/secgroups when it comes to lab testing
16:22:38 <frankbrockners> thanks jlinkes - just wanted to make sure we use the latest RPMs
16:22:51 <jlinkes> sure
16:23:13 <tomas_c> ok, will try to sync with the guys on HA
16:23:21 <frankbrockners> thanks
16:24:02 <andy_vanko> tomas_c, jlinkes: could you guys try that today? so we have many more updates by tomorrow morning
16:24:14 <frankbrockners> let's move to the nosdn-fdio-noha scenario
16:24:29 <frankbrockners> seanatcisco, fpan - any news?
16:25:01 <seanatcisco> frankbrockners: not yet, Shriram is building the service profiles right now in the new testbed
16:25:06 <seanatcisco> will keep you posted
16:25:43 <frankbrockners> seanatcisco - thanks - if there is any news please send a brief line to fds-dev@lists.opnfv.org
16:25:54 <seanatcisco> frankbrockners: will do
16:25:56 <jlinkes> andy_vanko: I'll try out fpan's qemu today
16:26:44 <frankbrockners> alright... we covered the things that I had on the agenda for today - Maros also sent updates on the HC / SNAT implementation
16:26:56 <frankbrockners> is there anything else to cover - andy_vanko?
16:27:27 <andy_vanko> frankbrockners: no frank, thanks. the rest will be handled offline :)
16:27:52 <frankbrockners> ok ... still keeping fingers crossed that we can at least get the HA scenario into C2.0
16:27:56 <frankbrockners> thanks everyone!
16:28:04 <frankbrockners> ... and good luck!
16:28:07 <frankbrockners> #endmeeting
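
A hedged sketch for trying fpan's candidate qemu from the CBS build linked above on a compute node; the package file names, the qemu-kvm binary path, and the reconnect smoke test are assumptions, not a confirmed procedure:

    # After downloading the RPMs from the CBS build page
    # (package names are placeholders for whatever the build provides)
    sudo yum localinstall -y qemu-kvm-ev-*.rpm qemu-img-ev-*.rpm

    # Confirm the installed version (binary path assumes CentOS packaging)
    /usr/libexec/qemu-kvm --version

    # Smoke test that the binary accepts the 'reconnect' chardev option used by
    # vhost-user reconnect (necessary but not sufficient): a binary without the
    # option exits immediately with an unknown-parameter error, otherwise it
    # keeps retrying the missing socket until timeout kills it.
    timeout 5 /usr/libexec/qemu-kvm -nodefaults -display none -S \
        -chardev socket,id=c0,path=/tmp/vhu-test.sock,reconnect=1

    # Damjan's VPP fix (commit 10eb1ea) is already merged upstream per gerrit
    # 3390, so just confirm the deployed VPP package is current
    rpm -q vpp
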