16:00:16 #startmeeting FDS synch
16:00:16 Meeting started Thu Sep 1 16:00:16 2016 UTC. The chair is frankbrockners. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:16 The meeting name has been set to 'fds_synch'
16:00:20 #info Juraj Linkes
16:00:41 #info Frank Brockners
16:01:09 #info draft agenda - https://wiki.opnfv.org/display/meetings/FastDataStacks#FastDataStacks-Thursday,September1,2016
16:01:15 #info Michal Cmarada
16:02:02 let's get rolling... - let's touch on the open issues first. Michal made good progress.
16:03:14 michal-cmarada|2 - quick update on the qr tap to BD?
16:04:14 seems that it is working properly. Port is added to the BD and VMs can reach it.
16:04:20 #info Michal submitted https://git.opendaylight.org/gerrit/#/c/44935/; https://git.opendaylight.org/gerrit/#/c/45000/ (for boron and carbon) which are expected to resolve the qrouter tap to BD issue
16:04:39 #info Michal validated things locally: Port is added to the BD and VMs can reach it.
16:05:18 michal-cmarada|2 - can we get the patch merged?
16:05:48 patch from boron is failing in jenkins https://git.opendaylight.org/gerrit/#/c/45000/. we did a recheck but many jenkins checks are failing today.
16:06:01 :-(
16:06:22 Qrouter tap port is also working on UCS-B side 1
16:07:39 edwarnicke - are you there?
16:08:01 edwarnicke - do you know of issues with verify jobs on Boron right now?
16:08:02 frankbrockners: Yes
16:08:14 I did not
16:08:15 Catching up
16:08:27 edwarnicke - https://git.opendaylight.org/gerrit/#/c/44935 verifies fine
16:08:39 but the cherry pick to boron fails verify
16:09:02 https://git.opendaylight.org/gerrit/#/c/45000/ is critical to us - it fixes the last technical issue that FDS has
16:09:28 frankbrockners: I just poked the right folks on #opendaylight-releng
16:09:33 Should hear something shortly
16:09:38 thanks edwarnicke
16:10:07 let's see what else is on the laundry list... (a) functest (b) hugepages
16:10:26 Just tested my patch on UCS-B side 1 once again. setup is correct, going to test the pings.
16:10:27 trozet mentioned that deploy works - but functest fails
16:11:44 #info jlinkes told me earlier in the day that functest fails because of 2 reasons (a) qrouter - to - BD patch not there yet (b) VMs need to be started with hugepages option
16:12:19 #info morgan explained details how to amend functest config yaml to enable hugepages option
16:12:26 jlinkes - anything to add?
16:12:53 nothing to add
16:12:57 thanks jlinkes
16:13:18 frankbrockners: good point
16:13:23 frankbrockners: about hugepages I mean
16:13:31 let's move to hugepages...
16:14:02 fact is that we need hugepages configured
16:14:10 the real question is where and how...
16:14:56 by default this is done in sysctl.d/80-vpp.conf for vpp
16:15:05 see also damian's email
16:15:17 trozet - you didn't like that approach too much - correct?
16:15:39 frankbrockners: no
16:16:20 frankbrockners: although I'm willing to go with it
16:16:31 trozet: any rationale / background?
16:16:31 frankbrockners: we would need to update the puppet module to configure that conf file
16:17:00 frankbrockners: I don't think it's a good idea for VPP to override sysctl settings when you install its RPM
16:17:18 frankbrockners: i think vpp documentation should say hey, you need to set hugepages, here is how to do it
16:17:32 frankbrockners: got to remember this is a linux host, and vpp is one application on it
16:17:38 trozet - but from what I understand, without the 80-vpp.conf, hugepages aren't configured correctly
16:18:00 frankbrockners: no, we configure hugepages for the host, the 80-vpp.conf is overriding our grub settings
16:18:29 frankbrockners: damian
16:18:44 hmmm... jlinkes - did we see hugepages configured properly without things being set in 80-vpp.conf?
16:18:52 i did
16:18:53 frankbrockners: damian's e-mail didn't contain much in terms of information
16:19:04 jlinkes - agreed
16:19:15 sysctl -w vm.nr_hugepages=10000
16:19:17 sysctl -w vm.max_map_count=25000
16:19:18 frankbrockners: in my email you can see messages from the kernel, saying it is set to 2048 (the value apex set it to). After that, vpp.conf changes it
16:19:19 sysctl -w vm.hugetlb_shm_group=0
16:19:21 sysctl -w kernel.shmmax=20971520000
16:19:22 i have set this directly in
16:19:33 frankbrockners: yes, we just set it using sysctl -w vm.nr_hugepages=10000
16:19:34 sysctl -w vm.max_map_count=25000
16:19:34 sysctl -w vm.hugetlb_shm_group=0
16:19:34 sysctl -w kernel.shmmax=20971520000
16:19:40 frankbrockners: and that worked fine
16:19:46 sorry wrong order
16:19:47 but what if we don't put anything into vpp.conf?
16:20:03 frankbrockners: and we left 80-vpp.conf alone
16:20:14 frankbrockners: vpp.conf will only be read when you reboot
16:20:22 frankbrockners: so if jlinkes reboots his machine, it will fall back to 1024
16:20:29 right
16:20:55 or you can do sysctl --system
16:20:58 and it will reload the values from the conf file
16:21:04 frankbrockners: that seems like it could work, deleting everything from 80-vpp.conf and leaving it empty
16:21:23 jlinkes - let's try that
16:21:37 "you should not reboot" isn't too much of an option...
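Editor's sketch: the exchange above turns on the difference between values set at runtime with `sysctl -w` and values stored in a `sysctl.d` fragment that are applied at boot or by `sysctl --system`. The Python below is a minimal illustration of that difference, not the project's tooling. The fragment text is hypothetical (only the 1024 figure comes from the discussion; the real contents of 80-vpp.conf are not shown in this log), and the helper names `parse_sysctl_conf`, `proc_path`, and `pending_changes` are invented for the example.

```python
from pathlib import PurePosixPath


def parse_sysctl_conf(text):
    """Parse 'key = value' lines from a sysctl.d-style fragment into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a dotted sysctl key to the matching file under /proc/sys."""
    return str(PurePosixPath("/proc/sys", *key.split(".")))


def pending_changes(runtime, conf):
    """Return {key: (runtime_value, conf_value)} for keys a reload would change."""
    return {
        key: (runtime.get(key), value)
        for key, value in conf.items()
        if runtime.get(key) != value
    }


# Hypothetical fragment: NOT the real 80-vpp.conf, only the 1024 value is
# taken from the discussion.
EXAMPLE_CONF = """
# example only
vm.nr_hugepages = 1024
"""

# Runtime values as set by hand with `sysctl -w` in the log above.
RUNTIME = {
    "vm.nr_hugepages": "10000",
    "vm.max_map_count": "25000",
    "vm.hugetlb_shm_group": "0",
    "kernel.shmmax": "20971520000",
}
```

Calling `pending_changes(RUNTIME, parse_sysctl_conf(EXAMPLE_CONF))` reports only `vm.nr_hugepages`, which is the value that would drop back to 1024 on the next reload or reboot under these assumptions.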
16:21:50 frankbrockners: what I'm saying is
16:22:01 frankbrockners: the whole 80-vpp.conf, should be removed from VPP RPM install
16:22:14 frankbrockners: not deleted post install, or anything
16:22:41 frankbrockners: noted, will try that tomorrow
16:22:42 frankbrockners: it is much better to document the settings that need to be changed, and let the user decide how he wants to do it
16:23:13 trozet: that's a debate to have with vpp folks
16:23:19 trozet - understand - but this would mean that we need an RPM specific for OPNFV
16:23:43 frankbrockners: no i think it means modifying the VPP RPM for all users :)
16:24:02 vpp folks would like to keep vpp.conf in the rpm to make sure things work.. - edwarnicke might have a view from a VPP perspective
16:24:03 the problem with damian's e-mail is that even though we got "answers" we didn't learn anything and we're in the same position as if he didn't respond at all
16:24:08 frankbrockners: but like jlinkes said, that will be a bigger debate/take longer
16:24:41 trozet - so what would you suggest as interim solution?
16:24:42 he just stated their position and provided no explanation for anything
16:24:45 trozet: vpp rpms need to work out of the box
16:24:59 trozet: Without that, they don't
16:25:13 edwarnicke: they don't work without a sysctl reload anyway
16:25:27 trozet: Package install does the sysctl reload :)
16:25:32 edwarnicke: so you might as well provide instructions to the user on how to set his hugepage settings, then let him do it
16:25:49 frankbrockners: the interim solution is, Apex can either A) remove that file before deployment
16:25:59 or B) we can add more puppet conf to puppet-fdio and configure it properly
16:26:12 B will take longer than A
16:26:15 trozet: It's done by a post install script
16:26:34 frankbrockners: The ODL releng folks are aware of the issue that is blocking our verify and are fixing it
16:26:53 trozet - how about we do A) for now to get the deployment going
16:27:03 edwarnicke - many thanks
16:27:03 edwarnicke: what does it mean that they need to work out of the box and how is it tied to that 80-vpp.conf file?
16:27:12 edwarnicke: sure. I still don't think it's the best approach
16:27:33 trozet: I'm open to alternatives that allow vpp to actually work out of the box on package install :)
16:28:19 edwarnicke: why not just put in your install docs, to set these settings before using VPP, and let the user decide how many hugepages, etc
16:28:30 edwarnicke: then a user decides for his host, what is appropriate
16:28:32 trozet: Install docs != works out of the box
16:28:41 trozet: When someone installs a package, it should run, not crash
16:29:44 edwarnicke, trozet - unlikely that we solve things here.. - how about we do trozet's option A) for now?
16:30:46 frankbrockners: we will need to also add some of those other options that vpp.conf is setting, if they are required
16:31:14 edwarnicke: if they read the instructions first, it won't crash and will work out of the box :)
16:31:36 trozet: Who do you know who reads the instructions when they type 'yum install foo'?
16:31:40 Nobody I know does that
16:31:51 edwarnicke: i need my lawn mower to work out of the box, but i didn't read the instructions about filling it up with gas..
16:31:56 :)
16:32:14 LOL
16:32:26 hehe
16:32:56 frankbrockners: I'll work on removing the file and adding the arguments to our deploy settings file
16:33:12 frankbrockners: but the end result should be, to tweak these parameters for a deployment, do it in the apex deploy settings file
16:33:25 thanks trozet - so basically move the contents of vpp.conf to your own config
16:33:25 edwarnicke: how does this actually work? you install vpp, it then sets the stuff in 80-vpp and then vpp starts?
16:33:47 trozet: Just to make sure I understand, the issue for you with 80-vpp.conf is that you need to add a different file there with different hugepages requirements, correct?
16:34:08 jlinkes: Yes. You install vpp and then service vpp start just works
16:34:11 edwarnicke: no. The problem is 80-vpp.conf is there, so it overrides the hugepages that the kernel is booted with
16:34:30 edwarnicke: the vpp virus attacks our host and changes its hugepages from 2048 to 1024
16:34:51 edwarnicke .. and 1024 is too low...
16:34:56 trozet: I see that as a variation of the same theme, you are just putting the hugepages in the kernel arguments instead of the sysctl directory.. but net net, it's overriding a different choice you've made
16:35:06 edwarnicke: right
16:35:17 edwarnicke: we want to give the option to the user in Apex to declare how many hugepages before he deploys
16:35:25 edwarnicke: so it's a setting, and we set those as kernel args
16:35:30 edwarnicke: my main question was whether vpp configured hugepages as part of installation
16:35:35 configures
16:35:38 trozet: Totally valid
16:35:44 jlinkes: It does
16:36:04 jlinkes: It runs sysctl --system in its post install script
16:36:21 frankbrockners, edwarnicke: so if you see here https://gerrit.opnfv.org/gerrit/gitweb?p=apex.git;a=blob;f=config/deploy/os-odl_l2-fdio-noha.yaml;h=ad54fbdc830ecf9790c8d0f2b4104192e41214d2;hb=HEAD
16:36:21 trozet: Let me think about this a bit
16:36:28 trozet: Because there may be a compromise
16:36:35 frankbrockners, edwarnicke: you see kernel arguments a user can change there for an apex deployment
16:36:36 edwarnicke: so if there are other applications using hugepages vpp installation could totally screw hugepages for them?
16:36:59 trozet: How would you feel about this: iff the vpp package discovers 1024 or more hugepages, it leaves well enough alone, otherwise, it sets them
16:37:14 edwarnicke: that makes sense
16:37:29 trozet: Which brings me to my question... how do I find out that hugepages is being set via kernel params?
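Editor's sketch: the proposal above is to leave hugepages alone when 1024 or more are already configured and to set them otherwise, and the open question is how to see a request made through kernel parameters. The Python below is one possible reading of both points, under stated assumptions. `hugepages=N` is the standard kernel boot parameter; the helper names and the sample command line are invented for the example. A running system exposes its command line in `/proc/cmdline`, which describes the current boot only, so this does not answer what the next boot will use.

```python
import re

MIN_HUGEPAGES = 1024  # threshold named in the discussion


def hugepages_from_cmdline(cmdline):
    """Return the last hugepages=N value on a kernel command line, or None.

    Simplification: with several hugepagesz=/hugepages= pairs the kernel
    applies each count to the preceding page size; this sketch only
    reports the last count it sees.
    """
    matches = re.findall(r"(?:^|\s)hugepages=(\d+)(?=\s|$)", cmdline)
    return int(matches[-1]) if matches else None


def pages_to_set(current, minimum=MIN_HUGEPAGES):
    """Return the count to write, or None when the current value should stay."""
    return None if current >= minimum else minimum


# Hypothetical command line, shaped like the kernel args discussed for Apex.
SAMPLE_CMDLINE = "ro quiet default_hugepagesz=2M hugepagesz=2M hugepages=2048"
```

With the sample line, `hugepages_from_cmdline(SAMPLE_CMDLINE)` yields 2048 and `pages_to_set(2048)` yields `None`, so the host's own choice would be kept; `pages_to_set(512)` yields 1024.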
16:37:29 * frankbrockners likes that solution
16:37:48 edwarnicke: but what about the other settings, I thought I saw some of the comments in that vpp.conf file relate to the number of hugepages set
16:37:59 trozet: here
16:37:59 trozet: The proposed solution is basically to go from: make sure hugepages == 1024, to make sure hugepages >= 1024
16:38:12 edwarnicke: you check the bootloader or /proc/meminfo
16:38:15 trozet: That I'd have to look at
16:38:16 sysctl -w vm.nr_hugepages=10000
16:38:16 sysctl -w vm.max_map_count=22000
16:38:16 sysctl -w vm.hugetlb_shm_group=0
16:38:16 sysctl -w kernel.shmmax=20971520000
16:38:24 trozet: this is what we use
16:38:52 /proc/meminfo doesn't help me, because it only tells me current runtime, not next boottime
16:39:09 trozet: the formulas are vm.max_map_count = 2 * vm.nr_hugepages + 10% for vpp
16:39:13 edwarnicke: oh hmm
16:39:53 edwarnicke: yeah I'm not sure how to do that
16:40:10 edwarnicke: we may even install VPP before we set the hugepages for next boot
16:40:27 trozet: Yeah, that I can't help you with :(
16:40:28 trozet: and kernel.shmmax = 2 * vm.nr_hugepages * 1024 * 1024 for 2MB hugepages
16:40:42 jlinkes: ok let me see about setting those in apex kernel args
16:40:44 trozet: I do potentially have an idea though
16:40:56 trozet: Is there a standard for naming a config file to set hugepages?
16:41:03 If there is...
16:41:08 We may have a good approach
16:41:10 Hmm...
16:41:18 but can't we start with don't touch hugepages unless current-hugepages < 1024
16:41:42 i still like the idea of documenting in requirements :)
16:41:56 if a user starts VPP without hugepages
16:42:03 print an error in journalctl
16:42:08 and say go configure huge pages
16:42:08 we'll add this to the docs for sure
16:43:29 frankbrockners: for the interim, i'll remove the file before we deploy and add the config to the kernel args
16:45:18 trozet: there's also a requirement from woj
16:45:28 jlinkes: what's that?
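Editor's sketch: the two formulas quoted above can be checked against the `sysctl -w` values pasted alongside them. The Python below derives the dependent settings from a hugepage count. It assumes 2MB hugepages, as the log states, and reads "+ 10%" as 10% of `2 * vm.nr_hugepages`, which is the reading that reproduces the 22000 in the pasted commands. The function name is invented for the example, and `vm.hugetlb_shm_group` is simply carried over as 0 from the log.

```python
def vpp_sysctl_settings(nr_hugepages):
    """Derive the dependent sysctl values from a 2MB hugepage count."""
    doubled = 2 * nr_hugepages
    return {
        "vm.nr_hugepages": nr_hugepages,
        # 2 * nr_hugepages plus 10% (assumed to mean 10% of the doubled value).
        "vm.max_map_count": doubled + doubled // 10,
        # Group value copied from the commands in the log.
        "vm.hugetlb_shm_group": 0,
        # 2 * nr_hugepages * 1024 * 1024, valid for 2MB hugepages only.
        "kernel.shmmax": doubled * 1024 * 1024,
    }


def as_sysctl_commands(settings):
    """Render the settings as `sysctl -w key=value` command strings."""
    return [f"sysctl -w {key}={value}" for key, value in settings.items()]
```

For `nr_hugepages=10000` this gives `vm.max_map_count=22000` and `kernel.shmmax=20971520000`, matching the values pasted at 16:38:16. The 25000 used earlier in the meeting is higher than what the formula yields.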
16:45:45 trozet: it would be best if apex could figure out the maximum possible number of hugepages and configure that number
16:46:07 trozet: well, not really a requirement
16:46:31 trozet - thanks - makes sense
16:47:03 trozet: a suggestion for how to do this
16:47:15 jlinkes: yeah that would be good
16:47:18 trozet: I think he mentioned it in one of the e-mails
16:47:38 jlinkes: TripleO is capable of doing "introspection" which means find out all the info about the hardware you are going to deploy to - before you deploy
16:47:54 jlinkes: we currently have that disabled, but with that info, we could modify the hugepages on the fly before we deploy
16:48:08 jlinkes: sounds like a good improvement for colorado2.0
16:48:19 jlinkes: can you file a JIRA for that?
16:48:32 trozet: okay
16:50:12 are we done for today?
16:50:21 I think so
16:51:18 frankbrockners: what's with the jenkins
16:52:04 frankbrockners: did I miss something? :)
16:52:20 michal-cmarada|2 - what exactly do you refer to?
16:53:08 trozet has jenkins jobs up
16:53:19 frankbrockners: let me link
16:53:31 https://build.opnfv.org/ci/job/apex-deploy-baremetal-os-odl_l2-fdio-noha-colorado/
16:53:35 what we're missing is functest - but this is what we discussed earlier
16:53:41 https://build.opnfv.org/ci/job/apex-deploy-baremetal-os-nosdn-fdio-noha-colorado/
16:53:51 frankbrockners: I mean why it is failing. If you have found out something.
16:53:59 frankbrockners: nosdn fdio also passes, but I don't think fpan has everything done yet for that
16:54:17 michal-cmarada|2: i think frankbrockners said it is because functest doesn't create VMs with hugepages
16:54:35 frankbrockners: should i talk to jose about getting this fixed when it detects FDIO as a scenario?
16:55:03 trozet: I think we'll have to do this
16:55:17 trozet: and by we I really mean me :-)
16:55:42 jlinkes: so you will follow up with jose/morgan on that?
16:56:05 thanks jlinkes... and michal-cmarada|2 - deployment already succeeds: https://build.opnfv.org/ci/job/apex-deploy-baremetal-os-odl_l2-fdio-noha-colorado/6/console
16:56:09 trozet: we already talked to morgan today
16:56:14 trozet: so yes
16:56:18 jlinkes: cool
16:58:08 ... looks like we're done for today. Thanks everyone!
16:58:11 #endmeeting
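Editor's sketch: the meeting records a suggestion that Apex work out the maximum possible number of hugepages from hardware information gathered before deployment. No method was agreed, so the Python below is only an illustration of the arithmetic such a step might use. Everything in it is an assumption: the function names, the idea of holding back a fixed memory reserve for the host, and the use of a `/proc/meminfo`-style `MemTotal:` line as a stand-in for whatever introspection data would actually be available.

```python
def mem_total_kib(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def max_hugepages(total_kib, reserve_kib, hugepage_kib=2048):
    """Number of hugepages that fit after keeping reserve_kib for the host.

    The reserve is a hypothetical policy knob, not something stated in
    the meeting; hugepage_kib defaults to 2MB pages.
    """
    usable = max(total_kib - reserve_kib, 0)
    return usable // hugepage_kib


# Hypothetical input: a 64 GiB host, keeping 8 GiB for the operating system.
SAMPLE_MEMINFO = "MemTotal:       67108864 kB\nMemFree:        1234567 kB\n"
RESERVE_KIB = 8 * 1024 * 1024
```

With these sample numbers, `max_hugepages(mem_total_kib(SAMPLE_MEMINFO), RESERVE_KIB)` comes to 28672 two-megabyte pages. How much memory to hold back is a deployment decision that the log leaves open.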