17:04:10 <arturt> #startmeeting JOID weekly 17:04:11 <collabot> Meeting started Wed Feb 10 17:04:10 2016 UTC. The chair is arturt. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:04:11 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic. 17:04:11 <collabot> The meeting name has been set to 'joid_weekly' 17:04:18 <iben_> ahh 17:04:21 <iben_> there we go 17:05:05 <narindergupta> #info Narinder Gupta 17:05:09 <David_Orange> #info David Blaisonneau 17:05:33 <catbus1> #info Samantha Jian-Pielak 17:05:51 <arturt> #info Artur Tyloch 17:06:27 <arturt> #chair iben_ 17:06:27 <collabot> Current chairs: arturt iben_ 17:06:46 <arturt> #link https://etherpad.opnfv.org/p/joid agenda 17:08:52 <iben_> ionutbalutoiu: has been helping us with this: http://webchat.freenode.net/?channels=opnfv-meeting 17:09:07 <iben_> #undo 17:09:07 <collabot> Removing item from minutes: <MeetBot.ircmeeting.items.Link object at 0x1baa890> 17:09:08 <iben_> whoops - wrong link 17:09:32 <iben_> #info ionutbalutoiu has been helping with JOID testing https://github.com/opnfv/opnfv-ravello-demo 17:10:20 <iben_> #topic agenda bashing 17:10:24 <arturt> #topic new B release date 17:10:37 <arturt> First week of March 17:10:41 <iben_> #info see etherpad above 17:11:02 <iben_> #topic release B readiness 17:13:03 <iben_> #link https://wiki.opnfv.org/releases/brahmaputra/release_plan new release date feb 26 17:13:31 <akash> Can we add some sort of note about experience per ravello on ontrail and open-daylight? 17:14:20 <akash> *contrail 17:16:33 <iben_> akash: that is already on the agenda 17:16:40 <akash> okay thanks 17:16:51 <iben_> #info TSC voted to set the Brahmaputra “release deploy” date to Thursday, February 25 17:17:28 <arturt> #topic Steps to B release https://etherpad.opnfv.org/p/steps_to_brahmaputra 17:18:39 <arturt> #info successful deployment with ODL 17:18:59 <iben_> #info ONOS charm - bug in charm, narinder sent email to "the team" 17:19:57 <iben_> #info NTP needs to be set for each environment - suggest using the MAAS machine as the NTP server 17:24:10 <arturt> #topic contrail charm 17:24:17 <arturt> #info Contrail charm - still failing on 2/10 with Liberty https://jira.opnfv.org/browse/JOID-25 17:24:38 <iben_> #info Juniper OpenContrail https://jira.opnfv.org/browse/JOID-25 work in progress 17:28:33 <arturt> #topic ONOS charm 17:28:58 <iben_> #info ONOS charm - bug in charm, narinder sent email to "the team" 17:29:05 <arturt> #info not ready - Chinese new year, team OoO 17:31:56 <iben_> #info ONOS charm is stored in github but set up to sync to launchpad bazaar https://github.com/opennetworkinglab/onos 17:32:30 <arturt> #topic ODL charm 17:33:08 <arturt> #info all ODL tests are passing on orange pod 17:33:21 <arturt> #info all (except 3 tests) ODL tests are passing on orange pod 17:33:31 <iben_> #info new ODL charm might fix the IPv6 team issue around L2/L3 mode - need to test 17:35:20 <iben_> #link https://build.opnfv.org/ci/view/joid/ we reviewed the builds here 17:35:26 <arturt> #topic Documentation update 17:36:14 <arturt> #info doc: https://git.opnfv.org/cgit/joid/tree/docs/configguide 17:36:17 <iben_> #link https://gerrit.opnfv.org/gerrit/#/c/5487/ 17:36:59 <bryan_att> #info Bryan Sullivan 17:37:42 <iben_> #info you can see the progress of the doc and patches here: #link https://gerrit.opnfv.org/gerrit/#/q/project:joid 17:49:55 <iben_> #info discussion around workflow - how to use JOID to do parallel functional tests in the cloud, then perform hardware-based performance tests once
functional tests have passed 17:51:34 <bryan_att> sorry for asking so many questions - but the config export part of this has an unclear value to me; the running of multiple parallel CI/CD jobs in the cloud is clearly useful, but I don't see how exporting a Ravello-deployed config helps me in my local lab, because I still need to deploy using the JuJu deploy commands... 17:55:34 <bryan_att> OK, I think I understand - if someone developed a bundle tweak in Ravello testing they can export the bundle so we can use it in a local JOID deploy. That part is clear if true. 17:56:17 <bryan_att> That's though just a juju export operation, right? And it doesn't need to include resource specifics e.g. MACs etc.\ 17:56:23 <arturt> bryan_att: yes, it is core juju feature 17:57:33 <bryan_att> #1 ravello feature for me is complete transparency on the JOID installer support for that environment, e.g. at most a flag that indicates the special power control etc needs to be used when deploying there. 17:58:22 <bryan_att> "feature for me" means the #1 priority to be addressed through the Ravello project, and upstreamed to OPNFV. 17:58:44 <bryan_att> We need to minimize the lifespan of a ravello fork 18:00:26 <iben_> bryan_att: yes - agreed to minimize (or eliminate) any forks 18:00:53 <iben_> ionutbalutoiu: and i have discussed and agreed to this 18:03:39 <bryan_att> As I mentioned, TOSCA is our target for NSD/VNFD etc ingestion, but for now I understand for Canonical that JuJu bundles etc are the medium, and an export function for them would be useful explain how to do and use. 18:10:11 <arturt> bryan_att: have you tried export function ? 18:10:25 <David_Orange> Sorry i have to go. Bye 18:10:28 <bryan_att> arturt: no, not yet 18:10:41 <arturt> #info juju bundle export import https://jujucharms.com/docs/stable/charms-bundles 18:12:09 <arturt> apart from service model you have also a machine specification, which allows you to set up specific machines and then to place units of your services on those machines however you wish. 18:12:14 <catbus1> Ravello export is probably different from Juju charm bundle export. 18:12:37 <arturt> is there any Ravello export? 18:12:48 <catbus1> that's the blueprint, right? 18:13:05 <arturt> but cannot export blueprint outside Ravello 18:13:15 <catbus1> ah, yeah 18:13:17 <arturt> you can replicate blueprint on Ravello 18:13:20 <arturt> ok 18:14:37 <bryan_att> one thing I would like to discuss - the JuJu deploy step takes very long and it's not clear how to know what's going on... any help there would be great. 18:15:28 <bryan_att> e.g. why is it taking so long, what happened when it times out (as it regularly does...), etc - how to debug 18:17:30 <catbus1> running "watch juju status --format tabular" on another terminal helps to see what's going on, which is started, installing packages, failed, etc. 18:18:17 <arturt> bryan_att: usually we can deploy whole OpenStack in approx. 20min... 18:19:11 <iben_> #info bug submitted for maas power type driver for ravello https://bugs.launchpad.net/maas/+bug/1544211 18:19:49 <catbus1> bryan_att: I can talk about the juju-deployer process: http://pastebin.ubuntu.com/15006924/ 18:22:03 <catbus1> where is the wiki page? 18:22:35 <iben_> akash: you can see the work bryan_att did with joid here https://wiki.opnfv.org/copper/academy/joid 18:23:09 <iben_> what troubleshooting steps can we take to observe the bottlenecks with JOID? 
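A minimal sketch of the monitoring approach discussed above, assuming a juju 1.x deployment as JOID used at the time (log paths are typical defaults, not confirmed in this log):

    # watch unit states (pending, installing, started, error) while the bundle deploys
    watch juju status --format=tabular

    # stream the consolidated juju log from the bootstrap node to see charm hook activity
    juju debug-log

    # on the MAAS server, follow provisioning while nodes commission and deploy
    tail -f /var/log/maas/maas.log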
18:23:37 <narindergupta> catbus1: that's what the user guide should include 18:23:52 <iben_> AMT or IPMI to see the console boot process 18:23:52 <arturt> #action schedule JOID community session with Bryan to discuss user experience with Juju akash 18:24:00 <iben_> centralized log collection 18:24:01 <catbus1> the troubleshooting steps, we have some info in the configguide 18:24:16 <iben_> verify NTP settings too 18:24:17 <catbus1> installguide 18:24:49 <catbus1> narindergupta: agreed, talking with bryan_att will help with the user guide. 18:25:06 <narindergupta> catbus1: :) 18:25:54 <arturt> let's schedule a session this week, today or tomorrow - we can use it for the user guide 18:25:58 <iben_> where in the log can we see a charm is being downloaded 18:26:22 <iben_> can we enable hash tag progress reports on the console? 18:26:33 <iben_> so we can tail -f the log to see progress? 18:28:14 <bryan_att> bryan.sullivan@att.com 18:30:48 <catbus1> arturt: I am available this week 18:30:55 <catbus1> today or tomorrow 18:31:27 <durschatz> mailto:dave.urschatz@cengn.ca please invite me to the meeting with Bryan 18:31:39 <akash> can we set up for next wednesday? 18:31:44 <akash> my week is slammed 18:31:56 <durschatz> yes for me 18:31:58 <akash> i was just about to send an invite and block up to 2 hours 18:32:13 <akash> catbus1: ^? 18:32:22 <narindergupta> bryan_att: btw what's the issue you are facing this time? Can you run juju status --format=tabular and send me the output? 18:32:25 <catbus1> akash: that works for me too 18:32:46 <catbus1> it's not that I am anxious to work on the user guide. ;p 18:32:48 <bryan_att> I can send the juju download logs. But doing "sudo grep download *.log" on all logs in the bootstrap VM I see that downloads started at 16:19 and finished at 16:20 (more than an hour ago) so I don't think that's the issue. 18:33:38 <bryan_att> https://www.irccloud.com/pastebin/UCZQXRCe/it's%20getting%20there%2C%20but%20just%20*very*%20slowly... 18:36:01 <bryan_att> In the last 20 minutes the juju UI went from 2 relations complete to almost all of them. I noticed that it just timed out, so I think somehow that may be allowing the relations to complete...? 18:36:13 <bryan_att> https://www.irccloud.com/pastebin/IlJGKSpF/Here%20is%20the%20timeout%20notice. 18:36:32 <catbus1> bryan_att: the charm download starts at the beginning of the joid deployment. 18:36:36 <catbus1> should be short 18:36:55 <bryan_att> it was - about 2 minutes tops 18:42:50 <narindergupta> bryan_att: this simply looks like a timeout of 2 hrs. Maybe a little extra time is needed in your environment. Or just wait, the deployment might finish soon as the timeout should not impact any relation 18:43:59 <bryan_att> narindergupta: looks like I had the keystone error I reported earlier. Maybe that's hanging the process. I can try the "juju resolved keystone/0" workaround 18:45:07 <narindergupta> bryan_att: ok 18:45:07 <bryan_att> narindergupta: I don't see why my environment should have performance issues - these are Intel i7 machines with 16GB RAM connected to a 100MB ethernet switch... the controller has two network interfaces... 18:46:14 <narindergupta> bryan_att: not performance, usually MAAS acts as a proxy cache 18:47:20 <narindergupta> bryan_att: definitely full logs will help to understand 18:47:37 <bryan_att> OK, should I just paste them here? 18:47:43 <bryan_att> and which ones?
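A sketch of which logs tend to be useful at this point, based on the paths mentioned elsewhere in this discussion (juju 1.x layout; the tee redirect is only an illustration):

    # capture the deployer output itself
    ./deploy.sh -o liberty -s odl -t nonha -l attvirpod1 2>&1 | tee deploy.log

    # juju logs on the bootstrap node (machine 0), including the aggregated all-machines.log
    ls /var/log/juju/
    sudo grep -i download /var/log/juju/*.log   # a variant of the check bryan_att ran above

    # hook logs for an individual unit, fetched over juju ssh
    juju ssh keystone/0 "sudo tail -n 200 /var/log/juju/unit-keystone-0.log"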
18:49:03 <catbus1> bryan_att: you can paste them to pastebin.ubuntu.com, if not confidential 18:49:16 <narindergupta> ./deploy.sh logs starting from there so i can figure it out from time stamp 18:50:24 <catbus1> irccloud works too. 18:54:55 <bryan_att> Here it is - the ubuntu pastebin only accepts text. The logs are too big. https://usercontent.irccloud-cdn.com/file/ie3mffYE/160210_blsaws_bootstrap_logs.tar.gz 19:04:42 <bryan_att> I used the "juju resolved keystone/0" workaround and things got happier. But normally (2-3 times now) I have to enter the command twice to resolve all the issues (the keystone error seems to reappear?) https://www.irccloud.com/pastebin/NDu0Uygd/ 19:19:23 <catbus1> bryan_att: from the juju status output, keystone unit is ready. 19:20:03 <bryan_att> yes, but only after I entered the command "juju resolved keystone/0" - see the previous status I posted where keystone was in "error" 19:20:17 <catbus1> ah, use juju resolved --retry keystone/0 19:20:44 <catbus1> using only juju resolved only makes the status look good, it doesn't do anything. 19:21:03 <catbus1> with '--retry' it will rerun the hook where it failed. 19:21:16 <bryan_att> OK, thanks that's good to know 19:21:48 <catbus1> you may wonder why juju resolved exists, it's for killing units that are in error state. You can't kill units in error state, so get it in working state and kill it. 19:22:49 <catbus1> bryan_att: sometimes the issue will resolve by itself after the re run, but if the error appears again, you can juju ssh keystone/0, and sudo -i as root to look into /var/log/juju/unit-keystone-0.log 19:23:28 <catbus1> find out where the last error is about, manually fix it, go back to the jumphost and re run the juju resolved --retry 19:24:58 <bryan_att> Here are the errors from that log file https://www.irccloud.com/pastebin/7chcwf99/ 19:26:11 <catbus1> you need to look at the section above "2016-02-10 18:39:06 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] error: hook failed: "identity-service-relation-changed"" 19:27:04 <catbus1> can you copy and paste the section before this error message in the log? 19:27:26 <bryan_att> ok, hang on 19:31:01 <bryan_att> Here it is https://www.irccloud.com/pastebin/cLRXYQlF/ 19:32:33 <catbus1> it's too little info. 19:33:47 * catbus1-afk --> meeting 19:38:45 <bryan_att> When you get back - here is more https://www.irccloud.com/pastebin/XsKD1nML/ 19:38:56 * bryan_att afk-lunch 19:48:45 <narindergupta> bryan_att: this is same issue about admin roles as trying to create the Admin role and failed as admin already exist 19:49:15 <narindergupta> and this is issue with keystone as well 19:49:25 <bryan_att> ok, so a known issue? 19:49:38 <narindergupta> yeah 19:50:22 <narindergupta> in keystone service does not differentiate between admin and Admin 19:50:34 <narindergupta> while keystone client does 19:51:18 <narindergupta> differentate so send request to service and service failed to create that role and says duplicate 19:52:52 <narindergupta> bryan_att: final success run of deply, functest and yardstick https://build.opnfv.org/ci/view/joid/job/joid-os-odl_l2-nofeature-ha-orange-pod2-daily-master/ 20:00:51 <narindergupta> bryan_att: basically keystone can not create two roles admin and Admin 20:08:41 <bryan_att> narindergupta: is this a keystone issue, or a JOID issue? Does it affect the other installers? 
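The recover-and-retry loop catbus1 describes above, gathered into one sketch (keystone/0 is the unit from this deployment; substitute whichever unit is in an error state):

    # re-run the failed hook instead of only clearing the error flag
    juju resolved --retry keystone/0

    # if the error comes back, inspect the unit's hook log on the machine itself
    juju ssh keystone/0
    sudo -i
    less /var/log/juju/unit-keystone-0.log   # read the section above the "hook failed: ..." line

    # fix the underlying problem, then from the jumphost retry the hook once more
    juju resolved --retry keystone/0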
20:11:21 <narindergupta> bryan_att: other installers use admin as the role by default and the charms use Admin, so on request from the functest team we changed the role to admin in the service, but it looks like some services are also trying to create the role Admin, which failed in keystone because keystone does not differentiate between admin and Admin. Definitely a keystone bug, but other installers may not encounter it as they use admin for everything. 20:12:10 <narindergupta> bryan_att: here is the bug already reported https://bugs.launchpad.net/charms/+source/keystone/+bug/1512984 20:13:04 <narindergupta> bryan_att: comment no. 10: "I suspect that keystone sees 'admin' and 'Admin' as the same thing from a role name perspective; the problem is that the role created by default is currently all lowercase, whereas the role requested via swift is not - the code checks but is case sensitive. We should fix that, but the root cause of the lowercase role creation is bemusing - its default is 'Admin' in config, not 'admin', and that's used raw by the charm." 20:13:39 <bryan_att> narindergupta: so what's our workaround in the meantime - change back to using "Admin" in the charms? 20:14:00 <narindergupta> yes 20:14:01 <bryan_att> because this appears to be affecting successful deployment 20:15:08 <narindergupta> bryan_att: whenever a relationship changes this occurs, so I would prefer Admin in the bundle. there is a yaml file in ci/odl/juju-deployer 20:15:58 <bryan_att> ok, are you going to issue a patch to change it back? Just wondering how long the issue will remain for JOID. 20:16:00 <narindergupta> change admin to Admin at the end of the file for all three deployments: juno, kilo and liberty 20:16:06 <narindergupta> then run ./clean.sh 20:16:20 <bryan_att> OK, I can do that. 20:16:57 <narindergupta> admin-role: admin and keystone-admin-role: admin 20:17:20 <narindergupta> those two options need a change from admin to Admin, or comment out both 20:17:26 <narindergupta> as the default is Admin 20:18:45 <narindergupta> then run ./deploy.sh -o liberty -s odl -t nonha -l attvirpod1 20:18:54 <narindergupta> that will restart the deployment. 20:23:13 <narindergupta> bryan_att: at the end it seems we will be switching to admin by default, most likely in the charm release in 16.04, as per this bug. 21:27:59 <bryan_att> narindergupta: when you say "change admin to Admin at the end of file for all three deployments juno, kilo and liberty", which file am I changing? 21:28:56 <narindergupta> bryan_att: there are three files for odl https://gerrit.opnfv.org/gerrit/#/c/9697/1/ci/odl/juju-deployer/ovs-odl-nonha.yaml https://gerrit.opnfv.org/gerrit/#/c/9697/1/ci/odl/juju-deployer/ovs-odl-ha.yaml and https://gerrit.opnfv.org/gerrit/#/c/9697/1/ci/odl/juju-deployer/ovs-odl-tip.yaml 21:35:47 <bryan_att> ok, starting the redeploy now 21:36:06 <narindergupta> cool 23:50:49 <bryan_att> narindergupta: this time it went through to the end, everything looks good so far. 70 minutes to deploy. 23:53:06 <narindergupta> bryan_att: cool, yeah that issue we need to work on 23:55:13 <bryan_att> narindergupta: sometime I would like to learn how to set the services to be assigned specific IPs. Every time they get installed they have different addresses. In typical NFV deployments I think we will try to have everything consistent. 23:57:41 <bryan_att> I still have the keystone error though. I was advised (by catbus1) to use the command "juju resolved --retry keystone/0" to retry the hook from where it failed.
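A sketch of the workaround narindergupta describes above, against the three bundle files he links (ci/odl/juju-deployer/ovs-odl-nonha.yaml, ovs-odl-ha.yaml, ovs-odl-tip.yaml):

    # in each file, near the end of the juno, kilo and liberty sections, either switch
    # these two keystone options from admin to Admin or comment them out (the charm
    # default is Admin):
    admin-role: Admin
    keystone-admin-role: Admin

    # then clean up and redeploy (options as used for this virtual pod)
    ./clean.sh
    ./deploy.sh -o liberty -s odl -t nonha -l attvirpod1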
00:14:16 <narindergupta> bryan_att: thats strange we need to understand is it same error and is it occuring in your case? 00:14:57 <bryan_att> looks like the same error - I could not login to horizon until I entered the resolved command. 00:15:59 <narindergupta> bryan_att: but this time we used Admin right 00:16:03 <narindergupta> > 00:16:07 <bryan_att> The --retry flag does not appear to have resolved it though - catbus1 said that without the --retry flag, that resolved was just ignored the issues 00:17:18 <bryan_att> Yes, I used Admin for keystone (at the end of the file) 00:17:25 <bryan_att> https://www.irccloud.com/pastebin/qCfxWSmC/ 05:10:01 <narindergupta> bryan_att: will you retry commenting both the option in file. Also if you can send me bundles.yaml from joid/ci would be great? 08:15:51 <fdegir> narindergupta: how may I help you? 13:38:12 <narindergupta1> hi David_Orange good morninig 14:30:48 <narindergupta1> ashyoung: hi 14:31:09 <ashyoung> narindergupta1: hi 14:31:53 <narindergupta1> ashyoung: do you know anyonw who can change the onos charm? Looks like team is on chineese new year and deployments are failing 14:32:17 <ashyoung> narindergupta1: yes 14:32:19 <narindergupta1> i have a fix but until it goes into git repo i can not run the install successfuly 14:32:36 <ashyoung> narindergupta1: can you help me out and provide me some details on what's failing? 14:32:43 <ashyoung> Oh 14:32:57 <ashyoung> Do you just need your fix checked in? 14:33:48 <narindergupta1> ashyoung: i need the fixes to check in 14:34:00 <ashyoung> ok 14:34:05 <ashyoung> I can help with that 14:34:32 <narindergupta1> http://bazaar.launchpad.net/~opnfv-team/charms/trusty/onos-controller/fixspace/changes/12?start_revid=12 14:34:37 <narindergupta1> contains the changes i need 14:34:48 <narindergupta1> there are two changes 14:35:36 <ashyoung> Thanks! 14:35:52 <ashyoung> I will get it taken care of right away 14:36:01 <ashyoung> What's the current problem? 14:36:04 <narindergupta1> rev 11 and 12 needs to be added 14:36:20 <narindergupta1> deployment failed because of additonal space introduced in charm 14:36:37 <narindergupta1> nd also then onos-controller charm install failed failed fo config 14:36:49 <narindergupta1> and there are two patches for two issues 14:37:05 <ashyoung> got it 14:38:02 <narindergupta1> ashyoung: once you are able to merge in git tree then i can sync in bazaar and run the deployment again. 14:38:15 <ashyoung> understood 14:39:17 <ashyoung> I'll get it done 14:39:40 <narindergupta1> ashyoung: thanks 14:40:17 <ashyoung> My pleasure 16:05:59 <David_Orange> narindergupta: hi 16:07:18 <narindergupta1> David_Orange: hi ok resize test cases passed now. but need to check with you around the glance api failed cases? 16:07:45 <David_Orange> how can i help ou ? 16:08:13 <David_Orange> last functest failed, but it seems to be a docker problem 16:08:18 <narindergupta1> need to know what command temptest runs and whether those passes manually in your pod or not? 16:08:24 <narindergupta1> oh ok 16:08:36 <narindergupta1> yesterday it passed 16:08:55 <narindergupta1> and it seems total 98% test cases are passing as per morgan 16:10:11 <David_Orange> yes 16:10:41 <narindergupta1> ok how can i find the 2% failed cases and fix those too. 16:11:25 <David_Orange> i look at them 16:12:49 <narindergupta1> thanks. 16:13:42 <narindergupta1> also need help on debugging failure cause on intel pods. It seems there might be deploy in changing the switch. 
But we can tell them it is issue with switch then intel might do it sooner 16:15:11 <narindergupta1> David_Orange: also i added few scenrios in joid like dfv, vpn, ipv6 etc.. and i am passing it through -f parameter in ./deploy.sh 16:15:24 <narindergupta1> can it be integreted as part of ci as well? 16:15:41 <David_Orange> yes of course, i can work on that 16:15:47 <narindergupta1> thanks 16:15:58 <David_Orange> do you also have something for dpdk ? 16:16:35 <narindergupta1> David_Orange: dpdk will be part of 16.04 LTS in main and we will be enabling it with 16.04 lts 16:16:53 <narindergupta1> it may be part of SR2 after xenial release. 16:17:19 <David_Orange> ok 16:17:29 <David_Orange> so not before 2 month 16:17:44 <narindergupta1> for experimental basis we can add 16:17:52 <narindergupta1> but it may not work 16:20:52 <David_Orange> ok 16:21:13 <David_Orange> what is dfv ? 16:22:42 <David_Orange> narindergupta: for new parameters, can 2 params can be enabled at the same time ? 16:22:51 <David_Orange> or 3 16:22:59 <David_Orange> or more :) 16:27:22 <narindergupta1> sfv 16:27:28 <narindergupta1> sorry it is sfv 16:27:45 <narindergupta1> David_Orange: currently no 16:28:04 <David_Orange> ok 16:28:12 <narindergupta1> David_Orange: but i can write a combination might do like ipv6dvr 16:28:18 <narindergupta1> ipv6sfc 16:28:21 <David_Orange> and we can enables all those param for all sdn controllers 16:28:55 <David_Orange> can you take more than one '-f' ? 16:28:56 <narindergupta1> well few for nosdn and few for odl 16:29:06 <narindergupta1> no i can not 16:29:13 <narindergupta1> only one right now 16:29:13 <David_Orange> or a coma-dash separated list 16:29:31 <narindergupta1> David_Orange: currently no but we can in future 16:29:49 <David_Orange> ok, so i only set one 16:29:53 <narindergupta1> as i need to enhance my code to accept that 16:29:55 <narindergupta1> ok 16:30:06 <David_Orange> np 16:30:26 <David_Orange> and they can be enabled for each scenarios ? 16:30:36 <narindergupta1> yes 16:30:44 <David_Orange> sorry: s/scenario/sdncontroller/ 16:30:45 <narindergupta1> ha/nonha/tip 16:31:01 <narindergupta1> not necessary 16:31:15 <narindergupta1> like ipv6 is for all. But sfc only for odl 16:31:28 <David_Orange> ok 16:31:40 <David_Orange> and vpn ? 16:31:52 <narindergupta1> only for odl currently 16:32:17 <David_Orange> ok: sfc (all) vpn (odl) ipv6 (odl) 16:32:31 <David_Orange> and for nosdn cases ? 16:32:44 <narindergupta1> no ipv6 all 16:32:52 <narindergupta1> sfc and vpn only odl 16:33:00 <narindergupta1> and same for odl_l2 and odl_l3 16:34:10 <narindergupta1> David_Orange: also dvr for all 16:34:21 <David_Orange> yes sorry 16:35:19 <David_Orange> we have odl_l2 and l3 now ? 16:46:54 <narindergupta1> yes i am trying to enable is using dvr but not sure whether it will work or not. 16:48:33 <narindergupta1> do we have seperate test cases? 16:53:03 <David_Orange> narindergupta: today we have only odl_l2, but i can prepare l2 and l3 16:57:05 <David_Orange> narindergupta1: let me know for odl as i can push the patch 16:57:31 <David_Orange> narindergupta1: do you also thought about OS API access ? 17:01:34 <narindergupta1> David_Orange: yes i am thinking about it and discussing internally. 17:02:12 <David_Orange> ok 17:02:46 <narindergupta1> David_Orange: currently issue is containers can e only on admin network. To enabled containers with other host network we need to wait for MAAS 2.0 17:03:25 <David_Orange> ok, and for scenario with fixed ip for all endpoints ? 
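A sketch of how the -f feature flag discussed above is passed to ./deploy.sh (only one feature per run at this point, per the discussion; the lab name and feature values here are illustrative):

    # plain ODL HA deployment
    ./deploy.sh -o liberty -s odl -t ha -l orangepod2

    # the same deployment with one extra feature scenario enabled, e.g. ipv6
    # (dvr applies to all controllers; sfc and vpn apply to odl only)
    ./deploy.sh -o liberty -s odl -t ha -l orangepod2 -f ipv6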
17:03:37 <narindergupta1> yes 17:03:59 <David_Orange> this did not require a new network 17:04:38 <narindergupta1> for fix ip no issues as we can go for different vip address which act as end point anyway so may be changing the vip in bundle might help. 17:05:39 <David_Orange> if all endpoints have fixed and known address i can set the reverse proxy quickly 17:06:12 <David_Orange> but we also need to setup the publicurl to a dedicated fqdn, is it possible ? 17:09:41 <David_Orange> narindergupta1: actually i transform odl_l2 and odl_l3 to odl, do you want i remove that and push odl_l2 odl_l3 to deploy.sh ? 17:10:21 <narindergupta1> David_Orange: means? 17:10:45 <narindergupta1> David_Orange: of that sense no -s should be odl only 17:11:00 <narindergupta1> and -f can be odl_l2 or odl_l3 17:11:21 <David_Orange> ok, so for you this is an option ? 17:11:42 <narindergupta1> correct 17:11:54 <David_Orange> other installer set odl_l2 and odl_l3 in controller part (os-<controller>-<nfvfeature>-<mode>[-<extrastuff>]) 17:11:59 <narindergupta1> as i need to enable profile in odl to enable this 17:12:25 <David_Orange> ok 17:12:44 <narindergupta1> unfortunately we do not define that way as we have single controller odl and in that feature can be enabled for l2 and l3. 17:13:08 <David_Orange> i push you a patch 17:13:16 <narindergupta1> thanks 17:13:50 <narindergupta1> David_Orange: my question was for you that do we have seperate test cases for l2 and l3? 17:14:30 <David_Orange> no 17:15:54 <David_Orange> narindergupta1: odl_l3 = odl_l2 + l3 true ? so i dont need to add a scenario name for odl_l2: os-odl_l2-old_l2-ha 17:16:58 <narindergupta> ok sounds good to me 17:17:21 <narindergupta> David_Orange: i am not sure by default iodl_l2 is enabled 17:17:50 <narindergupta> as i can see in odl only l3 is enabled by default but for l2 we need to enable the switch specifically 17:18:28 <David_Orange> if we set odl for sdn controller, odl_l2 is enable by default, no ? 17:18:35 <narindergupta> so i believe naming of default scenario is not true. It should be l3 default and that what we test 17:19:02 <narindergupta> how to find it out? 17:20:10 <narindergupta> currently we enable using this article https://wiki.opendaylight.org/view/OpenStack_and_OpenDaylight 17:22:11 <narindergupta> i 17:22:27 <David_Orange> "OpenStack can use OpenDaylight as its network management provider through the Modular Layer 2 (ML2) north-bound plug-in" 17:22:44 <David_Orange> today it is odl_l2, ip services are provided by neutron 17:23:20 <narindergupta> ok yes thats what we have 17:23:44 <narindergupta> i am wondering how to enable l3 then? 17:24:02 <narindergupta> in that case i was in confusion. as there is l2switch module in odl 17:24:14 <narindergupta> and i was thinking for l2 i need to enable that 17:24:53 <narindergupta> is there any odl documentation which explains this in better way? 17:24:54 <David_Orange> by default odl is using ovs as switch, it may be an other switch 17:24:57 <David_Orange> let me check 17:26:00 <narindergupta> but does that mean by enabling odl-l2switch we are enalbing l2? 17:26:15 <narindergupta> or by adding ml2 plugin means it is odl_l2 17:26:21 <David_Orange> https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:L2_Switch 17:27:01 <narindergupta> so my question is to enable l2 do i need to enable this? 17:27:15 <David_Orange> as far as i understand, enabling ml2plugin enable odl for network layer (l2) 17:27:42 <David_Orange> i dont think so, until now odl was working without no ? 
17:27:45 <narindergupta> ok then we support l2 and i can verify that and for l3 enablement what should be done? 17:27:58 <David_Orange> for l3 i dont know 17:28:37 <David_Orange> l2switch seems to be much more an enhance l2 switch (you can keep it as mdsal option for example 17:28:59 <narindergupta> our charm developer followed this and integreted https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:L2_Switch 17:29:08 <narindergupta> sorry https://wiki.opendaylight.org/view/OpenStack_and_OpenDaylight 17:29:38 <David_Orange> this is for l2 17:29:39 <narindergupta> David_Orange: ok 17:30:07 <narindergupta> ok sounds good then i need to correct hu bin as i telling him that l3 is enabled by default 17:30:21 <narindergupta> now i need to figure it out what to do to enable l3 then 17:30:38 <narindergupta> so that both l2 and l3 is enabled by odl 17:30:47 <David_Orange> l3 is managed by neutron 17:31:08 <narindergupta> ok so l3 need to be enabled in neutron 17:31:22 <David_Orange> yes, i thinks so 17:31:43 <David_Orange> i am far to be an odl expert, but this is my understanding 17:31:46 <narindergupta> i think in neutron we already enabled it by default 17:32:03 <David_Orange> until now odl was well running 17:32:09 <narindergupta> so in that case we have both odl_l2 and l3 by default 17:32:39 <David_Orange> should not we keep it simple until B release then work on all that new features ? 17:32:46 <narindergupta> but i need confirmation so that i can tell the community how can i do that? 17:33:28 <David_Orange> i can cal my colleage tomorrow, but i am not sure he is working (this is an holiday period here) 17:33:51 <narindergupta> David_Orange: oh ok who else can guide me. 17:34:04 <David_Orange> last time he check (1 month ago) we were using odl for layer 2 and neutron for l3 17:36:30 <David_Orange> narindergupta1: dont know any else. 17:36:43 <David_Orange> narindergupta1: scenario should be frozen from 2 or 3 weeks ago 17:37:43 <David_Orange> we should wait to have a clean install of B release 4 times, froze Bramaputra then add those new features 17:39:11 <David_Orange> narindergupta1: so i can add ipv6, sfc and so on, but it is not the way i would do it. But you are the boss :) 17:39:17 <narindergupta> David_Orange: if neutron l3 then it is already enabled in joid. we are testing it. And it is enalbed by default 17:39:28 <David_Orange> yes 17:39:58 <David_Orange> today we are loop testing odl at l2 and neutron at l3 17:40:01 <narindergupta> David_Orange: please add it and we will run if fails then won't release otherwise will get added 17:40:05 <David_Orange> this is my understanding 17:40:22 <narindergupta> and our tests are passing correct? 17:41:24 <David_Orange> today functest says: odl on joid pod2 = 100%, no ? 17:41:47 <narindergupta> i think morgan stated 98% 17:42:07 <David_Orange> odl_l3 is much more an option 17:42:15 <David_Orange> 98% is on tempest no ? 17:42:20 <narindergupta> overall 17:42:33 <narindergupta> tempotest only 3 failed test cases related to glance 17:42:54 <narindergupta> two are related to glance which needs to understand and one related to boot option. 17:43:07 <narindergupta> as per viktor manuall verification worked for both 17:43:18 <David_Orange> yes, but odl tests are all ok: 18 tests total, 18 passed, 0 failed 17:43:43 <narindergupta> yes odl all were passed 17:43:44 <David_Orange> so for odl we should not touch 17:44:02 <narindergupta> David_Orange: please add an option in ci and i will run that test and sure it will pass. 
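For context on the L2/L3 split being debated above: a sketch of what "ODL as the ML2 provider, neutron for L3" usually amounts to in neutron's configuration. The section and key names follow the OpenStack_and_OpenDaylight wiki linked earlier; the values are placeholders, and in JOID the charms render this rather than anyone editing it by hand.

    # /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative only)
    [ml2]
    mechanism_drivers = opendaylight

    [ml2_odl]
    url = http://<odl-controller>:8080/controller/nb/v2/neutron
    username = admin
    password = admin

    # L3 (routers, floating IPs) stays with neutron's own l3-agent on the
    # neutron-gateway unit in this setup, i.e. "odl for L2, neutron for L3"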
17:44:22 <David_Orange> odl_l3 option ? 17:44:34 <narindergupta> yes do it please 17:44:55 <narindergupta> and if neutron handles it then it is already there. 17:45:12 <narindergupta> basically same deployment have both l2 and l3 enabled then 17:45:24 <narindergupta> l2 by odl and l3 by neutron 17:48:12 <David_Orange> https://gerrit.opnfv.org/gerrit/9821 17:50:58 <David_Orange> narindergupta1: i have to go 17:51:14 <narindergupta> David_Orange: one change you need to pass -f 17:51:35 <David_Orange> it is passed line 147 17:52:03 <David_Orange> narindergupta1: is it ok ? 17:52:15 <narindergupta> y3es 17:52:25 <David_Orange> ok, good, see you tomorrow 17:52:39 <narindergupta> ok see you 17:53:38 <narindergupta> David_Orange: sorry -f is missing 17:54:08 <narindergupta> in line 147 and line 149 15:04:22 <David_Orange> narindergupta: hi 15:04:32 <narindergupta> David_Orange: hi 15:05:11 <David_Orange> morgan is asking if we are ok to set pod2 as CI pod until B official release ? 15:05:26 <narindergupta> David_Orange: yes i am +1 for it 15:05:30 <David_Orange> ok, nice 15:06:09 <narindergupta> David_Orange: for maas i have to do minor adjustment for dhcp and static ip address though/ 15:06:09 <David_Orange> narindergupta: have you seen my mail about ODL l2switch 15:06:20 <narindergupta> still need to check 15:06:32 <David_Orange> okok 15:06:37 <David_Orange> and ok 15:07:26 <narindergupta> Davidregarding the feature needs to be enabled 15:08:02 <David_Orange> which one ? 15:08:10 <narindergupta> no 1 15:08:13 <David_Orange> #undo 15:08:16 <David_Orange> yes 15:08:23 <narindergupta> yeah we are enabling minimum now in Be 15:08:35 <narindergupta> David for l2 switch nice to know 15:08:55 <David_Orange> ok, so you are not enabling all features as in the doc, good 15:09:13 <David_Orange> i will have more feedback in 10 days 15:11:04 <David_Orange> and during my holidays, next week, if you need, you can send me a mail (if you want me to set the reverse proxy for public API access 15:11:52 <David_Orange> i will not answer it in the hour, but try to check some time 15:16:33 <narindergupta> David_Orange: sure david. 15:17:54 <narindergupta> David_Orange: also for crating the reverse proxy need to work with you. Lets try something which works for all 15:18:07 <David_Orange> yes of course 19:56:06 <narindergupta> bryan_att: i found the issue with keystone charm it seems i applied the patch to ha install but leftout nonha and i am fixinf it now 20:06:27 <narindergupta> bryan_att: fixed it you can give a retry 04:57:31 <narindergupta> yuanyou: hi 04:57:54 <yuanyou> narindergupta:hi 04:58:25 <narindergupta> yuanyou: i am still finding the installation issues with onos. Will you please look into it? We have holiday tomorrow but will restart the build once you will fix it 04:58:48 <narindergupta> yuanyou: this time it is config change on neutron-gateway 04:59:47 <yuanyou> narindergupta: I am working on this ,but I don't know how to fix it. 05:00:35 <narindergupta> what is the issue? Which script is failing? 05:01:10 <narindergupta> as i fixed other issue and those were due to extra space and not having proper config() definitin 05:01:56 <narindergupta> but other i have not idea how your team has implmented. 
Best way to look itnto the logs on the neutron-gateway unit and see what errors 05:02:00 <narindergupta> and try to resolv 05:02:13 <yuanyou> narindergupta: I only know config-changed failed,but i don't know which line failed 05:02:35 <narindergupta> more errors can b find on failed unit 05:02:52 <narindergupta> and check /var/log/juju/unit-neutron-gateway-0.log 05:03:01 <narindergupta> on neutron-gateway unti 05:03:31 <yuanyou> narindergupta:yes, I am deploying on my own environment 05:03:42 <narindergupta> ok no problem 05:04:08 <narindergupta> meanwhile on pod5 can u remove the auto run of onos until this issue fixes 05:04:34 <narindergupta> and this is supposed to be stable ci lab but unfortunately its failing today on onos 05:04:45 <narindergupta> and you can use intel pod6 though for development 05:05:23 <yuanyou> narindergupta:ok,i will remove the auto run in releng 05:05:31 <narindergupta> thanks 14:57:09 <jose_lausuch> narindergupta: ping 14:57:17 <narindergupta> jose_lausuch: pomg 14:57:20 <narindergupta> whats up? 14:57:50 <narindergupta> jose_lausuch: io have installed gsutil into pod5 for joid and submit the patch in master branch 14:58:00 <narindergupta> will do same for other pods as well 14:58:49 <jose_lausuch> narindergupta: great 14:58:52 <jose_lausuch> fdegir: ping 14:59:08 <jose_lausuch> can you help to install gsutil with proper credentials? (I have no clue what's needed) 14:59:33 <narindergupta> yeah in orange pod5 i am seeeing upload error 14:59:49 <narindergupta> where gsutil was installed 15:01:04 <narindergupta> jose_lausuch: you were talking about image issue> 15:01:05 <narindergupta> > 15:01:06 <narindergupta> ? 15:01:20 <jose_lausuch> narindergupta: flavor issue 15:01:38 <jose_lausuch> https://build.opnfv.org/ci/view/functest/job/functest-joid-intel-pod5-daily-brahmaputra/45/console 15:01:38 <narindergupta> jose_lausuch: what was that? 15:01:50 <jose_lausuch> - ERROR - Flavor 'm1.small' not found. 15:01:59 <jose_lausuch> that is on joid intel pod 5 15:02:00 <jose_lausuch> but 15:02:38 <jose_lausuch> however 15:02:43 <jose_lausuch> if you look above 15:02:46 <narindergupta> but its there | 2 | m1.small | 1 | 2048 | | 20 | when we run 15:02:48 <jose_lausuch> Flavors for user `admin` in tenant `admin`: 15:02:48 <narindergupta> yeah 15:02:50 <jose_lausuch> yes 15:02:54 <jose_lausuch> so, that is strange 15:02:59 <jose_lausuch> and also, lookin below 15:02:59 <narindergupta> yeah 15:03:01 <jose_lausuch> for promise test 15:03:11 <jose_lausuch> Error [create_flavor(nova_client, 'promise-flavor', '512', '0', '1')]: ('Connection aborted.', BadStatusLine("''",)) 15:03:11 <narindergupta> and by defaul t we do not deleted anything 15:03:15 <jose_lausuch> cann't create flavor 15:03:35 <narindergupta> that says connected aborted. 15:03:43 <narindergupta> why was it aborted? 15:03:46 <jose_lausuch> ya, casual.. 
15:03:48 <jose_lausuch> I dont know 15:03:59 <jose_lausuch> it's just creating a flavor with those specifications 15:04:07 <narindergupta> 2016-02-14 17:14:41,689 - vPing_userdata- INFO - Flavor found 'm1.small' 15:04:18 <jose_lausuch> requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)) 15:04:22 <jose_lausuch> yes, same error there 15:04:26 <jose_lausuch> that is strnage 15:05:02 <narindergupta> jose_lausuch: but i saw this passed in latest deployment 15:05:36 <jose_lausuch> this is even worse: https://build.opnfv.org/ci/view/functest/job/functest-joid-intel-pod5-daily-brahmaputra/44/console 15:05:44 <jose_lausuch> Error [create_glance_image(glance_client, 'functest-vping', '/home/opnfv/functest/data/cirros-0.3.4-x86_64-disk.img', 'True')]: <requests.packages.urllib3.connection.HTTPConnection object at 0x7efe66c3a710>: Failed to establish a new connection: [Errno 113] No route to host 15:06:19 <jose_lausuch> so, we are not having stable results 15:06:35 <narindergupta> jose_lausuch: and in pod6 it passed. https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod6-daily-master/54/consoleFull 15:07:11 <jose_lausuch> ya, that worked 15:07:17 <jose_lausuch> but I'm looking at stable/brahmaputra branch 15:07:22 <jose_lausuch> https://build.opnfv.org/ci/view/functest/job/functest-joid-intel-pod5-daily-brahmaputra/ 15:07:25 <narindergupta> it looks like pointing me towards the pod stability as aborted connection is something called for networkign failure as 15:07:32 <jose_lausuch> yes 15:07:38 <jose_lausuch> looks like network failure or something 15:07:41 <narindergupta> there is no difference in branch though 15:07:49 <narindergupta> from installer prospective 15:08:46 <narindergupta> jose_lausuch: if you will see in intel pod5 https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod5-daily-brahmaputra/46/console 15:08:49 <narindergupta> it passed 15:08:57 <narindergupta> even in pod5 15:09:40 <jose_lausuch> narindergupta: yes, the flavor thing worked there... 15:09:40 <narindergupta> and temptest failed modt of time and not sure why? but same test passes in orange pod2 15:09:45 <jose_lausuch> but 172 failures in tempest. 15:10:28 <narindergupta> jose_lausuch: never got good results on intel pod5. Can you please help here., AS in oramge pod2 only 5 test cases failed 15:10:38 <narindergupta> and test results were 98% 15:10:47 <jose_lausuch> yes 15:10:52 <jose_lausuch> I know 15:11:02 <jose_lausuch> but why is this pod giving these bad results? 15:11:07 <jose_lausuch> and also the job was aborted 15:11:10 <jose_lausuch> while running 15:11:12 <narindergupta> david think it could be networking issue in pod 15:11:21 <narindergupta> do not know 15:11:34 <jose_lausuch> that could explain it 15:11:58 <narindergupta> jose_lausuch: but deployment always worked. 15:12:12 <jose_lausuch> taking a look at this: 15:12:13 <jose_lausuch> https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod6-daily-master/ 15:12:26 <jose_lausuch> all the latests jobs are aborted.. (grey) 15:12:50 <narindergupta> yes 15:14:08 <narindergupta> jose_lausuch: i am seeing this aborted issue for long time and it never passes and not sure what it does 15:14:15 <narindergupta> may be you can debug. 15:14:30 <narindergupta> and time out is 210 minutes 15:15:07 <jose_lausuch> narindergupta: timeout for what? 15:15:13 <narindergupta> for job 15:15:21 <narindergupta> check the status 15:15:41 <narindergupta> jose_lausuch: Build timed out (after 210 minutes). 
Marking the build as aborted. 15:16:36 <jose_lausuch> narindergupta: ok.... something taking too long maybe... 15:17:03 <narindergupta> jose_lausuch: not sure it is stuck or taking too long 15:17:58 <jose_lausuch> narindergupta: ya, I see now in our jjob a timeout of 210 sec... 15:17:59 <jose_lausuch> wrappers: 15:17:59 <jose_lausuch> - build-name: 15:17:59 <jose_lausuch> name: '$BUILD_NUMBER Suite: $FUNCTEST_SUITE_NAME Scenario: $DEPLOY_SCENARIO' 15:17:59 <jose_lausuch> - timeout: 15:17:59 <jose_lausuch> timeout: 210 15:18:05 <jose_lausuch> mmmm 15:18:12 <narindergupta> ok 15:18:24 <jose_lausuch> but it is not normal that it takes that long... 15:19:37 <narindergupta> jose_lausuch: yes its not so need a debugging where it got stuck. 15:20:12 <narindergupta> also yardstick was working until last week on orange pod2 but not anymore 15:20:56 <jose_lausuch> ok 15:21:08 <jose_lausuch> Im trying to figure out what is getting stuck 15:22:41 <jose_lausuch> for example vIMS gets an error and takes almost 30 min 15:34:14 <jose_lausuch> narindergupta: I have detected that the Cinder tests that does Rally, take 45 min 15:34:36 <jose_lausuch> I have checked on other installers, and it takes only 20 min 15:35:06 <narindergupta> oh ok could be in intel pod5 and pod6 we do not have extra hard disk 15:35:21 <narindergupta> can u check on orang epod2 how much time it takes? 15:35:22 <jose_lausuch> narindergupta: same in orange pod 2, aroound 20 min 15:35:33 <jose_lausuch> and I guess the other rally tests will take longer too 15:35:45 <jose_lausuch> the normal functest runtime is around 2.5 hr 15:35:51 <jose_lausuch> this timedout after 3.5 hr... 15:35:55 <jose_lausuch> something is bad there 15:35:57 <narindergupta> that may be reason in intel pods no extra disk ssd disk so we are using the os disk 15:36:09 <jose_lausuch> I'll check other tests 15:36:14 <narindergupta> sure please 15:36:41 <narindergupta> specially network specific. In worst case we can increase the time oput and test 15:38:30 <jose_lausuch> for neutron test, normally it takes 10 min, and on intel pod 2 20+ minutes 15:38:38 <jose_lausuch> so everything is slowly there.. 15:43:45 <narindergupta> intel pod2 ? 15:43:56 <narindergupta> so other pods also taking same time? 15:44:10 <narindergupta> not specific to pod5 and pod6? 15:45:40 <narindergupta> jose_lausuch: all pods i am seeing this https://build.opnfv.org/ci/view/joid/job/joid-verify-master/ do you think it is connectivity issue 15:45:41 <narindergupta> ? 15:46:02 <narindergupta> to linux foundation 15:47:09 <jose_lausuch> narindergupta: you mean this? pending—Waiting for next available executor on intel-us-build-1 15:47:28 <narindergupta> yeah 15:47:39 <jose_lausuch> I dont know... 15:47:49 <narindergupta> pod5, pod6 and orange pod2 shows the same status 15:49:55 <jose_lausuch> ya... strange 16:09:42 <narindergupta> jose_lausuch: anyway is temptest is time dependent? 16:10:32 <jose_lausuch> narindergupta: what do you mean 16:11:01 <narindergupta> i am seeing few test cases faied in intel pod5 and 6 16:11:21 <narindergupta> so just wondering some temptest times out as well and marked as failed 16:11:36 <narindergupta> just thinking loud? 16:12:33 <jose_lausuch> we can check the logs 16:12:47 <jose_lausuch> ah no, we cant, they are not pushed to artifacts 16:13:06 <jose_lausuch> we can check them on the container 16:13:32 <narindergupta> which container? 
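The jenkins-job-builder wrapper jose_lausuch pasted above, re-indented for readability (the timeout value is in minutes, matching the "Build timed out (after 210 minutes)" message):

    wrappers:
        - build-name:
            name: '$BUILD_NUMBER Suite: $FUNCTEST_SUITE_NAME Scenario: $DEPLOY_SCENARIO'
        - timeout:
            timeout: 210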
16:15:52 <jose_lausuch> the functest docker container on the jumphost 17:16:03 <jose_lausuch> narindergupta: ping 17:16:24 <narindergupta> jose_lausuch: pong 17:16:36 <jose_lausuch> narindergupta: can you tell me the IPs of the jumphost on intel-pod5 and 6? 17:16:46 <jose_lausuch> I have the vpn 17:16:50 <jose_lausuch> but dont know the ips 17:16:57 <narindergupta> 10.2.65.2 and 10.2.66.2 17:17:12 <narindergupta> [pd5 and pod6 but i need to add your ssh keys for access. 17:17:29 <narindergupta> give me your ssh public keys 17:17:54 <jose_lausuch> ok, let's do it with private chat 17:18:03 <narindergupta> sure 17:18:47 <jose_lausuch> ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9I5Pyg3tND4sV9EoEW3jCqY+91IOdkNCAe8xscI1mRcVLlN7/0YHFCQLFX8q+lTAMWhguMkoUf4y0w6rMmDE0c59XKUNYUPHMPT6vEBbHz9JjCZdEhHGouDJmSAxS0PBLrv+nj+P9fhFVUdxf+pWzaCID8zZDfBq2k7KGtyREmV/l1jkIatDUjh5Hj0lenkYvH85nrAQaAWa3WoLieqj8Ve9ruoguhV6/IjWbUtU+JX/9FLyn10Wq+ArySIbDbwD2ajI//4E1XPDfztzjsscU3sSUw9vwaP78/1XOHPKeEvgd1UBIG4TzaTuRgLmsTtWar409sZ8QsPkE2CwkS4OB ejolaus@ejolaus-dev 17:18:52 <jose_lausuch> oops, sorry :) 17:19:04 <narindergupta> no worrues 17:19:40 <narindergupta> ok now you can try using 10.2.66.2 intel pod6 17:19:45 <narindergupta> with user jenki 17:19:53 <narindergupta> sorry user jenkins 17:21:17 <jose_lausuch> can I try first the pod5? I already have that vpn opened 17:21:29 <narindergupta> ok just a moment then 17:23:26 <narindergupta> please try now 17:23:40 <jose_lausuch> narindergupta: I opened vpn also for pod6, and it works 17:23:42 <jose_lausuch> Im in, thanks! 17:23:49 <narindergupta> cool 17:25:09 <jose_lausuch> I will run some test on pod5, is that ok? 17:25:29 <jose_lausuch> is the deployment up? 17:26:21 <jose_lausuch> root@ce0974c487f4:~# neutron net-list 17:26:21 <jose_lausuch> Unable to establish connection to http://10.4.1.27:9696/v2.0/networks.json 17:26:33 <jose_lausuch> what is this? that address is not showed in the endpoint list 17:27:31 <narindergupta> just now checked looks like onos tried to installed ans habing issue with neutron 17:27:55 <narindergupta> i can retart the odl deployment might take an hour if you are ok with that 17:27:55 <narindergupta> ? 17:28:17 <jose_lausuch> narindergupta: ok, but I might check tomorrow.. 17:28:34 <narindergupta> overnight job will on that 17:28:53 <narindergupta> and install will override again as this is ci pods 17:29:24 <narindergupta> or you can look into it tomorrow morning your time 17:29:51 <narindergupta> as onos job is scheduled at 8:00 AM CST i believe 17:31:25 <jose_lausuch> ok 03:54:27 <narindergupta> yuanyou: hi 03:54:57 <narindergupta> yuanyou: i have send you an information and it seems you need to do charm sync of neutron-api-onos as well 04:53:27 <yuanyou> narindergupta : yes,I saw it ,and I am test it . 04:53:42 <narindergupta> yuanyou: ok 04:54:16 <narindergupta> yuanyou: thanks also it is good habit to to do charmsync for your charms in case you are taking it from openstack cahrms 04:56:40 <yuanyou> narindergupta: yes ,that will be fine 04:57:56 <narindergupta> yuanyou: currently there won't be any onos build op into pod5 as functest team need this for debugging temptest failures. But we can use intel pod6 once you are able to correct the charm sucessfully? 05:01:24 <yuanyou> narindergupta: yes, I see 13:11:46 <jose_lausuch> narinderg_cfk: ping when you are up 13:30:07 <narinderg_cfk> hi jose_lausuch 13:30:28 <jose_lausuch> I so you aborted a job after 4 hors.. 
13:30:31 <jose_lausuch> hours 13:31:04 <narinderg_cfk> jose_lausuch: yesternight i did not abort 13:32:02 <narinderg_cfk> jose_lausuch: and intel pod5 was success daily 13:32:14 <narinderg_cfk> https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod5-daily-brahmaputra/52/console 13:32:53 <narinderg_cfk> jose_lausuch: so does for intel pod6 https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod6-daily-master/55/console but need to see the failed test cases 13:34:23 <narinderg_cfk> jose_lausuch: but yardstick test cases failed 13:42:52 <narinderg_cfk> jose_lausuch: it seems increasing timeout completed atleast the test in all labs. 13:43:06 <jose_lausuch> narinderg_cfk: I mean this one, the previous one 13:43:06 <jose_lausuch> https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod5-daily-brahmaputra/51/console 13:43:15 <jose_lausuch> narinderg_cfk: anyway, the blue ball took 5 hr... 13:43:38 <jose_lausuch> narinderg_cfk: the problem is cinder 13:43:43 <narinderg_cfk> jose_lausuch: correct. yes above aborted because i wanted to run overnight builkd 13:43:46 <jose_lausuch> | cinder | 52:41 | 50 | 98.82% | 13:44:03 <narinderg_cfk> ok one hour 13:44:42 <jose_lausuch> I will compare this 13:44:46 <jose_lausuch> with another installer 13:45:30 <narinderg_cfk> in orange pod2 only 19.31 13:45:49 <narinderg_cfk> | cinder | 19:31 | 50 | 100.00% | 13:46:11 <narinderg_cfk> but they have ssds 13:47:06 <narinderg_cfk> jose_lausuch: but heat test cases are failing on all pods | heat | 03:52 | 2 | 7.69% | 13:47:42 <jose_lausuch> narinderg_cfk: http://pastebin.com/raw/kFdKVDRQ 13:47:51 <jose_lausuch> look at keystone 13:47:58 <jose_lausuch> took 3 hours... that is the real problem 13:48:00 <jose_lausuch> and cinder 1 hr 13:49:19 <narinderg_cfk> i think all service took extra time 13:49:33 <jose_lausuch> for cinder I can understand that there are not SSDs 13:49:40 <narinderg_cfk> correct 13:49:40 <jose_lausuch> but for keystone?? 13:49:47 <jose_lausuch> 3 hrs?? 13:49:54 <jose_lausuch> there is a network issue for sure 13:50:38 <narinderg_cfk> yeah looks like 13:51:06 <narinderg_cfk> jose_lausuch: how to figure it out? 13:51:28 <jose_lausuch> narinderg_cfk: I'm trying to login to the pod, but I am not sure how to start troubleshooting this... 13:53:16 <narinderg_cfk> jose_lausuch: i think we need help 13:55:47 <narinderg_cfk> jose_lausuch: lkets discuss in pharos channel please join opnfv-pharos 14:33:55 <narindergupta> jose_lausuch: ok so manual working of openstack is fast enough but performance detriot during test. can u check the logs on thesystem what point it took more time 14:36:28 <jose_lausuch> narindergupta: sorry, need to waity, Im reporting in the release meetning 15:46:04 <narindergupta> jose_lausuch: i am back again sorry it was power outage. 15:46:24 <jose_lausuch> narindergupta: no prob 15:49:23 <narindergupta> catbus1: hi 15:53:23 <jose_lausuch> narindergupta: Im back on intel pod 5 15:53:27 <jose_lausuch> running test by test 15:53:42 <narindergupta> jose_lausuch: ok 15:58:17 <jose_lausuch> we have yet another problem 15:59:20 <narindergupta> whats the issue? 16:00:50 <jose_lausuch> now vping doesnt work 16:01:57 <jose_lausuch> narindergupta: JOID uses the same network segment for public and admin network... 
16:02:28 <narindergupta> jose_lausuch: there are different network 16:02:43 <jose_lausuch> if I do keystone endpoint-list 16:03:03 <narindergupta> jose_lausuch: yeah for endlist we are defining the public segment seperate 16:03:06 <jose_lausuch> http://hastebin.com/odovurufog.sm 16:03:10 <narindergupta> its all on admin network 16:03:16 <jose_lausuch> 10.4.1.0/24 for all 16:03:22 <jose_lausuch> ah 16:03:37 <narindergupta> for endpoints only 16:03:38 <jose_lausuch> I wonder if that could also be an issue 16:04:10 <jose_lausuch> is there a joid gui of the deployment? 16:04:20 <narindergupta> yes it is on admin entwork 16:04:42 <jose_lausuch> is there later on isolation for storage/public/admin networks ? 16:05:48 <narindergupta> there is isolation of data, public and admin 16:05:52 <jose_lausuch> ok, I see that 10.2.65.0/24 is the public range 16:06:00 <narindergupta> correct 16:06:02 <jose_lausuch> ok 16:06:09 <jose_lausuch> then, ignore my comment :) 16:06:12 <narindergupta> :) 16:06:25 <jose_lausuch> vping failed 16:06:38 <jose_lausuch> I will run it again and not clean the instances 16:06:39 <narindergupta> whats the reason? 16:06:43 <jose_lausuch> so that you can login to pod5 and check 16:06:47 <jose_lausuch> cannot ping the floating ip 16:07:33 <narindergupta> which port flatin ip created? 16:07:40 <narindergupta> ext-net or somewhere else 16:08:08 <jose_lausuch> ya 16:08:15 <jose_lausuch> can you login to the deployment? 16:10:08 <narindergupta> yeah i am logging in 16:10:12 <jose_lausuch> nova list 16:10:19 <jose_lausuch> and then you'll see there are 2 VMs 16:10:23 <jose_lausuch> one of them with a floating ip 16:10:40 <jose_lausuch> you can check too neutron floatingip-list 16:14:28 <narindergupta> which subnet? 16:14:54 <narindergupta> let me create the router as i used to do and retry 16:16:12 <jose_lausuch> narindergupta: there is already a router 16:18:01 <narindergupta> jose_lausuch: i cna not figure out the n why ping is not working. I know when i test is after fresh install it works 16:18:20 <jose_lausuch> I Assigned a new floating ip to the first vm 16:18:22 <jose_lausuch> and that is pingable 16:18:38 <narindergupta> hun thats wiered 16:18:56 <jose_lausuch> very 16:19:03 <jose_lausuch> do a nova list 16:19:09 <jose_lausuch> I will remove floating ip from vm 2 16:19:12 <jose_lausuch> and assign it again 16:19:42 <narindergupta> yeah i can see 16:19:59 <narindergupta> .84 pings but not .83 16:21:48 <jose_lausuch> narindergupta: nova console-log opnfv-vping-2 16:22:46 <narindergupta> jose_lausuch: i am in 16:22:54 <jose_lausuch> nova console-log opnfv-vping-2|grep 'ifconfig' -A 16 16:23:00 <jose_lausuch> the second VM didnt get the ip 16:23:33 <jose_lausuch> udhcpc (v1.20.1) started 16:23:33 <jose_lausuch> Sending discover... 16:23:33 <jose_lausuch> Sending discover... 16:23:33 <jose_lausuch> Sending discover... 16:23:33 <jose_lausuch> Usage: /sbin/cirros-dhcpc <up|down> 16:23:34 <jose_lausuch> No lease, failing 16:23:34 <jose_lausuch> WARN: /etc/rc3.d/S40-network failed 16:23:35 <jose_lausuch> cirros-ds 'net' up at 181.97 16:23:35 <jose_lausuch> checking http://169.254.169.254/2009-04-04/instance-id 16:25:16 <narindergupta> jose_lausuch: could it be dhcp issue? 16:25:29 <jose_lausuch> I dont know 16:25:31 <jose_lausuch> Im trying another thing 16:27:46 <narindergupta> ok 16:27:56 <jose_lausuch> how can I access horizon? 
16:28:51 <narindergupta> you can do X redirect through ssh 16:29:07 <narindergupta> and start the firefox on the jumphost 16:29:40 <narindergupta> there is vncserver on 10.4.0.255 as well 16:29:56 <narindergupta> password is ubuntu 16:30:21 <narindergupta> vip for dashboard is 10.4.1.21 16:30:29 <jose_lausuch> narindergupta: the second VM doesnt get the ip from the dhcp... 16:30:48 <narindergupta> but floating ip pings? 16:31:42 <narindergupta> also nova lsit shows | ffc42fc4-065f-4cfe-8276-cbbb126752be | opnfv-vping-2 | ACTIVE | - | Running | vping-net=192.168.130.4, 10.2.65.87 | 16:31:59 <narindergupta> which means it got the ip somehow. 16:32:25 <narindergupta> assigned but dhco looks like not giving the ip on request. 16:32:58 <jose_lausuch> narindergupta: I use port forwarding, so I open firefox on my local env :) 16:33:32 <jose_lausuch> narindergupta: assigning the ip is not a problem, but if you check the console-log you'll see that dhcp doesnt work 16:33:44 <jose_lausuch> narindergupta: what is the user/password for horizon? 16:34:13 <narindergupta> admin openstack 16:34:47 <jose_lausuch> ok thanks, taht works 16:34:51 <narindergupta> Sending discover... 16:34:51 <narindergupta> Usage: /sbin/cirros-dhcpc <up|down> 16:34:51 <narindergupta> No lease, failing 16:34:51 <narindergupta> WARN: /etc/rc3.d/S40-network failed 16:34:51 <narindergupta> cirros-ds 'net' up at 181.25 16:35:10 <narindergupta> do you think it could be image issue? 16:35:17 <jose_lausuch> why image? 16:35:21 <jose_lausuch> the first VM gets the ip correctly 16:35:37 <narindergupta> it gives the usage error 16:36:16 <jose_lausuch> nova console-log opnfv-vping-1|grep 'Starting network...' -A 10 16:36:23 <jose_lausuch> usage? 16:36:53 <narindergupta> Usage: /sbin/cirros-dhcpc <up|down> 16:36:53 <narindergupta> (10:34:49 AM) narindergupta: No lease, failing 16:37:59 <narindergupta> let me check on neutron-gateway node 16:41:21 <jose_lausuch> ok 16:42:26 <narindergupta> dhcp lease does not show ip its there for .3 16:42:31 <narindergupta> but not for .4 16:42:48 <jose_lausuch> that's bad :) 16:43:06 <narindergupta> i know looks like request did not reached to dhcp 16:44:51 <narindergupta> jose_lausuch: even neutron logs no sign of .4 while .3 its there 16:45:03 <narindergupta> do you know mac address of the interface i can search 16:45:21 <narindergupta> for 2nd vm 16:45:27 <jose_lausuch> yes 16:45:57 <jose_lausuch> eth0 Link encap:Ethernet HWaddr FA:16:3E:57:24:56 16:47:11 <narindergupta> no request with this mac 16:48:27 <narindergupta> can u try to create one more vm? 16:48:42 <narindergupta> lets see how does it behave with the same interface? 16:48:43 <jose_lausuch> yes 16:50:07 <jose_lausuch> narindergupta: done 16:50:12 <narindergupta> ok i am capturing the dhcp agent log 16:50:18 <jose_lausuch> called test-vm 16:50:41 <jose_lausuch> Starting network... 16:50:41 <jose_lausuch> udhcpc (v1.20.1) started 16:50:41 <jose_lausuch> Sending discover... 16:51:57 <jose_lausuch> again sending discover 16:52:01 <jose_lausuch> no leases?? 16:53:46 <narindergupta> i am seeing this in one of dhcp agent log 2016-02-16 07:32:40.432 12656 ERROR neutron.agent.dhcp.agent RemoteError: Remote error: IpAddressGenerationFailure No more IP addresses available on network 303fd1aa-10fd-4f73-b8c1-475fdd8f0a09. 16:53:46 <narindergupta> not now but issue was seen earlier 16:53:47 <jose_lausuch> Sending discover... 16:53:47 <jose_lausuch> Sending discover... 16:53:47 <jose_lausuch> Sending discover... 
16:53:57 <jose_lausuch> aha!
16:54:00 <jose_lausuch> interesting
16:54:01 <narindergupta> looks like it's somehow related
16:54:24 <jose_lausuch> but it doesn't make sense
16:54:31 <jose_lausuch> | vping-subnet | 192.168.130.0/24 | {"start": "192.168.130.2", "end": "192.168.130.254"} |
16:54:36 <jose_lausuch> the range is quite wide!
16:55:16 <narindergupta> can you check on the dashboard whether the DHCP agent services are up?
16:55:29 <jose_lausuch> node6-control Enabled Up
16:55:30 <jose_lausuch> yes
16:55:56 <narindergupta> yeah, it matches here on the node
16:56:02 <jose_lausuch> (d4c55165-70d2)
16:56:02 <jose_lausuch> 192.168.130.2
16:56:02 <jose_lausuch> network:dhcp Active UP
16:56:05 <jose_lausuch> that is the port
17:00:27 <narindergupta> huh, so all services are up and the ports are up
17:01:54 <jose_lausuch> yes
17:03:08 <narindergupta> I think it's worth clearing the network, including the bridges and router, and recreating it again to check; sometimes, when no leases are available, it gets stuck there waiting for leases to become available
17:03:28 <jose_lausuch> ok
17:03:30 <jose_lausuch> I will do that
17:04:49 <narindergupta> thanks
17:06:12 <jose_lausuch> done
17:06:19 <jose_lausuch> I will run the same test, but on a different network
17:06:25 <jose_lausuch> 192.168.40.0/24 for example
17:09:00 <jose_lausuch> narindergupta: can you check again?
17:09:17 <jose_lausuch> narindergupta: it worked...
17:09:33 <narindergupta> Feb 16 17:08:59 node6-control dnsmasq-dhcp[2860]: DHCPACK(ns-f2fff5fd-05) 192.168.140.4 fa:16:3e:f7:06:3e host-192-168-140-4
17:09:54 <narindergupta> yeah, I can verify in syslog that DHCP is assigning the IP
17:10:09 <jose_lausuch> now the VMs are trying to ping each other
17:10:14 <narindergupta> ok
17:12:39 <jose_lausuch> something is slow or doesn't work
17:14:08 <narindergupta> what's happening?
17:14:21 <jose_lausuch> not sure, the test is hanging at some point
17:14:24 <jose_lausuch> I will abort it
17:14:59 <narindergupta> their ping time was 130 s
17:15:06 <narindergupta> earlier in the lab
17:16:12 <jose_lausuch> ya, not sure
17:16:17 <jose_lausuch> if I run it manually it works
17:16:29 <jose_lausuch> ya
17:16:33 <jose_lausuch> so what was the problem?
17:16:36 <jose_lausuch> I would like to retest it
17:16:42 <jose_lausuch> with the same network
17:23:02 <jose_lausuch> narindergupta: you know what?
17:23:05 <jose_lausuch> now it doesn't work
17:23:23 <jose_lausuch> VM2 doesn't get an IP from DHCP
17:23:39 <jose_lausuch> how is that possible?
17:23:49 <jose_lausuch> I removed and created the network again, with the same range
17:23:56 <jose_lausuch> the first VM got an IP
17:24:03 <jose_lausuch> but same issue with VM2
17:24:52 <jose_lausuch> narindergupta: I am creating another VM manually
17:24:57 <jose_lausuch> can you check the DHCP logs?
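
On the neutron-gateway node, the comparison narindergupta makes next can be done against dnsmasq's per-network state files; a sketch, assuming the default neutron DHCP agent state directory (/var/lib/neutron/dhcp) and that the vping network name from the log is still in use:

    NET_ID=$(neutron net-list | awk '/vping-net/ {print $2}')   # UUID of the vping network
    sudo cat /var/lib/neutron/dhcp/$NET_ID/host                 # MAC -> hostname -> IP entries dnsmasq knows about
    sudo cat /var/lib/neutron/dhcp/$NET_ID/leases               # leases actually handed out
    # watch DHCP requests inside the network's namespace while the VM retries
    sudo ip netns exec qdhcp-$NET_ID tcpdump -nli any port 67 or port 68
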
17:40:57 <narindergupta> no sign of the IP
17:41:11 <narindergupta> jose_lausuch: no sign of a new IP
17:41:56 <narindergupta> but I have the following hosts listed
17:42:08 <narindergupta> fa:16:3e:85:08:81,host-192-168-140-1.openstacklocal.,192.168.140.1
17:42:08 <narindergupta> fa:16:3e:98:97:81,host-192-168-140-2.openstacklocal.,192.168.140.2
17:42:08 <narindergupta> fa:16:3e:3d:99:fa,host-192-168-140-3.openstacklocal.,192.168.140.3
17:42:08 <narindergupta> fa:16:3e:11:53:88,host-192-168-140-4.openstacklocal.,192.168.140.4
17:42:08 <narindergupta> fa:16:3e:6b:8c:a6,host-192-168-140-5.openstacklocal,192.168.140.5
17:43:27 <jose_lausuch> that is strange
17:44:43 <narindergupta> yeah, in the leases file it's not there, but in the host file it exists
17:45:48 <narindergupta> and I have the Liberty version of neutron-dhcp-agent
17:46:06 <jose_lausuch> what ODL version is that?
17:46:11 <narindergupta> Be
17:46:47 <narindergupta> I think we should try the same test on Orange pod2, in case something works there
18:28:36 <narindergupta> catbus1: any update on the user guide?
02:16:00 <bryan_att> narindergupta: I found clues as to why MAAS is not creating the nodes. See the log entry pasted below
02:16:47 <bryan_att> https://www.irccloud.com/pastebin/tpmje5Ft/
02:17:46 <bryan_att> narindergupta: when I change the parameter power_parameters_power_address to power_address and send the same command from the shell, it works.
02:18:18 <bryan_att> controlnodeid=`maas maas nodes new autodetect_nodegroup='yes' name='node1-control' tags='control' hostname='node1-control' power_type='virsh' mac_addresses=$node1controlmac power_address='qemu+ssh://'$USER'@192.168.122.1/system' architecture='amd64/generic' power_parameters_power_id='node1-control' | grep system_id | cut -d '"' -f 4 `
02:19:21 <bryan_att> Above is the change I tried for this in 02-maasdeploy.sh, which was the only place I saw this parameter name... but that did not change what was sent. So there is definitely a bug somewhere still.
02:20:16 <bryan_att> note also that resending the command as above still did not set eth1 to auto on the controller... I had to do that manually
04:07:57 <narindergupta> yuanyou:
04:08:29 <narindergupta> I am not seeing any change in the charms related to neutron-api-onos from the charm syncer
04:41:03 <narindergupta> yuanyou: on Intel pod6 I can verify that after doing a charm sync we were able to create the network.
06:12:56 <yuanyou> narindergupta: I have synchronized in my local environment, but there are some errors, so I didn't commit the changes; I need to run more tests.
15:56:25 <narindergupta> bryan_att: on your Intel NUCs can you try to redeploy, as I believe the non-HA issue with keystone should be fixed now.
15:57:01 <bryan_att> ok, the last deploy timed out last night. Did you see my notes from yesterday on the power_parameters?
16:03:58 <bryan_att> narindergupta: I have the joid-walk meeting today at 1PM PST with your team. I'd like to get a successful deploy before then. I can restart the JuJu deploy now, but I think the issues I reported last night would be good to address ASAP also. I think we can fix whatever issue is requiring me to manually create the machines in MAAS.
16:28:42 <bryan_att> narindergupta: I just recloned the repo and am restarting the MAAS deploy.
16:28:54 <narindergupta> thanks
16:29:04 <narindergupta> bryan_att: you still need to add the nodes manually
16:29:27 <bryan_att> narindergupta: did you see my earlier note about the bug I found?
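
Given the power_parameters confusion above, one way to confirm what MAAS actually stored for the node is to read it back with the MAAS 1.x CLI (the 'maas' profile name and the $controlnodeid variable come from the command above; the grep filtering of the JSON output is illustrative):

    maas maas node read $controlnodeid | python -m json.tool    # full node record; check the power_type field
    maas maas nodes list | grep -E '"hostname"|"power_type"'    # quick overview of all declared nodes
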
16:29:36 <narindergupta> I am working with the maas-deployer team on building maas-deployer, so the fixes will be available soon
16:29:56 <narindergupta> no, not yet
16:30:03 <narindergupta> I still have to check
16:32:38 <bryan_att> I can get the nodes created through the command "ssh -i /home/opnfv/.ssh/id_maas -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o LogLevel=quiet ubuntu@192.168.10.3 maas maas nodes new autodetect_nodegroup='yes' name='node2-compute' tags='compute' hostname='node2-compute' power_type='ether_wake' mac_addresses='B8:AE:ED:76:C5:ED' power_address='B8:AE:ED:76:C5:ED' architecture='amd64/generic'"
16:33:04 <bryan_att> when I change the parameter power_parameters_power_address to power_address and send the same command from the shell, it works.
16:51:00 <narindergupta> ok
16:58:34 <narindergupta> yeah, I made changes in maas-deployer accordingly and hopefully the next build of maas-deployer will fix it
17:05:01 <collabot> arturt: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
17:05:19 <arturt> #endmeeting
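
Independent of MAAS itself, the power addresses used above can be sanity-checked from the jumphost; a sketch covering the two power types that appear in the log (the wakeonlan package is an assumption and would need to be installed, and it must be run on the same L2 segment as the node):

    # virsh power type: confirm the hypervisor behind the power_address answers
    virsh -c qemu+ssh://$USER@192.168.122.1/system list --all
    # ether_wake power type: MAAS only sends a Wake-on-LAN magic packet; the same can be sent by hand
    wakeonlan B8:AE:ED:76:C5:ED
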