17:04:10 <arturt> #startmeeting JOID weekly 17:04:11 <collabot> Meeting started Wed Feb 10 17:04:10 2016 UTC. The chair is arturt. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:04:11 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic. 17:04:11 <collabot> The meeting name has been set to 'joid_weekly' 17:04:18 <iben_> ahh 17:04:21 <iben_> there we go 17:05:05 <narindergupta> #info Narinder Gupta 17:05:09 <David_Orange> #info David Blaisonneau 17:05:33 <catbus1> #info Samantha Jian-Pielak 17:05:51 <arturt> #info Artur Tyloch 17:06:27 <arturt> #chair iben_ 17:06:27 <collabot> Current chairs: arturt iben_ 17:06:46 <arturt> #link https://etherpad.opnfv.org/p/joid agenda 17:08:52 <iben_> ionutbalutoiu: has been helping us with this: http://webchat.freenode.net/?channels=opnfv-meeting 17:09:07 <iben_> #undo 17:09:07 <collabot> Removing item from minutes: <MeetBot.ircmeeting.items.Link object at 0x1baa890> 17:09:08 <iben_> whoops - wrong link 17:09:32 <iben_> #info ionutbalutoiu has been helping with JOID testing https://github.com/opnfv/opnfv-ravello-demo 17:10:20 <iben_> #topic agenda bashing 17:10:24 <arturt> #topic new B release date 17:10:37 <arturt> First week of March 17:10:41 <iben_> #info see etherpad above 17:11:02 <iben_> #topic release B readiness 17:13:03 <iben_> #link https://wiki.opnfv.org/releases/brahmaputra/release_plan new release date feb 26 17:13:31 <akash> Can we add some sort of note about experience per ravello on ontrail and open-daylight? 17:14:20 <akash> *contrail 17:16:33 <iben_> akash: that is already on the agenda 17:16:40 <akash> okay thanks 17:16:51 <iben_> #info TSC voted to set the Brahmaputra “release deploy” date to Thursday, February 25 17:17:28 <arturt> #topic Steps to B release https://etherpad.opnfv.org/p/steps_to_brahmaputra 17:18:39 <arturt> #info successful deployment with ODL 17:18:59 <iben_> #info ONOS charm - bug in charm, narinder sent email to "the team" 17:19:57 <iben_> #info NTP needs to be set for each environment - suggest using the MAAS machine as the NTP server 17:24:10 <arturt> #topic contrail charm 17:24:17 <arturt> #info Contrail charm - still failing on 2/10 with Liberty https://jira.opnfv.org/browse/JOID-25 17:24:38 <iben_> #info Juniper OpenContrail https://jira.opnfv.org/browse/JOID-25 work in progress 17:28:33 <arturt> #topic ONOS charm 17:28:58 <iben_> #info ONOS charm - bug in charm, narinder sent email to "the team" 17:29:05 <arturt> #info not ready - Chinese new year, team OoO 17:31:56 <iben_> #info ONOS charm is stored in github but set up to sync to launchpad bazaar https://github.com/opennetworkinglab/onos 17:32:30 <arturt> #topic ODL charm 17:33:08 <arturt> #info all ODL tests are passing on orange pod 17:33:21 <arturt> #info all (except 3 tests) ODL tests are passing on orange pod 17:33:31 <iben_> #info new ODL charm might fix the IPv6 team issue around L2/L3 mode - need to test 17:35:20 <iben_> #link https://build.opnfv.org/ci/view/joid/ we reviewed the builds here 17:35:26 <arturt> #topic Documentation update 17:36:14 <arturt> #info doc: https://git.opnfv.org/cgit/joid/tree/docs/configguide 17:36:17 <iben_> #link https://gerrit.opnfv.org/gerrit/#/c/5487/ 17:36:59 <bryan_att> #info Bryan Sullivan 17:37:42 <iben_> #info you can see the progress of the doc and patches here: #link https://gerrit.opnfv.org/gerrit/#/q/project:joid 17:49:55 <iben_> #info discussion around workflow - how to use JOID to do parallel functional tests in the cloud, then perform hardware-based performance tests once
functional tests have passed 17:51:34 <bryan_att> sorry for asking so many questions - but the config export part of this has an unclear value to me; the running of multiple parallel CI/CD jobs in the cloud is clearly useful, but I don't see how exporting a Ravello-deployed config helps me in my local lab, because I still need to deploy using the JuJu deploy commands... 17:55:34 <bryan_att> OK, I think I understand - if someone developed a bundle tweak in Ravello testing they can export the bundle so we can use it in a local JOID deploy. That part is clear if true. 17:56:17 <bryan_att> That's though just a juju export operation, right? And it doesn't need to include resource specifics e.g. MACs etc.\ 17:56:23 <arturt> bryan_att: yes, it is core juju feature 17:57:33 <bryan_att> #1 ravello feature for me is complete transparency on the JOID installer support for that environment, e.g. at most a flag that indicates the special power control etc needs to be used when deploying there. 17:58:22 <bryan_att> "feature for me" means the #1 priority to be addressed through the Ravello project, and upstreamed to OPNFV. 17:58:44 <bryan_att> We need to minimize the lifespan of a ravello fork 18:00:26 <iben_> bryan_att: yes - agreed to minimize (or eliminate) any forks 18:00:53 <iben_> ionutbalutoiu: and i have discussed and agreed to this 18:03:39 <bryan_att> As I mentioned, TOSCA is our target for NSD/VNFD etc ingestion, but for now I understand for Canonical that JuJu bundles etc are the medium, and an export function for them would be useful explain how to do and use. 18:10:11 <arturt> bryan_att: have you tried export function ? 18:10:25 <David_Orange> Sorry i have to go. Bye 18:10:28 <bryan_att> arturt: no, not yet 18:10:41 <arturt> #info juju bundle export import https://jujucharms.com/docs/stable/charms-bundles 18:12:09 <arturt> apart from service model you have also a machine specification, which allows you to set up specific machines and then to place units of your services on those machines however you wish. 18:12:14 <catbus1> Ravello export is probably different from Juju charm bundle export. 18:12:37 <arturt> is there any Ravello export? 18:12:48 <catbus1> that's the blueprint, right? 18:13:05 <arturt> but cannot export blueprint outside Ravello 18:13:15 <catbus1> ah, yeah 18:13:17 <arturt> you can replicate blueprint on Ravello 18:13:20 <arturt> ok 18:14:37 <bryan_att> one thing I would like to discuss - the JuJu deploy step takes very long and it's not clear how to know what's going on... any help there would be great. 18:15:28 <bryan_att> e.g. why is it taking so long, what happened when it times out (as it regularly does...), etc - how to debug 18:17:30 <catbus1> running "watch juju status --format tabular" on another terminal helps to see what's going on, which is started, installing packages, failed, etc. 18:18:17 <arturt> bryan_att: usually we can deploy whole OpenStack in approx. 20min... 18:19:11 <iben_> #info bug submitted for maas power type driver for ravello https://bugs.launchpad.net/maas/+bug/1544211 18:19:49 <catbus1> bryan_att: I can talk about the juju-deployer process: http://pastebin.ubuntu.com/15006924/ 18:22:03 <catbus1> where is the wiki page? 18:22:35 <iben_> akash: you can see the work bryan_att did with joid here https://wiki.opnfv.org/copper/academy/joid 18:23:09 <iben_> what troubleshooting steps can we take to observe the bottlenecks with JOID? 
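A minimal sketch of the monitoring approach discussed above, assuming a juju 1.x deployment as JOID used at the time (log paths are typical defaults, not confirmed in this log):

    # watch unit states (pending, installing, started, error) while the bundle deploys
    watch juju status --format=tabular

    # stream the consolidated juju log from the bootstrap node to see charm hook activity
    juju debug-log

    # on the MAAS server, follow provisioning while nodes commission and deploy
    tail -f /var/log/maas/maas.log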
18:23:37 <narindergupta> catbus1: that's what the user guide should include 18:23:52 <iben_> AMT or IPMI to see the console boot process 18:23:52 <arturt> #action schedule JOID community session with Bryan to discuss user experience with Juju akash 18:24:00 <iben_> centralized log collection 18:24:01 <catbus1> the troubleshooting steps, we have some info in the configguide 18:24:16 <iben_> verify NTP settings too 18:24:17 <catbus1> installguide 18:24:49 <catbus1> narindergupta: agreed, talking with bryan_att will help with the user guide. 18:25:06 <narindergupta> catbus1: :) 18:25:54 <arturt> let's schedule a session this week, today or tomorrow - we can use it for the user guide 18:25:58 <iben_> where in the log can we see a charm is being downloaded 18:26:22 <iben_> can we enable hash tag progress reports on the console? 18:26:33 <iben_> so we can tail -f the log to see progress? 18:28:14 <bryan_att> bryan.sullivan@att.com 18:30:48 <catbus1> arturt: I am available this week 18:30:55 <catbus1> today or tomorrow 18:31:27 <durschatz> mailto:dave.urschatz@cengn.ca please invite me to the meeting with Bryan 18:31:39 <akash> can we set up for next wednesday? 18:31:44 <akash> my week is slammed 18:31:56 <durschatz> yes for me 18:31:58 <akash> i was just about to send an invite and block up to 2 hours 18:32:13 <akash> catbus1: ^? 18:32:22 <narindergupta> bryan_att: btw what's the issue you are facing this time? Can you run juju status --format=tabular and send me the output? 18:32:25 <catbus1> akash: that works for me too 18:32:46 <catbus1> it's not that I am anxious to work on the user guide. ;p 18:32:48 <bryan_att> I can send the juju download logs. But doing "sudo grep download *.log" on all logs in the bootstrap VM I see that downloads started at 16:19 and finished at 16:20 (more than an hour ago) so I don't think that's the issue. 18:33:38 <bryan_att> https://www.irccloud.com/pastebin/UCZQXRCe/it's%20getting%20there%2C%20but%20just%20*very*%20slowly... 18:36:01 <bryan_att> In the last 20 minutes the juju UI went from 2 relations complete to almost all of them. I noticed that it just timed out, so I think somehow that may be allowing the relations to complete...? 18:36:13 <bryan_att> https://www.irccloud.com/pastebin/IlJGKSpF/Here%20is%20the%20timeout%20notice. 18:36:32 <catbus1> bryan_att: the charm download starts at the beginning of the joid deployment. 18:36:36 <catbus1> should be short 18:36:55 <bryan_att> it was - about 2 minutes tops 18:42:50 <narindergupta> bryan_att: this simply looks like a timeout of 2 hrs. Maybe a little extra time is needed in your environment. Or just wait, the deployment might finish soon as the timeout should not impact any relation 18:43:59 <bryan_att> narindergupta: looks like I had the keystone error I reported earlier. Maybe that's hanging the process. I can try the "juju resolved keystone/0" workaround 18:45:07 <narindergupta> bryan_att: ok 18:45:07 <bryan_att> narindergupta: I don't see why my environment should have performance issues - these are Intel i7 machines with 16GB RAM connected to a 100MB ethernet switch... the controller has two network interfaces... 18:46:14 <narindergupta> bryan_att: not performance, usually MAAS acts as a proxy cache 18:47:20 <narindergupta> bryan_att: definitely full logs will help to understand 18:47:37 <bryan_att> OK, should I just paste them here? 18:47:43 <bryan_att> and which ones?
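A sketch of which logs tend to be useful at this point, based on the paths mentioned elsewhere in this discussion (juju 1.x layout; the tee redirect is only an illustration):

    # capture the deployer output itself
    ./deploy.sh -o liberty -s odl -t nonha -l attvirpod1 2>&1 | tee deploy.log

    # juju logs on the bootstrap node (machine 0), including the aggregated all-machines.log
    ls /var/log/juju/
    sudo grep -i download /var/log/juju/*.log   # a variant of the check bryan_att ran above

    # hook logs for an individual unit, fetched over juju ssh
    juju ssh keystone/0 "sudo tail -n 200 /var/log/juju/unit-keystone-0.log"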
18:49:03 <catbus1> bryan_att: you can paste them to pastebin.ubuntu.com, if not confidential 18:49:16 <narindergupta> ./deploy.sh logs starting from there so i can figure it out from time stamp 18:50:24 <catbus1> irccloud works too. 18:54:55 <bryan_att> Here it is - the ubuntu pastebin only accepts text. The logs are too big. https://usercontent.irccloud-cdn.com/file/ie3mffYE/160210_blsaws_bootstrap_logs.tar.gz 19:04:42 <bryan_att> I used the "juju resolved keystone/0" workaround and things got happier. But normally (2-3 times now) I have to enter the command twice to resolve all the issues (the keystone error seems to reappear?) https://www.irccloud.com/pastebin/NDu0Uygd/ 19:19:23 <catbus1> bryan_att: from the juju status output, keystone unit is ready. 19:20:03 <bryan_att> yes, but only after I entered the command "juju resolved keystone/0" - see the previous status I posted where keystone was in "error" 19:20:17 <catbus1> ah, use juju resolved --retry keystone/0 19:20:44 <catbus1> using only juju resolved only makes the status look good, it doesn't do anything. 19:21:03 <catbus1> with '--retry' it will rerun the hook where it failed. 19:21:16 <bryan_att> OK, thanks that's good to know 19:21:48 <catbus1> you may wonder why juju resolved exists, it's for killing units that are in error state. You can't kill units in error state, so get it in working state and kill it. 19:22:49 <catbus1> bryan_att: sometimes the issue will resolve by itself after the re run, but if the error appears again, you can juju ssh keystone/0, and sudo -i as root to look into /var/log/juju/unit-keystone-0.log 19:23:28 <catbus1> find out where the last error is about, manually fix it, go back to the jumphost and re run the juju resolved --retry 19:24:58 <bryan_att> Here are the errors from that log file https://www.irccloud.com/pastebin/7chcwf99/ 19:26:11 <catbus1> you need to look at the section above "2016-02-10 18:39:06 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] error: hook failed: "identity-service-relation-changed"" 19:27:04 <catbus1> can you copy and paste the section before this error message in the log? 19:27:26 <bryan_att> ok, hang on 19:31:01 <bryan_att> Here it is https://www.irccloud.com/pastebin/cLRXYQlF/ 19:32:33 <catbus1> it's too little info. 19:33:47 * catbus1-afk --> meeting 19:38:45 <bryan_att> When you get back - here is more https://www.irccloud.com/pastebin/XsKD1nML/ 19:38:56 * bryan_att afk-lunch 19:48:45 <narindergupta> bryan_att: this is same issue about admin roles as trying to create the Admin role and failed as admin already exist 19:49:15 <narindergupta> and this is issue with keystone as well 19:49:25 <bryan_att> ok, so a known issue? 19:49:38 <narindergupta> yeah 19:50:22 <narindergupta> in keystone service does not differentiate between admin and Admin 19:50:34 <narindergupta> while keystone client does 19:51:18 <narindergupta> differentate so send request to service and service failed to create that role and says duplicate 19:52:52 <narindergupta> bryan_att: final success run of deply, functest and yardstick https://build.opnfv.org/ci/view/joid/job/joid-os-odl_l2-nofeature-ha-orange-pod2-daily-master/ 20:00:51 <narindergupta> bryan_att: basically keystone can not create two roles admin and Admin 20:08:41 <bryan_att> narindergupta: is this a keystone issue, or a JOID issue? Does it affect the other installers? 
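The recover-and-retry loop catbus1 describes above, gathered into one sketch (keystone/0 is the unit from this deployment; substitute whichever unit is in an error state):

    # re-run the failed hook instead of only clearing the error flag
    juju resolved --retry keystone/0

    # if the error comes back, inspect the unit's hook log on the machine itself
    juju ssh keystone/0
    sudo -i
    less /var/log/juju/unit-keystone-0.log   # read the section above the "hook failed: ..." line

    # fix the underlying problem, then from the jumphost retry the hook once more
    juju resolved --retry keystone/0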
20:11:21 <narindergupta> bryan_att: other installers use admin as the role by default and the charms use Admin, so on request from the functest team we changed the role to admin in the service, but it looks like some services are also trying to create the role Admin, which failed in keystone because keystone does not differentiate between admin and Admin. Definitely a keystone bug, but other installers may not encounter it as they use admin for everything. 20:12:10 <narindergupta> bryan_att: here is the bug already reported https://bugs.launchpad.net/charms/+source/keystone/+bug/1512984 20:13:04 <narindergupta> bryan_att: comment no. 10: "I suspect that keystone sees 'admin' and 'Admin' as the same thing from a role name perspective; the problem is that the role created by default is currently all lowercase, whereas the role requested via swift is not - the code checks but is case sensitive. We should fix that, but the root cause of the lowercase role creation is bemusing - its default is 'Admin' in config, not 'admin', and that's used raw by the charm." 20:13:39 <bryan_att> narindergupta: so what's our workaround in the meantime - change back to using "Admin" in the charms? 20:14:00 <narindergupta> yes 20:14:01 <bryan_att> because this appears to be affecting successful deployment 20:15:08 <narindergupta> bryan_att: whenever a relationship changes this occurs, so I would prefer Admin in the bundle. there is a yaml file in ci/odl/juju-deployer 20:15:58 <bryan_att> ok, are you going to issue a patch to change it back? Just wondering how long the issue will remain for JOID. 20:16:00 <narindergupta> change admin to Admin at the end of the file for all three deployments: juno, kilo and liberty 20:16:06 <narindergupta> then run ./clean.sh 20:16:20 <bryan_att> OK, I can do that. 20:16:57 <narindergupta> admin-role: admin and keystone-admin-role: admin 20:17:20 <narindergupta> those two options need a change from admin to Admin, or comment out both 20:17:26 <narindergupta> as the default is Admin 20:18:45 <narindergupta> then run ./deploy.sh -o liberty -s odl -t nonha -l attvirpod1 20:18:54 <narindergupta> that will restart the deployment. 20:23:13 <narindergupta> bryan_att: at the end it seems we will be switching to admin by default, most likely in the charm release in 16.04, as per this bug. 21:27:59 <bryan_att> narindergupta: when you say "change admin to Admin at the end of file for all three deployments juno, kilo and liberty", which file am I changing? 21:28:56 <narindergupta> bryan_att: there are three files for odl https://gerrit.opnfv.org/gerrit/#/c/9697/1/ci/odl/juju-deployer/ovs-odl-nonha.yaml https://gerrit.opnfv.org/gerrit/#/c/9697/1/ci/odl/juju-deployer/ovs-odl-ha.yaml and https://gerrit.opnfv.org/gerrit/#/c/9697/1/ci/odl/juju-deployer/ovs-odl-tip.yaml 21:35:47 <bryan_att> ok, starting the redeploy now 21:36:06 <narindergupta> cool 23:50:49 <bryan_att> narindergupta: this time it went through to the end, everything looks good so far. 70 minutes to deploy. 23:53:06 <narindergupta> bryan_att: cool, yeah that issue we need to work on 23:55:13 <bryan_att> narindergupta: sometime I would like to learn how to set the services to be assigned specific IPs. Every time they get installed they have different addresses. In typical NFV deployments I think we will try to have everything consistent. 23:57:41 <bryan_att> I still have the keystone error though. I was advised (by catbus1) to use the command "juju resolved --retry keystone/0" to retry the hook from where it failed.
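A sketch of the workaround narindergupta describes above, against the three bundle files he links (ci/odl/juju-deployer/ovs-odl-nonha.yaml, ovs-odl-ha.yaml, ovs-odl-tip.yaml):

    # in each file, near the end of the juno, kilo and liberty sections, either switch
    # these two keystone options from admin to Admin or comment them out (the charm
    # default is Admin):
    admin-role: Admin
    keystone-admin-role: Admin

    # then clean up and redeploy (options as used for this virtual pod)
    ./clean.sh
    ./deploy.sh -o liberty -s odl -t nonha -l attvirpod1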
00:14:16 <narindergupta> bryan_att: thats strange we need to understand is it same error and is it occuring in your case? 00:14:57 <bryan_att> looks like the same error - I could not login to horizon until I entered the resolved command. 00:15:59 <narindergupta> bryan_att: but this time we used Admin right 00:16:03 <narindergupta> > 00:16:07 <bryan_att> The --retry flag does not appear to have resolved it though - catbus1 said that without the --retry flag, that resolved was just ignored the issues 00:17:18 <bryan_att> Yes, I used Admin for keystone (at the end of the file) 00:17:25 <bryan_att> https://www.irccloud.com/pastebin/qCfxWSmC/ 05:10:01 <narindergupta> bryan_att: will you retry commenting both the option in file. Also if you can send me bundles.yaml from joid/ci would be great? 08:15:51 <fdegir> narindergupta: how may I help you? 13:38:12 <narindergupta1> hi David_Orange good morninig 14:30:48 <narindergupta1> ashyoung: hi 14:31:09 <ashyoung> narindergupta1: hi 14:31:53 <narindergupta1> ashyoung: do you know anyonw who can change the onos charm? Looks like team is on chineese new year and deployments are failing 14:32:17 <ashyoung> narindergupta1: yes 14:32:19 <narindergupta1> i have a fix but until it goes into git repo i can not run the install successfuly 14:32:36 <ashyoung> narindergupta1: can you help me out and provide me some details on what's failing? 14:32:43 <ashyoung> Oh 14:32:57 <ashyoung> Do you just need your fix checked in? 14:33:48 <narindergupta1> ashyoung: i need the fixes to check in 14:34:00 <ashyoung> ok 14:34:05 <ashyoung> I can help with that 14:34:32 <narindergupta1> http://bazaar.launchpad.net/~opnfv-team/charms/trusty/onos-controller/fixspace/changes/12?start_revid=12 14:34:37 <narindergupta1> contains the changes i need 14:34:48 <narindergupta1> there are two changes 14:35:36 <ashyoung> Thanks! 14:35:52 <ashyoung> I will get it taken care of right away 14:36:01 <ashyoung> What's the current problem? 14:36:04 <narindergupta1> rev 11 and 12 needs to be added 14:36:20 <narindergupta1> deployment failed because of additonal space introduced in charm 14:36:37 <narindergupta1> nd also then onos-controller charm install failed failed fo config 14:36:49 <narindergupta1> and there are two patches for two issues 14:37:05 <ashyoung> got it 14:38:02 <narindergupta1> ashyoung: once you are able to merge in git tree then i can sync in bazaar and run the deployment again. 14:38:15 <ashyoung> understood 14:39:17 <ashyoung> I'll get it done 14:39:40 <narindergupta1> ashyoung: thanks 14:40:17 <ashyoung> My pleasure 16:05:59 <David_Orange> narindergupta: hi 16:07:18 <narindergupta1> David_Orange: hi ok resize test cases passed now. but need to check with you around the glance api failed cases? 16:07:45 <David_Orange> how can i help ou ? 16:08:13 <David_Orange> last functest failed, but it seems to be a docker problem 16:08:18 <narindergupta1> need to know what command temptest runs and whether those passes manually in your pod or not? 16:08:24 <narindergupta1> oh ok 16:08:36 <narindergupta1> yesterday it passed 16:08:55 <narindergupta1> and it seems total 98% test cases are passing as per morgan 16:10:11 <David_Orange> yes 16:10:41 <narindergupta1> ok how can i find the 2% failed cases and fix those too. 16:11:25 <David_Orange> i look at them 16:12:49 <narindergupta1> thanks. 16:13:42 <narindergupta1> also need help on debugging failure cause on intel pods. It seems there might be deploy in changing the switch. 
But we can tell them it is issue with switch then intel might do it sooner 16:15:11 <narindergupta1> David_Orange: also i added few scenrios in joid like dfv, vpn, ipv6 etc.. and i am passing it through -f parameter in ./deploy.sh 16:15:24 <narindergupta1> can it be integreted as part of ci as well? 16:15:41 <David_Orange> yes of course, i can work on that 16:15:47 <narindergupta1> thanks 16:15:58 <David_Orange> do you also have something for dpdk ? 16:16:35 <narindergupta1> David_Orange: dpdk will be part of 16.04 LTS in main and we will be enabling it with 16.04 lts 16:16:53 <narindergupta1> it may be part of SR2 after xenial release. 16:17:19 <David_Orange> ok 16:17:29 <David_Orange> so not before 2 month 16:17:44 <narindergupta1> for experimental basis we can add 16:17:52 <narindergupta1> but it may not work 16:20:52 <David_Orange> ok 16:21:13 <David_Orange> what is dfv ? 16:22:42 <David_Orange> narindergupta: for new parameters, can 2 params can be enabled at the same time ? 16:22:51 <David_Orange> or 3 16:22:59 <David_Orange> or more :) 16:27:22 <narindergupta1> sfv 16:27:28 <narindergupta1> sorry it is sfv 16:27:45 <narindergupta1> David_Orange: currently no 16:28:04 <David_Orange> ok 16:28:12 <narindergupta1> David_Orange: but i can write a combination might do like ipv6dvr 16:28:18 <narindergupta1> ipv6sfc 16:28:21 <David_Orange> and we can enables all those param for all sdn controllers 16:28:55 <David_Orange> can you take more than one '-f' ? 16:28:56 <narindergupta1> well few for nosdn and few for odl 16:29:06 <narindergupta1> no i can not 16:29:13 <narindergupta1> only one right now 16:29:13 <David_Orange> or a coma-dash separated list 16:29:31 <narindergupta1> David_Orange: currently no but we can in future 16:29:49 <David_Orange> ok, so i only set one 16:29:53 <narindergupta1> as i need to enhance my code to accept that 16:29:55 <narindergupta1> ok 16:30:06 <David_Orange> np 16:30:26 <David_Orange> and they can be enabled for each scenarios ? 16:30:36 <narindergupta1> yes 16:30:44 <David_Orange> sorry: s/scenario/sdncontroller/ 16:30:45 <narindergupta1> ha/nonha/tip 16:31:01 <narindergupta1> not necessary 16:31:15 <narindergupta1> like ipv6 is for all. But sfc only for odl 16:31:28 <David_Orange> ok 16:31:40 <David_Orange> and vpn ? 16:31:52 <narindergupta1> only for odl currently 16:32:17 <David_Orange> ok: sfc (all) vpn (odl) ipv6 (odl) 16:32:31 <David_Orange> and for nosdn cases ? 16:32:44 <narindergupta1> no ipv6 all 16:32:52 <narindergupta1> sfc and vpn only odl 16:33:00 <narindergupta1> and same for odl_l2 and odl_l3 16:34:10 <narindergupta1> David_Orange: also dvr for all 16:34:21 <David_Orange> yes sorry 16:35:19 <David_Orange> we have odl_l2 and l3 now ? 16:46:54 <narindergupta1> yes i am trying to enable is using dvr but not sure whether it will work or not. 16:48:33 <narindergupta1> do we have seperate test cases? 16:53:03 <David_Orange> narindergupta: today we have only odl_l2, but i can prepare l2 and l3 16:57:05 <David_Orange> narindergupta1: let me know for odl as i can push the patch 16:57:31 <David_Orange> narindergupta1: do you also thought about OS API access ? 17:01:34 <narindergupta1> David_Orange: yes i am thinking about it and discussing internally. 17:02:12 <David_Orange> ok 17:02:46 <narindergupta1> David_Orange: currently issue is containers can e only on admin network. To enabled containers with other host network we need to wait for MAAS 2.0 17:03:25 <David_Orange> ok, and for scenario with fixed ip for all endpoints ? 
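A sketch of how the -f feature flag discussed above is passed to ./deploy.sh (only one feature per run at this point, per the discussion; the lab name and feature values here are illustrative):

    # plain ODL HA deployment
    ./deploy.sh -o liberty -s odl -t ha -l orangepod2

    # the same deployment with one extra feature scenario enabled, e.g. ipv6
    # (dvr applies to all controllers; sfc and vpn apply to odl only)
    ./deploy.sh -o liberty -s odl -t ha -l orangepod2 -f ipv6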
17:03:37 <narindergupta1> yes 17:03:59 <David_Orange> this did not require a new network 17:04:38 <narindergupta1> for fix ip no issues as we can go for different vip address which act as end point anyway so may be changing the vip in bundle might help. 17:05:39 <David_Orange> if all endpoints have fixed and known address i can set the reverse proxy quickly 17:06:12 <David_Orange> but we also need to setup the publicurl to a dedicated fqdn, is it possible ? 17:09:41 <David_Orange> narindergupta1: actually i transform odl_l2 and odl_l3 to odl, do you want i remove that and push odl_l2 odl_l3 to deploy.sh ? 17:10:21 <narindergupta1> David_Orange: means? 17:10:45 <narindergupta1> David_Orange: of that sense no -s should be odl only 17:11:00 <narindergupta1> and -f can be odl_l2 or odl_l3 17:11:21 <David_Orange> ok, so for you this is an option ? 17:11:42 <narindergupta1> correct 17:11:54 <David_Orange> other installer set odl_l2 and odl_l3 in controller part (os-<controller>-<nfvfeature>-<mode>[-<extrastuff>]) 17:11:59 <narindergupta1> as i need to enable profile in odl to enable this 17:12:25 <David_Orange> ok 17:12:44 <narindergupta1> unfortunately we do not define that way as we have single controller odl and in that feature can be enabled for l2 and l3. 17:13:08 <David_Orange> i push you a patch 17:13:16 <narindergupta1> thanks 17:13:50 <narindergupta1> David_Orange: my question was for you that do we have seperate test cases for l2 and l3? 17:14:30 <David_Orange> no 17:15:54 <David_Orange> narindergupta1: odl_l3 = odl_l2 + l3 true ? so i dont need to add a scenario name for odl_l2: os-odl_l2-old_l2-ha 17:16:58 <narindergupta> ok sounds good to me 17:17:21 <narindergupta> David_Orange: i am not sure by default iodl_l2 is enabled 17:17:50 <narindergupta> as i can see in odl only l3 is enabled by default but for l2 we need to enable the switch specifically 17:18:28 <David_Orange> if we set odl for sdn controller, odl_l2 is enable by default, no ? 17:18:35 <narindergupta> so i believe naming of default scenario is not true. It should be l3 default and that what we test 17:19:02 <narindergupta> how to find it out? 17:20:10 <narindergupta> currently we enable using this article https://wiki.opendaylight.org/view/OpenStack_and_OpenDaylight 17:22:11 <narindergupta> i 17:22:27 <David_Orange> "OpenStack can use OpenDaylight as its network management provider through the Modular Layer 2 (ML2) north-bound plug-in" 17:22:44 <David_Orange> today it is odl_l2, ip services are provided by neutron 17:23:20 <narindergupta> ok yes thats what we have 17:23:44 <narindergupta> i am wondering how to enable l3 then? 17:24:02 <narindergupta> in that case i was in confusion. as there is l2switch module in odl 17:24:14 <narindergupta> and i was thinking for l2 i need to enable that 17:24:53 <narindergupta> is there any odl documentation which explains this in better way? 17:24:54 <David_Orange> by default odl is using ovs as switch, it may be an other switch 17:24:57 <David_Orange> let me check 17:26:00 <narindergupta> but does that mean by enabling odl-l2switch we are enalbing l2? 17:26:15 <narindergupta> or by adding ml2 plugin means it is odl_l2 17:26:21 <David_Orange> https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:L2_Switch 17:27:01 <narindergupta> so my question is to enable l2 do i need to enable this? 17:27:15 <David_Orange> as far as i understand, enabling ml2plugin enable odl for network layer (l2) 17:27:42 <David_Orange> i dont think so, until now odl was working without no ? 
17:27:45 <narindergupta> ok then we support l2 and i can verify that and for l3 enablement what should be done? 17:27:58 <David_Orange> for l3 i dont know 17:28:37 <David_Orange> l2switch seems to be much more an enhance l2 switch (you can keep it as mdsal option for example 17:28:59 <narindergupta> our charm developer followed this and integreted https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:L2_Switch 17:29:08 <narindergupta> sorry https://wiki.opendaylight.org/view/OpenStack_and_OpenDaylight 17:29:38 <David_Orange> this is for l2 17:29:39 <narindergupta> David_Orange: ok 17:30:07 <narindergupta> ok sounds good then i need to correct hu bin as i telling him that l3 is enabled by default 17:30:21 <narindergupta> now i need to figure it out what to do to enable l3 then 17:30:38 <narindergupta> so that both l2 and l3 is enabled by odl 17:30:47 <David_Orange> l3 is managed by neutron 17:31:08 <narindergupta> ok so l3 need to be enabled in neutron 17:31:22 <David_Orange> yes, i thinks so 17:31:43 <David_Orange> i am far to be an odl expert, but this is my understanding 17:31:46 <narindergupta> i think in neutron we already enabled it by default 17:32:03 <David_Orange> until now odl was well running 17:32:09 <narindergupta> so in that case we have both odl_l2 and l3 by default 17:32:39 <David_Orange> should not we keep it simple until B release then work on all that new features ? 17:32:46 <narindergupta> but i need confirmation so that i can tell the community how can i do that? 17:33:28 <David_Orange> i can cal my colleage tomorrow, but i am not sure he is working (this is an holiday period here) 17:33:51 <narindergupta> David_Orange: oh ok who else can guide me. 17:34:04 <David_Orange> last time he check (1 month ago) we were using odl for layer 2 and neutron for l3 17:36:30 <David_Orange> narindergupta1: dont know any else. 17:36:43 <David_Orange> narindergupta1: scenario should be frozen from 2 or 3 weeks ago 17:37:43 <David_Orange> we should wait to have a clean install of B release 4 times, froze Bramaputra then add those new features 17:39:11 <David_Orange> narindergupta1: so i can add ipv6, sfc and so on, but it is not the way i would do it. But you are the boss :) 17:39:17 <narindergupta> David_Orange: if neutron l3 then it is already enabled in joid. we are testing it. And it is enalbed by default 17:39:28 <David_Orange> yes 17:39:58 <David_Orange> today we are loop testing odl at l2 and neutron at l3 17:40:01 <narindergupta> David_Orange: please add it and we will run if fails then won't release otherwise will get added 17:40:05 <David_Orange> this is my understanding 17:40:22 <narindergupta> and our tests are passing correct? 17:41:24 <David_Orange> today functest says: odl on joid pod2 = 100%, no ? 17:41:47 <narindergupta> i think morgan stated 98% 17:42:07 <David_Orange> odl_l3 is much more an option 17:42:15 <David_Orange> 98% is on tempest no ? 17:42:20 <narindergupta> overall 17:42:33 <narindergupta> tempotest only 3 failed test cases related to glance 17:42:54 <narindergupta> two are related to glance which needs to understand and one related to boot option. 17:43:07 <narindergupta> as per viktor manuall verification worked for both 17:43:18 <David_Orange> yes, but odl tests are all ok: 18 tests total, 18 passed, 0 failed 17:43:43 <narindergupta> yes odl all were passed 17:43:44 <David_Orange> so for odl we should not touch 17:44:02 <narindergupta> David_Orange: please add an option in ci and i will run that test and sure it will pass. 
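For context on the L2/L3 split being debated above: a sketch of what "ODL as the ML2 provider, neutron for L3" usually amounts to in neutron's configuration. The section and key names follow the OpenStack_and_OpenDaylight wiki linked earlier; the values are placeholders, and in JOID the charms render this rather than anyone editing it by hand.

    # /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative only)
    [ml2]
    mechanism_drivers = opendaylight

    [ml2_odl]
    url = http://<odl-controller>:8080/controller/nb/v2/neutron
    username = admin
    password = admin

    # L3 (routers, floating IPs) stays with neutron's own l3-agent on the
    # neutron-gateway unit in this setup, i.e. "odl for L2, neutron for L3"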
17:44:22 <David_Orange> odl_l3 option ? 17:44:34 <narindergupta> yes do it please 17:44:55 <narindergupta> and if neutron handles it then it is already there. 17:45:12 <narindergupta> basically same deployment have both l2 and l3 enabled then 17:45:24 <narindergupta> l2 by odl and l3 by neutron 17:48:12 <David_Orange> https://gerrit.opnfv.org/gerrit/9821 17:50:58 <David_Orange> narindergupta1: i have to go 17:51:14 <narindergupta> David_Orange: one change you need to pass -f 17:51:35 <David_Orange> it is passed line 147 17:52:03 <David_Orange> narindergupta1: is it ok ? 17:52:15 <narindergupta> y3es 17:52:25 <David_Orange> ok, good, see you tomorrow 17:52:39 <narindergupta> ok see you 17:53:38 <narindergupta> David_Orange: sorry -f is missing 17:54:08 <narindergupta> in line 147 and line 149 15:04:22 <David_Orange> narindergupta: hi 15:04:32 <narindergupta> David_Orange: hi 15:05:11 <David_Orange> morgan is asking if we are ok to set pod2 as CI pod until B official release ? 15:05:26 <narindergupta> David_Orange: yes i am +1 for it 15:05:30 <David_Orange> ok, nice 15:06:09 <narindergupta> David_Orange: for maas i have to do minor adjustment for dhcp and static ip address though/ 15:06:09 <David_Orange> narindergupta: have you seen my mail about ODL l2switch 15:06:20 <narindergupta> still need to check 15:06:32 <David_Orange> okok 15:06:37 <David_Orange> and ok 15:07:26 <narindergupta> Davidregarding the feature needs to be enabled 15:08:02 <David_Orange> which one ? 15:08:10 <narindergupta> no 1 15:08:13 <David_Orange> #undo 15:08:16 <David_Orange> yes 15:08:23 <narindergupta> yeah we are enabling minimum now in Be 15:08:35 <narindergupta> David for l2 switch nice to know 15:08:55 <David_Orange> ok, so you are not enabling all features as in the doc, good 15:09:13 <David_Orange> i will have more feedback in 10 days 15:11:04 <David_Orange> and during my holidays, next week, if you need, you can send me a mail (if you want me to set the reverse proxy for public API access 15:11:52 <David_Orange> i will not answer it in the hour, but try to check some time 15:16:33 <narindergupta> David_Orange: sure david. 15:17:54 <narindergupta> David_Orange: also for crating the reverse proxy need to work with you. Lets try something which works for all 15:18:07 <David_Orange> yes of course 19:56:06 <narindergupta> bryan_att: i found the issue with keystone charm it seems i applied the patch to ha install but leftout nonha and i am fixinf it now 20:06:27 <narindergupta> bryan_att: fixed it you can give a retry 04:57:31 <narindergupta> yuanyou: hi 04:57:54 <yuanyou> narindergupta:hi 04:58:25 <narindergupta> yuanyou: i am still finding the installation issues with onos. Will you please look into it? We have holiday tomorrow but will restart the build once you will fix it 04:58:48 <narindergupta> yuanyou: this time it is config change on neutron-gateway 04:59:47 <yuanyou> narindergupta: I am working on this ,but I don't know how to fix it. 05:00:35 <narindergupta> what is the issue? Which script is failing? 05:01:10 <narindergupta> as i fixed other issue and those were due to extra space and not having proper config() definitin 05:01:56 <narindergupta> but other i have not idea how your team has implmented. 
Best way to look itnto the logs on the neutron-gateway unit and see what errors 05:02:00 <narindergupta> and try to resolv 05:02:13 <yuanyou> narindergupta: I only know config-changed failed,but i don't know which line failed 05:02:35 <narindergupta> more errors can b find on failed unit 05:02:52 <narindergupta> and check /var/log/juju/unit-neutron-gateway-0.log 05:03:01 <narindergupta> on neutron-gateway unti 05:03:31 <yuanyou> narindergupta:yes, I am deploying on my own environment 05:03:42 <narindergupta> ok no problem 05:04:08 <narindergupta> meanwhile on pod5 can u remove the auto run of onos until this issue fixes 05:04:34 <narindergupta> and this is supposed to be stable ci lab but unfortunately its failing today on onos 05:04:45 <narindergupta> and you can use intel pod6 though for development 05:05:23 <yuanyou> narindergupta:ok,i will remove the auto run in releng 05:05:31 <narindergupta> thanks 14:57:09 <jose_lausuch> narindergupta: ping 14:57:17 <narindergupta> jose_lausuch: pomg 14:57:20 <narindergupta> whats up? 14:57:50 <narindergupta> jose_lausuch: io have installed gsutil into pod5 for joid and submit the patch in master branch 14:58:00 <narindergupta> will do same for other pods as well 14:58:49 <jose_lausuch> narindergupta: great 14:58:52 <jose_lausuch> fdegir: ping 14:59:08 <jose_lausuch> can you help to install gsutil with proper credentials? (I have no clue what's needed) 14:59:33 <narindergupta> yeah in orange pod5 i am seeeing upload error 14:59:49 <narindergupta> where gsutil was installed 15:01:04 <narindergupta> jose_lausuch: you were talking about image issue> 15:01:05 <narindergupta> > 15:01:06 <narindergupta> ? 15:01:20 <jose_lausuch> narindergupta: flavor issue 15:01:38 <jose_lausuch> https://build.opnfv.org/ci/view/functest/job/functest-joid-intel-pod5-daily-brahmaputra/45/console 15:01:38 <narindergupta> jose_lausuch: what was that? 15:01:50 <jose_lausuch> - ERROR - Flavor 'm1.small' not found. 15:01:59 <jose_lausuch> that is on joid intel pod 5 15:02:00 <jose_lausuch> but 15:02:38 <jose_lausuch> however 15:02:43 <jose_lausuch> if you look above 15:02:46 <narindergupta> but its there | 2 | m1.small | 1 | 2048 | | 20 | when we run 15:02:48 <jose_lausuch> Flavors for user `admin` in tenant `admin`: 15:02:48 <narindergupta> yeah 15:02:50 <jose_lausuch> yes 15:02:54 <jose_lausuch> so, that is strange 15:02:59 <jose_lausuch> and also, lookin below 15:02:59 <narindergupta> yeah 15:03:01 <jose_lausuch> for promise test 15:03:11 <jose_lausuch> Error [create_flavor(nova_client, 'promise-flavor', '512', '0', '1')]: ('Connection aborted.', BadStatusLine("''",)) 15:03:11 <narindergupta> and by defaul t we do not deleted anything 15:03:15 <jose_lausuch> cann't create flavor 15:03:35 <narindergupta> that says connected aborted. 15:03:43 <narindergupta> why was it aborted? 15:03:46 <jose_lausuch> ya, casual.. 
15:03:48 <jose_lausuch> I dont know 15:03:59 <jose_lausuch> it's just creating a flavor with those specifications 15:04:07 <narindergupta> 2016-02-14 17:14:41,689 - vPing_userdata- INFO - Flavor found 'm1.small' 15:04:18 <jose_lausuch> requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)) 15:04:22 <jose_lausuch> yes, same error there 15:04:26 <jose_lausuch> that is strnage 15:05:02 <narindergupta> jose_lausuch: but i saw this passed in latest deployment 15:05:36 <jose_lausuch> this is even worse: https://build.opnfv.org/ci/view/functest/job/functest-joid-intel-pod5-daily-brahmaputra/44/console 15:05:44 <jose_lausuch> Error [create_glance_image(glance_client, 'functest-vping', '/home/opnfv/functest/data/cirros-0.3.4-x86_64-disk.img', 'True')]: <requests.packages.urllib3.connection.HTTPConnection object at 0x7efe66c3a710>: Failed to establish a new connection: [Errno 113] No route to host 15:06:19 <jose_lausuch> so, we are not having stable results 15:06:35 <narindergupta> jose_lausuch: and in pod6 it passed. https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod6-daily-master/54/consoleFull 15:07:11 <jose_lausuch> ya, that worked 15:07:17 <jose_lausuch> but I'm looking at stable/brahmaputra branch 15:07:22 <jose_lausuch> https://build.opnfv.org/ci/view/functest/job/functest-joid-intel-pod5-daily-brahmaputra/ 15:07:25 <narindergupta> it looks like pointing me towards the pod stability as aborted connection is something called for networkign failure as 15:07:32 <jose_lausuch> yes 15:07:38 <jose_lausuch> looks like network failure or something 15:07:41 <narindergupta> there is no difference in branch though 15:07:49 <narindergupta> from installer prospective 15:08:46 <narindergupta> jose_lausuch: if you will see in intel pod5 https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod5-daily-brahmaputra/46/console 15:08:49 <narindergupta> it passed 15:08:57 <narindergupta> even in pod5 15:09:40 <jose_lausuch> narindergupta: yes, the flavor thing worked there... 15:09:40 <narindergupta> and temptest failed modt of time and not sure why? but same test passes in orange pod2 15:09:45 <jose_lausuch> but 172 failures in tempest. 15:10:28 <narindergupta> jose_lausuch: never got good results on intel pod5. Can you please help here., AS in oramge pod2 only 5 test cases failed 15:10:38 <narindergupta> and test results were 98% 15:10:47 <jose_lausuch> yes 15:10:52 <jose_lausuch> I know 15:11:02 <jose_lausuch> but why is this pod giving these bad results? 15:11:07 <jose_lausuch> and also the job was aborted 15:11:10 <jose_lausuch> while running 15:11:12 <narindergupta> david think it could be networking issue in pod 15:11:21 <narindergupta> do not know 15:11:34 <jose_lausuch> that could explain it 15:11:58 <narindergupta> jose_lausuch: but deployment always worked. 15:12:12 <jose_lausuch> taking a look at this: 15:12:13 <jose_lausuch> https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod6-daily-master/ 15:12:26 <jose_lausuch> all the latests jobs are aborted.. (grey) 15:12:50 <narindergupta> yes 15:14:08 <narindergupta> jose_lausuch: i am seeing this aborted issue for long time and it never passes and not sure what it does 15:14:15 <narindergupta> may be you can debug. 15:14:30 <narindergupta> and time out is 210 minutes 15:15:07 <jose_lausuch> narindergupta: timeout for what? 15:15:13 <narindergupta> for job 15:15:21 <narindergupta> check the status 15:15:41 <narindergupta> jose_lausuch: Build timed out (after 210 minutes). 
Marking the build as aborted. 15:16:36 <jose_lausuch> narindergupta: ok.... something taking too long maybe... 15:17:03 <narindergupta> jose_lausuch: not sure it is stuck or taking too long 15:17:58 <jose_lausuch> narindergupta: ya, I see now in our jjob a timeout of 210 sec... 15:17:59 <jose_lausuch> wrappers: 15:17:59 <jose_lausuch> - build-name: 15:17:59 <jose_lausuch> name: '$BUILD_NUMBER Suite: $FUNCTEST_SUITE_NAME Scenario: $DEPLOY_SCENARIO' 15:17:59 <jose_lausuch> - timeout: 15:17:59 <jose_lausuch> timeout: 210 15:18:05 <jose_lausuch> mmmm 15:18:12 <narindergupta> ok 15:18:24 <jose_lausuch> but it is not normal that it takes that long... 15:19:37 <narindergupta> jose_lausuch: yes its not so need a debugging where it got stuck. 15:20:12 <narindergupta> also yardstick was working until last week on orange pod2 but not anymore 15:20:56 <jose_lausuch> ok 15:21:08 <jose_lausuch> Im trying to figure out what is getting stuck 15:22:41 <jose_lausuch> for example vIMS gets an error and takes almost 30 min 15:34:14 <jose_lausuch> narindergupta: I have detected that the Cinder tests that does Rally, take 45 min 15:34:36 <jose_lausuch> I have checked on other installers, and it takes only 20 min 15:35:06 <narindergupta> oh ok could be in intel pod5 and pod6 we do not have extra hard disk 15:35:21 <narindergupta> can u check on orang epod2 how much time it takes? 15:35:22 <jose_lausuch> narindergupta: same in orange pod 2, aroound 20 min 15:35:33 <jose_lausuch> and I guess the other rally tests will take longer too 15:35:45 <jose_lausuch> the normal functest runtime is around 2.5 hr 15:35:51 <jose_lausuch> this timedout after 3.5 hr... 15:35:55 <jose_lausuch> something is bad there 15:35:57 <narindergupta> that may be reason in intel pods no extra disk ssd disk so we are using the os disk 15:36:09 <jose_lausuch> I'll check other tests 15:36:14 <narindergupta> sure please 15:36:41 <narindergupta> specially network specific. In worst case we can increase the time oput and test 15:38:30 <jose_lausuch> for neutron test, normally it takes 10 min, and on intel pod 2 20+ minutes 15:38:38 <jose_lausuch> so everything is slowly there.. 15:43:45 <narindergupta> intel pod2 ? 15:43:56 <narindergupta> so other pods also taking same time? 15:44:10 <narindergupta> not specific to pod5 and pod6? 15:45:40 <narindergupta> jose_lausuch: all pods i am seeing this https://build.opnfv.org/ci/view/joid/job/joid-verify-master/ do you think it is connectivity issue 15:45:41 <narindergupta> ? 15:46:02 <narindergupta> to linux foundation 15:47:09 <jose_lausuch> narindergupta: you mean this? pending—Waiting for next available executor on intel-us-build-1 15:47:28 <narindergupta> yeah 15:47:39 <jose_lausuch> I dont know... 15:47:49 <narindergupta> pod5, pod6 and orange pod2 shows the same status 15:49:55 <jose_lausuch> ya... strange 16:09:42 <narindergupta> jose_lausuch: anyway is temptest is time dependent? 16:10:32 <jose_lausuch> narindergupta: what do you mean 16:11:01 <narindergupta> i am seeing few test cases faied in intel pod5 and 6 16:11:21 <narindergupta> so just wondering some temptest times out as well and marked as failed 16:11:36 <narindergupta> just thinking loud? 16:12:33 <jose_lausuch> we can check the logs 16:12:47 <jose_lausuch> ah no, we cant, they are not pushed to artifacts 16:13:06 <jose_lausuch> we can check them on the container 16:13:32 <narindergupta> which container? 
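The jenkins-job-builder wrapper jose_lausuch pasted above, re-indented for readability (the timeout value is in minutes, matching the "Build timed out (after 210 minutes)" message):

    wrappers:
        - build-name:
            name: '$BUILD_NUMBER Suite: $FUNCTEST_SUITE_NAME Scenario: $DEPLOY_SCENARIO'
        - timeout:
            timeout: 210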
16:15:52 <jose_lausuch> the functest docker container on the jumphost 17:16:03 <jose_lausuch> narindergupta: ping 17:16:24 <narindergupta> jose_lausuch: pong 17:16:36 <jose_lausuch> narindergupta: can you tell me the IPs of the jumphost on intel-pod5 and 6? 17:16:46 <jose_lausuch> I have the vpn 17:16:50 <jose_lausuch> but dont know the ips 17:16:57 <narindergupta> 10.2.65.2 and 10.2.66.2 17:17:12 <narindergupta> [pd5 and pod6 but i need to add your ssh keys for access. 17:17:29 <narindergupta> give me your ssh public keys 17:17:54 <jose_lausuch> ok, let's do it with private chat 17:18:03 <narindergupta> sure 17:18:47 <jose_lausuch> ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9I5Pyg3tND4sV9EoEW3jCqY+91IOdkNCAe8xscI1mRcVLlN7/0YHFCQLFX8q+lTAMWhguMkoUf4y0w6rMmDE0c59XKUNYUPHMPT6vEBbHz9JjCZdEhHGouDJmSAxS0PBLrv+nj+P9fhFVUdxf+pWzaCID8zZDfBq2k7KGtyREmV/l1jkIatDUjh5Hj0lenkYvH85nrAQaAWa3WoLieqj8Ve9ruoguhV6/IjWbUtU+JX/9FLyn10Wq+ArySIbDbwD2ajI//4E1XPDfztzjsscU3sSUw9vwaP78/1XOHPKeEvgd1UBIG4TzaTuRgLmsTtWar409sZ8QsPkE2CwkS4OB ejolaus@ejolaus-dev 17:18:52 <jose_lausuch> oops, sorry :) 17:19:04 <narindergupta> no worrues 17:19:40 <narindergupta> ok now you can try using 10.2.66.2 intel pod6 17:19:45 <narindergupta> with user jenki 17:19:53 <narindergupta> sorry user jenkins 17:21:17 <jose_lausuch> can I try first the pod5? I already have that vpn opened 17:21:29 <narindergupta> ok just a moment then 17:23:26 <narindergupta> please try now 17:23:40 <jose_lausuch> narindergupta: I opened vpn also for pod6, and it works 17:23:42 <jose_lausuch> Im in, thanks! 17:23:49 <narindergupta> cool 17:25:09 <jose_lausuch> I will run some test on pod5, is that ok? 17:25:29 <jose_lausuch> is the deployment up? 17:26:21 <jose_lausuch> root@ce0974c487f4:~# neutron net-list 17:26:21 <jose_lausuch> Unable to establish connection to http://10.4.1.27:9696/v2.0/networks.json 17:26:33 <jose_lausuch> what is this? that address is not showed in the endpoint list 17:27:31 <narindergupta> just now checked looks like onos tried to installed ans habing issue with neutron 17:27:55 <narindergupta> i can retart the odl deployment might take an hour if you are ok with that 17:27:55 <narindergupta> ? 17:28:17 <jose_lausuch> narindergupta: ok, but I might check tomorrow.. 17:28:34 <narindergupta> overnight job will on that 17:28:53 <narindergupta> and install will override again as this is ci pods 17:29:24 <narindergupta> or you can look into it tomorrow morning your time 17:29:51 <narindergupta> as onos job is scheduled at 8:00 AM CST i believe 17:31:25 <jose_lausuch> ok 03:54:27 <narindergupta> yuanyou: hi 03:54:57 <narindergupta> yuanyou: i have send you an information and it seems you need to do charm sync of neutron-api-onos as well 04:53:27 <yuanyou> narindergupta : yes,I saw it ,and I am test it . 04:53:42 <narindergupta> yuanyou: ok 04:54:16 <narindergupta> yuanyou: thanks also it is good habit to to do charmsync for your charms in case you are taking it from openstack cahrms 04:56:40 <yuanyou> narindergupta: yes ,that will be fine 04:57:56 <narindergupta> yuanyou: currently there won't be any onos build op into pod5 as functest team need this for debugging temptest failures. But we can use intel pod6 once you are able to correct the charm sucessfully? 05:01:24 <yuanyou> narindergupta: yes, I see 13:11:46 <jose_lausuch> narinderg_cfk: ping when you are up 13:30:07 <narinderg_cfk> hi jose_lausuch 13:30:28 <jose_lausuch> I so you aborted a job after 4 hors.. 
13:30:31 <jose_lausuch> hours 13:31:04 <narinderg_cfk> jose_lausuch: yesternight i did not abort 13:32:02 <narinderg_cfk> jose_lausuch: and intel pod5 was success daily 13:32:14 <narinderg_cfk> https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod5-daily-brahmaputra/52/console 13:32:53 <narinderg_cfk> jose_lausuch: so does for intel pod6 https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod6-daily-master/55/console but need to see the failed test cases 13:34:23 <narinderg_cfk> jose_lausuch: but yardstick test cases failed 13:42:52 <narinderg_cfk> jose_lausuch: it seems increasing timeout completed atleast the test in all labs. 13:43:06 <jose_lausuch> narinderg_cfk: I mean this one, the previous one 13:43:06 <jose_lausuch> https://build.opnfv.org/ci/view/joid/job/functest-joid-intel-pod5-daily-brahmaputra/51/console 13:43:15 <jose_lausuch> narinderg_cfk: anyway, the blue ball took 5 hr... 13:43:38 <jose_lausuch> narinderg_cfk: the problem is cinder 13:43:43 <narinderg_cfk> jose_lausuch: correct. yes above aborted because i wanted to run overnight builkd 13:43:46 <jose_lausuch> | cinder | 52:41 | 50 | 98.82% | 13:44:03 <narinderg_cfk> ok one hour 13:44:42 <jose_lausuch> I will compare this 13:44:46 <jose_lausuch> with another installer 13:45:30 <narinderg_cfk> in orange pod2 only 19.31 13:45:49 <narinderg_cfk> | cinder | 19:31 | 50 | 100.00% | 13:46:11 <narinderg_cfk> but they have ssds 13:47:06 <narinderg_cfk> jose_lausuch: but heat test cases are failing on all pods | heat | 03:52 | 2 | 7.69% | 13:47:42 <jose_lausuch> narinderg_cfk: http://pastebin.com/raw/kFdKVDRQ 13:47:51 <jose_lausuch> look at keystone 13:47:58 <jose_lausuch> took 3 hours... that is the real problem 13:48:00 <jose_lausuch> and cinder 1 hr 13:49:19 <narinderg_cfk> i think all service took extra time 13:49:33 <jose_lausuch> for cinder I can understand that there are not SSDs 13:49:40 <narinderg_cfk> correct 13:49:40 <jose_lausuch> but for keystone?? 13:49:47 <jose_lausuch> 3 hrs?? 13:49:54 <jose_lausuch> there is a network issue for sure 13:50:38 <narinderg_cfk> yeah looks like 13:51:06 <narinderg_cfk> jose_lausuch: how to figure it out? 13:51:28 <jose_lausuch> narinderg_cfk: I'm trying to login to the pod, but I am not sure how to start troubleshooting this... 13:53:16 <narinderg_cfk> jose_lausuch: i think we need help 13:55:47 <narinderg_cfk> jose_lausuch: lkets discuss in pharos channel please join opnfv-pharos 14:33:55 <narindergupta> jose_lausuch: ok so manual working of openstack is fast enough but performance detriot during test. can u check the logs on thesystem what point it took more time 14:36:28 <jose_lausuch> narindergupta: sorry, need to waity, Im reporting in the release meetning 15:46:04 <narindergupta> jose_lausuch: i am back again sorry it was power outage. 15:46:24 <jose_lausuch> narindergupta: no prob 15:49:23 <narindergupta> catbus1: hi 15:53:23 <jose_lausuch> narindergupta: Im back on intel pod 5 15:53:27 <jose_lausuch> running test by test 15:53:42 <narindergupta> jose_lausuch: ok 15:58:17 <jose_lausuch> we have yet another problem 15:59:20 <narindergupta> whats the issue? 16:00:50 <jose_lausuch> now vping doesnt work 16:01:57 <jose_lausuch> narindergupta: JOID uses the same network segment for public and admin network... 
16:02:28 <narindergupta> jose_lausuch: there are different network 16:02:43 <jose_lausuch> if I do keystone endpoint-list 16:03:03 <narindergupta> jose_lausuch: yeah for endlist we are defining the public segment seperate 16:03:06 <jose_lausuch> http://hastebin.com/odovurufog.sm 16:03:10 <narindergupta> its all on admin network 16:03:16 <jose_lausuch> 10.4.1.0/24 for all 16:03:22 <jose_lausuch> ah 16:03:37 <narindergupta> for endpoints only 16:03:38 <jose_lausuch> I wonder if that could also be an issue 16:04:10 <jose_lausuch> is there a joid gui of the deployment? 16:04:20 <narindergupta> yes it is on admin entwork 16:04:42 <jose_lausuch> is there later on isolation for storage/public/admin networks ? 16:05:48 <narindergupta> there is isolation of data, public and admin 16:05:52 <jose_lausuch> ok, I see that 10.2.65.0/24 is the public range 16:06:00 <narindergupta> correct 16:06:02 <jose_lausuch> ok 16:06:09 <jose_lausuch> then, ignore my comment :) 16:06:12 <narindergupta> :) 16:06:25 <jose_lausuch> vping failed 16:06:38 <jose_lausuch> I will run it again and not clean the instances 16:06:39 <narindergupta> whats the reason? 16:06:43 <jose_lausuch> so that you can login to pod5 and check 16:06:47 <jose_lausuch> cannot ping the floating ip 16:07:33 <narindergupta> which port flatin ip created? 16:07:40 <narindergupta> ext-net or somewhere else 16:08:08 <jose_lausuch> ya 16:08:15 <jose_lausuch> can you login to the deployment? 16:10:08 <narindergupta> yeah i am logging in 16:10:12 <jose_lausuch> nova list 16:10:19 <jose_lausuch> and then you'll see there are 2 VMs 16:10:23 <jose_lausuch> one of them with a floating ip 16:10:40 <jose_lausuch> you can check too neutron floatingip-list 16:14:28 <narindergupta> which subnet? 16:14:54 <narindergupta> let me create the router as i used to do and retry 16:16:12 <jose_lausuch> narindergupta: there is already a router 16:18:01 <narindergupta> jose_lausuch: i cna not figure out the n why ping is not working. I know when i test is after fresh install it works 16:18:20 <jose_lausuch> I Assigned a new floating ip to the first vm 16:18:22 <jose_lausuch> and that is pingable 16:18:38 <narindergupta> hun thats wiered 16:18:56 <jose_lausuch> very 16:19:03 <jose_lausuch> do a nova list 16:19:09 <jose_lausuch> I will remove floating ip from vm 2 16:19:12 <jose_lausuch> and assign it again 16:19:42 <narindergupta> yeah i can see 16:19:59 <narindergupta> .84 pings but not .83 16:21:48 <jose_lausuch> narindergupta: nova console-log opnfv-vping-2 16:22:46 <narindergupta> jose_lausuch: i am in 16:22:54 <jose_lausuch> nova console-log opnfv-vping-2|grep 'ifconfig' -A 16 16:23:00 <jose_lausuch> the second VM didnt get the ip 16:23:33 <jose_lausuch> udhcpc (v1.20.1) started 16:23:33 <jose_lausuch> Sending discover... 16:23:33 <jose_lausuch> Sending discover... 16:23:33 <jose_lausuch> Sending discover... 16:23:33 <jose_lausuch> Usage: /sbin/cirros-dhcpc <up|down> 16:23:34 <jose_lausuch> No lease, failing 16:23:34 <jose_lausuch> WARN: /etc/rc3.d/S40-network failed 16:23:35 <jose_lausuch> cirros-ds 'net' up at 181.97 16:23:35 <jose_lausuch> checking http://169.254.169.254/2009-04-04/instance-id 16:25:16 <narindergupta> jose_lausuch: could it be dhcp issue? 16:25:29 <jose_lausuch> I dont know 16:25:31 <jose_lausuch> Im trying another thing 16:27:46 <narindergupta> ok 16:27:56 <jose_lausuch> how can I access horizon? 
16:28:51 <narindergupta> you can do X redirect through ssh 16:29:07 <narindergupta> and start the firefox on the jumphost 16:29:40 <narindergupta> there is vncserver on 10.4.0.255 as well 16:29:56 <narindergupta> password is ubuntu 16:30:21 <narindergupta> vip for dashboard is 10.4.1.21 16:30:29 <jose_lausuch> narindergupta: the second VM doesnt get the ip from the dhcp... 16:30:48 <narindergupta> but floating ip pings? 16:31:42 <narindergupta> also nova lsit shows | ffc42fc4-065f-4cfe-8276-cbbb126752be | opnfv-vping-2 | ACTIVE | - | Running | vping-net=192.168.130.4, 10.2.65.87 | 16:31:59 <narindergupta> which means it got the ip somehow. 16:32:25 <narindergupta> assigned but dhco looks like not giving the ip on request. 16:32:58 <jose_lausuch> narindergupta: I use port forwarding, so I open firefox on my local env :) 16:33:32 <jose_lausuch> narindergupta: assigning the ip is not a problem, but if you check the console-log you'll see that dhcp doesnt work 16:33:44 <jose_lausuch> narindergupta: what is the user/password for horizon? 16:34:13 <narindergupta> admin openstack 16:34:47 <jose_lausuch> ok thanks, taht works 16:34:51 <narindergupta> Sending discover... 16:34:51 <narindergupta> Usage: /sbin/cirros-dhcpc <up|down> 16:34:51 <narindergupta> No lease, failing 16:34:51 <narindergupta> WARN: /etc/rc3.d/S40-network failed 16:34:51 <narindergupta> cirros-ds 'net' up at 181.25 16:35:10 <narindergupta> do you think it could be image issue? 16:35:17 <jose_lausuch> why image? 16:35:21 <jose_lausuch> the first VM gets the ip correctly 16:35:37 <narindergupta> it gives the usage error 16:36:16 <jose_lausuch> nova console-log opnfv-vping-1|grep 'Starting network...' -A 10 16:36:23 <jose_lausuch> usage? 16:36:53 <narindergupta> Usage: /sbin/cirros-dhcpc <up|down> 16:36:53 <narindergupta> (10:34:49 AM) narindergupta: No lease, failing 16:37:59 <narindergupta> let me check on neutron-gateway node 16:41:21 <jose_lausuch> ok 16:42:26 <narindergupta> dhcp lease does not show ip its there for .3 16:42:31 <narindergupta> but not for .4 16:42:48 <jose_lausuch> that's bad :) 16:43:06 <narindergupta> i know looks like request did not reached to dhcp 16:44:51 <narindergupta> jose_lausuch: even neutron logs no sign of .4 while .3 its there 16:45:03 <narindergupta> do you know mac address of the interface i can search 16:45:21 <narindergupta> for 2nd vm 16:45:27 <jose_lausuch> yes 16:45:57 <jose_lausuch> eth0 Link encap:Ethernet HWaddr FA:16:3E:57:24:56 16:47:11 <narindergupta> no request with this mac 16:48:27 <narindergupta> can u try to create one more vm? 16:48:42 <narindergupta> lets see how does it behave with the same interface? 16:48:43 <jose_lausuch> yes 16:50:07 <jose_lausuch> narindergupta: done 16:50:12 <narindergupta> ok i am capturing the dhcp agent log 16:50:18 <jose_lausuch> called test-vm 16:50:41 <jose_lausuch> Starting network... 16:50:41 <jose_lausuch> udhcpc (v1.20.1) started 16:50:41 <jose_lausuch> Sending discover... 16:51:57 <jose_lausuch> again sending discover 16:52:01 <jose_lausuch> no leases?? 16:53:46 <narindergupta> i am seeing this in one of dhcp agent log 2016-02-16 07:32:40.432 12656 ERROR neutron.agent.dhcp.agent RemoteError: Remote error: IpAddressGenerationFailure No more IP addresses available on network 303fd1aa-10fd-4f73-b8c1-475fdd8f0a09. 16:53:46 <narindergupta> not now but issue was seen earlier 16:53:47 <jose_lausuch> Sending discover... 16:53:47 <jose_lausuch> Sending discover... 16:53:47 <jose_lausuch> Sending discover... 
16:53:57 <jose_lausuch> aha!
16:54:00 <jose_lausuch> interesting
16:54:01 <narindergupta> looks like it's somehow related
16:54:24 <jose_lausuch> but it doesn't make sense
16:54:31 <jose_lausuch> | vping-subnet | 192.168.130.0/24 | {"start": "192.168.130.2", "end": "192.168.130.254"} |
16:54:36 <jose_lausuch> the range is quite wide!
16:55:16 <narindergupta> can you check on the dashboard whether the DHCP agent services are up?
16:55:29 <jose_lausuch> node6-control Enabled Up
16:55:30 <jose_lausuch> yes
16:55:56 <narindergupta> yeah, it matches here on the node
16:56:02 <jose_lausuch> (d4c55165-70d2)
16:56:02 <jose_lausuch> 192.168.130.2
16:56:02 <jose_lausuch> network:dhcp Active UP
16:56:05 <jose_lausuch> that is the port
17:00:27 <narindergupta> huh, so all services are up and the ports are up
17:01:54 <jose_lausuch> yes
17:03:08 <narindergupta> I think it's worth clearing the network, including the bridges and router, and recreating it again to check; sometimes, when no leases are available, it gets stuck there waiting for leases to become available
17:03:28 <jose_lausuch> ok
17:03:30 <jose_lausuch> I will do that
17:04:49 <narindergupta> thanks
17:06:12 <jose_lausuch> done
17:06:19 <jose_lausuch> I will run the same test, but on a different network
17:06:25 <jose_lausuch> 192.168.40.0/24 for example
17:09:00 <jose_lausuch> narindergupta: can you check again?
17:09:17 <jose_lausuch> narindergupta: it worked...
17:09:33 <narindergupta> Feb 16 17:08:59 node6-control dnsmasq-dhcp[2860]: DHCPACK(ns-f2fff5fd-05) 192.168.140.4 fa:16:3e:f7:06:3e host-192-168-140-4
17:09:54 <narindergupta> yeah, I can verify in syslog that DHCP is assigning the IP
17:10:09 <jose_lausuch> now the VMs are trying to ping each other
17:10:14 <narindergupta> ok
17:12:39 <jose_lausuch> something is slow or doesn't work
17:14:08 <narindergupta> what's happening?
17:14:21 <jose_lausuch> not sure, the test is hanging at some point
17:14:24 <jose_lausuch> I will abort it
17:14:59 <narindergupta> their ping time was 130 s
17:15:06 <narindergupta> earlier in the lab
17:16:12 <jose_lausuch> ya, not sure
17:16:17 <jose_lausuch> if I run it manually it works
17:16:29 <jose_lausuch> ya
17:16:33 <jose_lausuch> so what was the problem?
17:16:36 <jose_lausuch> I would like to retest it
17:16:42 <jose_lausuch> with the same network
17:23:02 <jose_lausuch> narindergupta: you know what?
17:23:05 <jose_lausuch> now it doesn't work
17:23:23 <jose_lausuch> VM2 doesn't get an IP from DHCP
17:23:39 <jose_lausuch> how is that possible?
17:23:49 <jose_lausuch> I removed and created the network again, with the same range
17:23:56 <jose_lausuch> the first VM got an IP
17:24:03 <jose_lausuch> but same issue with VM2
17:24:52 <jose_lausuch> narindergupta: I am creating another VM manually
17:24:57 <jose_lausuch> can you check the DHCP logs?
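
On the neutron-gateway node, the comparison narindergupta makes next can be done against dnsmasq's per-network state files; a sketch, assuming the default neutron DHCP agent state directory (/var/lib/neutron/dhcp) and that the vping network name from the log is still in use:

    NET_ID=$(neutron net-list | awk '/vping-net/ {print $2}')   # UUID of the vping network
    sudo cat /var/lib/neutron/dhcp/$NET_ID/host                 # MAC -> hostname -> IP entries dnsmasq knows about
    sudo cat /var/lib/neutron/dhcp/$NET_ID/leases               # leases actually handed out
    # watch DHCP requests inside the network's namespace while the VM retries
    sudo ip netns exec qdhcp-$NET_ID tcpdump -nli any port 67 or port 68
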
17:40:57 <narindergupta> no sign of the IP
17:41:11 <narindergupta> jose_lausuch: no sign of a new IP
17:41:56 <narindergupta> but I have the following hosts listed
17:42:08 <narindergupta> fa:16:3e:85:08:81,host-192-168-140-1.openstacklocal.,192.168.140.1
17:42:08 <narindergupta> fa:16:3e:98:97:81,host-192-168-140-2.openstacklocal.,192.168.140.2
17:42:08 <narindergupta> fa:16:3e:3d:99:fa,host-192-168-140-3.openstacklocal.,192.168.140.3
17:42:08 <narindergupta> fa:16:3e:11:53:88,host-192-168-140-4.openstacklocal.,192.168.140.4
17:42:08 <narindergupta> fa:16:3e:6b:8c:a6,host-192-168-140-5.openstacklocal,192.168.140.5
17:43:27 <jose_lausuch> that is strange
17:44:43 <narindergupta> yeah, in the leases file it's not there, but in the host file it exists
17:45:48 <narindergupta> and I have the Liberty version of neutron-dhcp-agent
17:46:06 <jose_lausuch> what ODL version is that?
17:46:11 <narindergupta> Be
17:46:47 <narindergupta> I think we should try the same test on Orange pod2, in case something works there
18:28:36 <narindergupta> catbus1: any update on the user guide?
02:16:00 <bryan_att> narindergupta: I found clues as to why MAAS is not creating the nodes. See the log entry pasted below
02:16:47 <bryan_att> https://www.irccloud.com/pastebin/tpmje5Ft/
02:17:46 <bryan_att> narindergupta: when I change the parameter power_parameters_power_address to power_address and send the same command from the shell, it works.
02:18:18 <bryan_att> controlnodeid=`maas maas nodes new autodetect_nodegroup='yes' name='node1-control' tags='control' hostname='node1-control' power_type='virsh' mac_addresses=$node1controlmac power_address='qemu+ssh://'$USER'@192.168.122.1/system' architecture='amd64/generic' power_parameters_power_id='node1-control' | grep system_id | cut -d '"' -f 4 `
02:19:21 <bryan_att> Above is the change I tried for this in 02-maasdeploy.sh, which was the only place I saw this parameter name... but that did not change what was sent. So there is definitely a bug somewhere still.
02:20:16 <bryan_att> note also that resending the command as above still did not set eth1 to auto on the controller... I had to do that manually
04:07:57 <narindergupta> yuanyou:
04:08:29 <narindergupta> I am not seeing any change in the charms related to neutron-api-onos from the charm syncer
04:41:03 <narindergupta> yuanyou: on Intel pod6 I can verify that after doing a charm sync we were able to create the network.
06:12:56 <yuanyou> narindergupta: I have synchronized in my local environment, but there are some errors, so I didn't commit the changes; I need to run more tests.
15:56:25 <narindergupta> bryan_att: on your Intel NUCs can you try to redeploy, as I believe the non-HA issue with keystone should be fixed now.
15:57:01 <bryan_att> ok, the last deploy timed out last night. Did you see my notes from yesterday on the power_parameters?
16:03:58 <bryan_att> narindergupta: I have the joid-walk meeting today at 1PM PST with your team. I'd like to get a successful deploy before then. I can restart the JuJu deploy now, but I think the issues I reported last night would be good to address ASAP also. I think we can fix whatever issue is requiring me to manually create the machines in MAAS.
16:28:42 <bryan_att> narindergupta: I just recloned the repo and am restarting the MAAS deploy.
16:28:54 <narindergupta> thanks
16:29:04 <narindergupta> bryan_att: you still need to add the nodes manually
16:29:27 <bryan_att> narindergupta: did you see my earlier note about the bug I found?
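
Given the power_parameters confusion above, one way to confirm what MAAS actually stored for the node is to read it back with the MAAS 1.x CLI (the 'maas' profile name and the $controlnodeid variable come from the command above; the grep filtering of the JSON output is illustrative):

    maas maas node read $controlnodeid | python -m json.tool    # full node record; check the power_type field
    maas maas nodes list | grep -E '"hostname"|"power_type"'    # quick overview of all declared nodes
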
16:29:36 <narindergupta> I am working with the maas-deployer team on building maas-deployer, so the fixes will be available soon
16:29:56 <narindergupta> no, not yet
16:30:03 <narindergupta> I still have to check
16:32:38 <bryan_att> I can get the nodes created through the command "ssh -i /home/opnfv/.ssh/id_maas -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o LogLevel=quiet ubuntu@192.168.10.3 maas maas nodes new autodetect_nodegroup='yes' name='node2-compute' tags='compute' hostname='node2-compute' power_type='ether_wake' mac_addresses='B8:AE:ED:76:C5:ED' power_address='B8:AE:ED:76:C5:ED' architecture='amd64/generic'"
16:33:04 <bryan_att> when I change the parameter power_parameters_power_address to power_address and send the same command from the shell, it works.
16:51:00 <narindergupta> ok
16:58:34 <narindergupta> yeah, I made changes in maas-deployer accordingly and hopefully the next build of maas-deployer will fix it
17:05:01 <collabot> arturt: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
17:05:19 <arturt> #endmeeting
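
Independent of MAAS itself, the power addresses used above can be sanity-checked from the jumphost; a sketch covering the two power types that appear in the log (the wakeonlan package is an assumption and would need to be installed, and it must be run on the same L2 segment as the node):

    # virsh power type: confirm the hypervisor behind the power_address answers
    virsh -c qemu+ssh://$USER@192.168.122.1/system list --all
    # ether_wake power type: MAAS only sends a Wake-on-LAN magic packet; the same can be sent by hand
    wakeonlan B8:AE:ED:76:C5:ED
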