15:05:45 #startmeeting weekly integration meeting 15:05:45 Meeting started Thu Sep 11 15:05:45 2014 UTC. The chair is CASP3R. Information about MeetBot at http://ci.openstack.org/meetbot.html. 15:05:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 15:05:45 The meeting name has been set to 'weekly_integration_meeting' 15:06:11 #chair LuisGomez catohornet 15:06:11 Current chairs: CASP3R LuisGomez catohornet 15:09:21 #topic project update 15:09:40 #info all testing good for base_of13 (OSGi) 15:10:01 #info karaf all testing is all good but 1 test failing around topo in performance 15:26:19 #info to get another Robot VM we have to move to rackspace (static) lab 15:34:04 Priyanka Chopra proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11036 15:36:03 @chrisprice - ping rex when this meeting wraps up 15:36:35 rexpugh Chris Price doesn't hang around her 15:36:44 This is Chris O'Shea 15:37:28 thanks chris O 15:53:33 A change was merged to integration: Updating and fixing xmls https://git.opendaylight.org/gerrit/11045 17:57:22 Abhishek Kumar proposed a change to integration: Basic recovery scripts https://git.opendaylight.org/gerrit/11064 21:04:51 Priyanka Chopra proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11036 00:11:16 Priyanka Chopra proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11036 00:31:46 Christopher O'Shea proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11077 01:39:07 Christopher O'Shea proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11077 01:50:28 Carol Sanders proposed a change to integration: Adding NetCONF Test Suite https://git.opendaylight.org/gerrit/11080 01:50:38 Christopher O'Shea proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11077 
02:30:34 Kamal Rameshan proposed a change to integration: robot integration tests for router rpc in datastore clustering https://git.opendaylight.org/gerrit/11081 02:31:19 Hideyuki Tai proposed a change to integration: Added VTN Coordinator to Karaf distribution. https://git.opendaylight.org/gerrit/11082 02:33:19 Carol Sanders proposed a change to integration: Adding changes to NETCONF Test Suite https://git.opendaylight.org/gerrit/11083 05:16:58 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11086 07:09:32 Rafat Jahan proposed a change to integration: Karaf built and integrated https://git.opendaylight.org/gerrit/11089 08:45:58 good morning folks 08:47:40 Peter Gubka proposed a change to integration: Updating xmls for flows to have unique flow id https://git.opendaylight.org/gerrit/11091 09:40:42 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11093 09:55:00 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11094 10:20:03 Peter Gubka proposed a change to integration: A test which connects 256 switches. 
https://git.opendaylight.org/gerrit/11095 15:19:00 A change was merged to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11077 15:55:19 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11111 16:31:42 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11112 16:34:55 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11094 16:55:11 Basheeruddin Ahmed proposed a change to integration: Initial isolation test integrated with robot framework Usage: pybot -v LEADER: -v PORT:8080 -v FOLLOWER1: -v FOLLOWER2: ~/integration/test/csit/suites/clustering/datastore/basic/ Adding start/stop controller ut https://git.opendaylight.org/gerrit/10959 16:55:12 Basheeruddin Ahmed proposed a change to integration: renamed the test cases file name to contain proper sequence number https://git.opendaylight.org/gerrit/11026 16:55:13 Basheeruddin Ahmed proposed a change to integration: Added library to determine Shard Cluster Roles https://git.opendaylight.org/gerrit/11016 16:55:14 Basheeruddin Ahmed proposed a change to integration: UtilLibrary now uses requests session instead of direct post /get which seem to open active connections https://git.opendaylight.org/gerrit/11113 17:54:49 Rafat Jahan proposed a change to integration: Adding sdninterfaceapp features https://git.opendaylight.org/gerrit/11117 18:17:34 Hi, committers of Integration Group. I would like you to revew and approve my patch before RC1. https://git.opendaylight.org/gerrit/#/c/11082/ 18:34:53 A change was merged to integration: Added VTN Coordinator to Karaf distribution. https://git.opendaylight.org/gerrit/11082 19:56:50 Jamo Luhrsen proposed a change to integration: INPORT action test now functional. There is a bug open to move that action to IN_PORT, but that may or may not be resolved. 
Depending on that, this test case may have to revert back to IN_PORT (bug is https://bugs.opendaylight.org/show_bug.cgi?id=1725) https://git.opendaylight.org/gerrit/11118 23:22:34 Priyanka Chopra proposed a change to integration: Adding plugin2oc features https://git.opendaylight.org/gerrit/11129 06:48:44 Reminder: the integration jenkins will be going offline in less than 15 minutes for a 4 hour outage. 06:50:22 yea that l2switch was just testing something 06:51:08 I'll let the current test pass through, but I'm disabling the polling and gerrit trigger jobs now so no more should come into the system. Do not trigger any more manual tests 06:55:34 LuisGomez hey do you want to abandon that job cause the compatible-min will take 30 mins 06:56:22 right, we can drop it 06:57:27 ok done. 06:57:42 thank you :) 06:58:13 alright i'm out, have a good change window :P 06:58:21 thanks, get some sleep ;) 10:35:07 jenkins silo is back online, it's currently running a bit sluggish though. I can't fix it at the moment because of an API outage at Rackspace. 18:57:54 LuisGomez: any idea why the failed tests in that latest build are trying to pull a SNAPSHOT artifact from the release repos? 18:59:16 really? 18:59:20 let me see that 19:01:03 yeah, we're running into it with the move to the new environment because to help with artifact movement around the environments we use a different nexus server (proxy to our main one which is in a different DC). As such all artifact retrieval is forced to that repo for either stuff out of the master release view or out of the snapshot repo... 
19:01:27 ok, i think integration PAX-EXAM fails because of this line: 19:01:29 2014-09-13 18:51:18,818 | WARN | n(3)-10.30.11.17 | AetherBasedResolver | 5 - org.ops4j.pax.url.mvn - 1.6.0 | Error resolving artifactorg.opendaylight.snmp4sdn:plugin-shell:jar:0.1.3-SNAPSHOT:Could not find artifact org.opendaylight.snmp4sdn:plugin-shell:jar:0.1.3-SNAPSHOT in nexus-release-mirror (http://nexus01.dfw.opendaylight.org:8081/nexus/content/groups/public/) 19:01:30 org.sonatype.aether.resolution.ArtifactResolutionException: Could not find artifact org.opendaylight.snmp4sdn:plugin-shell:jar:0.1.3-SNAPSHOT in nexus-release-mirror (http://nexus01.dfw.opendaylight.org:8081/nexus/content/groups/public/) 19:01:43 is this what you are saying? 19:01:48 yes, but that's the release view repo and not the snapshot repo 19:01:56 ok 19:01:57 of course it won't find a SNAPSHOT artifact there 19:02:27 so i need ed to figure out why this project fetches from wrong place 19:02:56 LuisGomez: the forced repos configuration we're using looks a bit like what's defined at the bottom of this: https://wiki.opendaylight.org/view/Infrastructure:Nexus 19:03:35 you'll see that we basically say, unless the artifact is supposed to be pulled from opendaylight.snapshot look at our release meta repo 19:04:59 I could disable the forced repo configs (like it was in LF) but it would a) have to traverse 1/2 the continent to get resources (adding time) and b) wouldn't let us find things like this that are somehow broken ;) 19:05:54 ok, just for me to understand this is an issue in the snmp project pom file right? 19:06:21 umm... I have to assume so since it's an smp4sdn component 19:06:30 i think so 19:06:52 edwarnicke, are you there? 
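The symptom discussed above (a `-SNAPSHOT` artifact being looked up in the release view) can be spotted mechanically in a karaf.log. The following is a minimal sketch, assuming the "Could not find artifact ... in ... (...)" message format quoted in the log and assuming that the repository id `nexus-release-mirror` serves releases only, as stated in the chat; the function name is hypothetical.

```python
import re

# Matches the "Could not find artifact <coords> in <repo-id> (<url>)" part of
# the Aether messages quoted in the chat.
MISSING_ARTIFACT = re.compile(
    r"Could not find artifact (?P<coords>\S+) in (?P<repo_id>\S+) "
    r"\((?P<url>[^)\s]+)\)"
)


def find_snapshot_misses(log_text, release_repo_ids=("nexus-release-mirror",)):
    """Return unique (coords, repo_id) pairs where a SNAPSHOT version was
    looked up in a repository assumed to hold releases only."""
    misses = []
    for match in MISSING_ARTIFACT.finditer(log_text):
        coords = match.group("coords")
        repo_id = match.group("repo_id")
        # The version is the last colon-separated field of the coordinates.
        version = coords.rsplit(":", 1)[-1]
        if version.endswith("-SNAPSHOT") and repo_id in release_repo_ids:
            pair = (coords, repo_id)
            if pair not in misses:
                misses.append(pair)
    return misses
```

A hit only says where the resolver ended up looking; it does not by itself say why the artifact was not found earlier (for example in the local repository).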
19:07:11 the thing is, their build silo has been under this restriction for some time, so unless they aren't doing some testing that would have exposed it, it should have already been fixed 19:07:14 LuisGomez: Yes :) 19:07:47 integration build fails because snmp is fetching artifact from wrong place apparently 19:08:02 2014-09-13 18:51:18,818 | WARN | n(3)-10.30.11.17 | AetherBasedResolver | 5 - org.ops4j.pax.url.mvn - 1.6.0 | Error resolving artifactorg.opendaylight.snmp4sdn:plugin-shell:jar:0.1.3-SNAPSHOT:Could not find artifact org.opendaylight.snmp4sdn:plugin-shell:jar:0.1.3-SNAPSHOT in nexus-release-mirror (http://nexus01.dfw.opendaylight.org:8081/nexus/content/groups/public/) 19:08:03 org.sonatype.aether.resolution.ArtifactResolutionException: Could not find artifact org.opendaylight.snmp4sdn:plugin-shell:jar:0.1.3-SNAPSHOT in nexus-release-mirror (http://nexus01.dfw.opendaylight.org:8081/nexus/content/groups/public/) 19:08:05 build isn't failing... it's unstable ;) tests failing 19:08:06 * edwarnicke reads the back thread 19:08:18 correct PAX-EXAM does not pass 19:08:28 build is OK 19:08:31 LuisGomez: So the wiring tests are failing? 19:08:38 yes 19:08:41 in integration/features/ ? 
19:08:47 let me post the console 19:08:54 LuisGomez: Thanks :) 19:09:01 https://jenkins.opendaylight.org/integration/view/Polling%20Jobs/job/integration-master-project-centralized-integration/2385/org.opendaylight.integration$features-integration/testReport/installFeature%28org.opendaylight.yangtools.featuretest.SingleFeatureTest%29%5BrepoUrl_%20file__opt_jenkins-integration_workspace_integration-master-project-centralized-integration_features_target_classes_features.xml,%20Feature_%20odl-integra 19:09:01 https://jenkins.opendaylight.org/integration/view/Integration%20jobs/job/integration-master-project-centralized-integration/2386/consoleFull 19:09:15 mine just links to one of the errors ;) 19:09:55 Ah 19:09:56 tykeal, your link does not work 19:09:56 * tykeal notes that the artifact resolution is failing since it's trying to pull a SNAPSHOT artifact from a release repo 19:09:56 OK 19:10:01 This isn't a wrong repo thing 19:10:21 Or at least I am pretty sure its not 19:10:48 I believe this is a 'SNMP4SDN added a new bundle to their features.xml file and not their features/pom.xml 19:10:49 ' 19:10:50 problems 19:10:53 I verified that the artifact(s) in question do exist in the nexus01 proxy in the opendaylight.snapshot repo 19:11:03 https://git.opendaylight.org/gerrit/#/c/11133/ 19:11:30 tykeal: I would expect they do 19:11:35 tykeal: Let me explain what's happening 19:11:44 ah, and their job wouldn't fail because they were actually building the artifacts in question 19:12:04 there you go 19:12:25 karaf looks for artifacts in the local .m2 or in well known repos like central (or I think in places defined as release repos in settings.xml) 19:12:25 in other words, we caught an actual error... 19:12:31 tykeal: YES :) 19:12:46 While I would prefer *not* to use integration as a test case for this problem 19:12:58 the snmp folks are active now? 
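The diagnosis given at 19:10:48 (a bundle listed in features.xml but not declared in features/pom.xml) can be checked with a small script. This is a sketch under assumptions, not the project's tooling: it assumes bundles are referenced with `mvn:groupId/artifactId/version` URLs (optionally behind a prefix such as `wrap:`), it ignores XML namespaces, and it counts every `<dependency>` element in the pom, including any under `dependencyManagement`. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://...}bundle' down to 'bundle'."""
    return tag.rsplit("}", 1)[-1]


def bundles_in_features(features_xml):
    """Collect (groupId, artifactId) for every mvn: bundle URL in a
    features.xml document passed as a string."""
    found = set()
    for elem in ET.fromstring(features_xml).iter():
        if local_name(elem.tag) != "bundle" or not elem.text:
            continue
        url = elem.text.strip()
        if "mvn:" not in url:
            continue
        # Drop anything before "mvn:" (e.g. "wrap:") and split the coordinates.
        parts = url.split("mvn:", 1)[1].split("/")
        if len(parts) >= 2:
            found.add((parts[0], parts[1]))
    return found


def dependencies_in_pom(pom_xml):
    """Collect (groupId, artifactId) for every <dependency> in a pom string."""
    found = set()
    for elem in ET.fromstring(pom_xml).iter():
        if local_name(elem.tag) != "dependency":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        found.add((fields.get("groupId", ""), fields.get("artifactId", "")))
    return found


def undeclared_bundles(features_xml, pom_xml):
    """Bundles referenced by the features file but missing from the pom."""
    return sorted(bundles_in_features(features_xml) - dependencies_in_pom(pom_xml))
```

An empty result does not prove the features file is installable; it only rules out this one class of mismatch.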
19:13:03 I have not had the time (and probably won't have the time) to write the test that locks this one down 19:13:03 ) 19:13:10 ok, was just seriously worried since all the runs to completion of this job since moving to Rackspace are UNSTABLE 19:13:39 just coincidence tykeal 19:13:46 (we also catch some other very subtle bugs in autorelease... just pushed a fix for one this morning... although I *can't* think of a way to write a test short of autorelease to catch the problems only it catches) 19:13:47 * tykeal feels better 19:13:58 tykeal: Your infra is doing its normal awesomeness 19:14:07 ok 19:14:20 tykeal: The most we can accuse you of is having the foresight to set up the infra in a way here that facilitates catching bugs of these kinds ;) 19:14:33 hehe 19:15:01 * edwarnicke amusing faux glare ;) 19:15:12 LuisGomez: So we have to figure out what to do about it 19:15:32 I think the options are these: 19:15:39 so, I don't know if the tests that are failing would add to the build time significantly but I notice that even with the UNSTABLE these builds are _much_ faster for this job. Previously they were ~45 minutes; all of these UNSTABLE builds this morning have been ~10 minutes 19:15:55 tykeal: Victory :) 19:16:22 So LuisGomez, here's what I see as the decision tree from here: 19:16:39 Decision1: Who creates the fix patch for snmp4sdn 19:16:45 Decision1.Option1: I do 19:16:54 Decision1.Option2: We email Christine and ask her to 19:17:13 that's all we can do right? 19:17:13 Decision2: What do we do about the breakage until SNMP4SDN merges a patch 19:17:30 Decision2.Option1: We let everything stay broken till SNMP4SDN fixes itself 19:17:50 Decision2.Option2: We comment out SNMP4SDN in integration, and in autorelease until the fix patch is merged 19:18:08 ok that's not bad either 19:18:09 LuisGomez: Well... 
you or tykeal could write the fix for Decision1 ;) 19:18:25 LuisGomez: It might even be educational ;) 19:18:36 * tykeal doesn't know what would need to be done 19:18:57 tykeal: See... educational ;) 19:19:01 * edwarnicke is showing his sgml roots 19:19:04 bad time for the education ;) 19:19:18 tykeal: LOL... an argument to which I am utterly sympathetic :) 19:19:19 there is some wiki on how to do that, i can take a look 19:19:40 LuisGomez: Is 'there is some wiki on how to do that'... was that a question or a statement? 19:19:52 statement 19:19:55 i am sure 19:19:57 Ah :) 19:19:59 Cool :) 19:20:09 Do you know where that wiki on how to do it is? 19:20:16 i can find it yes 19:20:25 Cool... ping me if you have trouble finding it 19:20:28 karaf step by step or similar 19:21:03 LuisGomez: And one Decision2, do you want Decision2.Option1 (leave things broken) or Decision2.Option2 (comment out SNMP4SDN until they merge the fix) ? 19:22:15 i was investigating the restconf issue with a local karaf installation so i do not need integration running right away 19:22:39 but if people push patches in the weekend and we want to do some test... 19:23:01 LuisGomez: OK... I was going to be preparing some patches there today to integration to clean up some small things that can cause subtle but imporatant problems 19:23:14 LuisGomez: I expect to see a bunch of folks scrambling this weekend 19:23:24 so then option 2 as well 19:23:40 OK... do you want to prepare the patch for Option2 there and I'll review it? 19:23:54 yes, that first so we can get going 19:24:01 i will do right away 19:24:03 Also... you should probably email snmp4sdn-dev and Christine letting her know 19:24:05 LuisGomez: Thank you :) 19:24:11 LuisGomez: Ping me when you need a review :) 19:24:12 ok 19:24:15 ok 19:24:28 tykeal: Does the overall root issue make sense to you? 19:24:37 edwarnicke: I believe so 19:24:46 tykeal: I ask, because you spread a whole lot of sane around generally... 
and so it's helpful for you to understand ;) 19:24:50 tykeal: Cool :) 19:24:59 as I said, I was just worried that it was a brokenness in the migrated environment 19:25:02 LuisGomez: Just to let you know the subtle thing I'm poking at right now (actually, two things) 19:25:49 ok 19:25:50 1) We have some things we need that are not copied into system/ but as they are in the maven central repo this does not *break* us... but does produce very slow startup, and *would* break someone operating in an offline mode where they couldn't reach maven central... so I'm going to fix that 19:25:57 also, because that job is finishing UNSTABLE it means it hasn't triggered any of the jobs downstream of it so the rest of the environment interconnect hasn't been validated :-/ 19:26:50 2) We had a lot of cases this week of folks seeing bugs that had to do with stale snapshots of stuff in their local .m2 cache, and karaf grabbing those. This makes addressing bugs very hard at times. So I was going to cause the integration karaf distro at least to *not* look at the local .m2 19:27:27 i believe 1) is related to the fact that current karaf distro cannot start stand-alone 19:27:27 tykeal: I am strongly in favor of us making sure the new environment is all working... because I am going to guess you won't feel at ease till you know it is (and I want you to feel at ease ;) ) 19:27:44 LuisGomez: Wait... where are you seeing it not start standalone? 19:27:48 well, I have very high confidence but... yeah 19:27:59 LuisGomez: Because it *should* start standalone as long as there is network connectivity to maven central 19:27:59 * tykeal loves puppet managed systems 19:28:14 edwarnicke, remember the guava issue? 19:28:22 Yes, but didn't we fix that? 19:28:27 i think i saw it in latest distro 19:28:33 i can retry 19:28:38 * edwarnicke is curious to understand :) 19:29:10 LuisGomez: OK... either way... 
I am going to absolutely make sure *everything* is in there as long as folks don't bork their features/pom.xml files (which I can't work around easily) 19:29:44 LuisGomez: For Lithium I think I can get a patch to the karaf guys for their maven plugin that can construct the system/ directory by walking the features files... which should make this all much easier, and also make the zip much smaller 19:29:51 * tykeal manually triggers integration-master-csit-karaf-compatible-min 19:30:06 * edwarnicke knows how to solve more problems than he has time to do before Helium ;( 19:30:16 also for 2) i think CASP3R told me we are clearing m2 cache every time we deploy karaf distro in integration Jenkins 19:30:19 yes 19:30:42 otherwise there are issues 19:31:05 LuisGomez: we're clearing just the org/opendaylight portion of m2 cache which the part we really need to 19:31:22 ok 19:31:27 that is then 19:31:46 ah you helped CASP3R :) 19:32:28 LuisGomez: more like he did it and I had looked over his script and gave it a thumbs up ;) 19:32:31 LuisGomez: That is helpful for integration 19:32:41 LuisGomez: It doesn't help for the case where other folks are finding issues :) 19:33:38 ok, time to fix the snmp 19:35:31 thanks edwarnicke and tykeal 19:35:40 Thank you LuisGomez ! 19:35:48 And tykeal , thank you for being here on a weekend to help out :) 19:36:28 hrmm... I'm somewhat concerned with this: https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-min/15/console 19:36:34 that's a lot of connection refused 19:38:00 the existing karaf distro might be broken 19:38:06 :-/ 19:38:26 lets fix the integration removing the snmp and recheck 19:39:03 ok... well, I'll just let this job run to completion 19:39:26 or should I just cancel it and when you get the snmp bits removed and we'll just let everything flow? 
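The cache clearing mentioned at 19:31:05 (removing only the org/opendaylight portion of the local Maven cache before a deploy) could look roughly like the sketch below. The actual Jenkins script is not shown in the log, so the function name, its parameters, and the default repository location `~/.m2/repository` are assumptions for illustration.

```python
import shutil
from pathlib import Path


def purge_group_from_m2(group_id="org.opendaylight", m2_repo=None):
    """Delete one groupId subtree from a local Maven repository.

    Returns the removed path, or None if there was nothing to remove.
    """
    # Assumed default location of the local repository.
    repo = Path(m2_repo) if m2_repo else Path.home() / ".m2" / "repository"
    # "org.opendaylight" maps to the directory org/opendaylight.
    target = repo.joinpath(*group_id.split("."))
    if target.is_dir():
        shutil.rmtree(target)
        return target
    return None
```

Limiting the purge to one groupId keeps third-party artifacts cached, so only the project's own snapshots have to be fetched again.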
19:39:56 oh score, something passed: 19:39:57 Karaf-All.MD SAL NSF OF13 :: Test suite for MD-SAL NSF mininet OF13 | PASS | 0 critical tests, 0 passed, 0 failed 23 tests total, 0 passed, 23 failed 19:40:02 you can stop it if you want 19:40:31 ok, that was really all the confirmation I really needed, it lets me know that things are operating appropriately enough to both pass and fail tests :) 19:41:25 oh wait, I misread that. It was actually a bunch of failed tests :-/ 19:42:47 heh, all the PASS on the tests are from RESTCONF tests and looking over the counts it's because there aren't actually any tests... test count 0, easy to pass that... 19:42:49 lets say so far i cannot blame the migration for all the issues we are having, i will let you know otherwise :) 19:42:50 LuisGomez: What memory settings for Karaf are you using? 19:43:11 LuisGomez: I am going to set something reasonable globally 19:43:18 let me see i change that after talking to you 19:43:53 FYI all of your systems are 8cpu x 8G RAM systems now 19:43:53 LuisGomez: Because its not cool that I get permgen errors when trying compatible-with-all in RC0 just because of so many bundles being loaded... need to fix that 19:43:54 export JAVA_OPTS="-Xmx2048m -XX:MaxPermSize=512m" 19:44:00 tykeal: :) :) :) 19:44:08 LuisGomez: Cool :) 19:44:14 well, all the integration systems ;) 19:44:52 so we are priviledged :) 19:45:13 something like that 19:45:27 i will not tell anybody… 19:45:30 heh 19:46:27 when we dynamically launch builders they get 8x8 systems as well, but not everyone is doing stuff like that right now. So we've got some masters on 2x2 and 4x4 depending upon what their actual observed usage was like before they migrated 19:46:56 makes sense 19:48:57 ok, so, for right now I'm going to assume that the env is good. give me a ping if you want me to check something though 19:49:13 right 19:49:25 will do 19:49:40 unsolicited ping? 
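The misread at 19:39:56 came from a suite line that says PASS while its totals say "0 passed, 23 failed", and the RESTCONF suites passed with a test count of 0. A small parser can flag both cases. This is a sketch that assumes the one-line summary layout exactly as pasted in the chat (in a real console log the counts may sit on separate lines); the function name and the returned labels are hypothetical.

```python
import re

# Layout as pasted in the chat:
# "... | PASS | 0 critical tests, 0 passed, 0 failed 23 tests total, 0 passed, 23 failed"
SUMMARY = re.compile(
    r"\|\s*(?P<status>PASS|FAIL)\s*\|\s*"
    r"(?P<crit_total>\d+) critical tests?, (?P<crit_passed>\d+) passed, "
    r"(?P<crit_failed>\d+) failed\s+"
    r"(?P<total>\d+) tests? total, (?P<passed>\d+) passed, (?P<failed>\d+) failed"
)


def classify_summary(line):
    """Classify a suite summary line; return None if the line does not match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    counts = {k: int(v) for k, v in match.groupdict().items() if k != "status"}
    status = match.group("status")
    if status == "PASS" and counts["failed"] > 0:
        return "misleading pass: {failed} of {total} tests failed".format(**counts)
    if status == "PASS" and counts["total"] == 0:
        return "vacuous pass: no tests ran"
    return status.lower()
```

Checking the totals rather than the PASS/FAIL word avoids reading an empty or all-failed suite as a success.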
19:49:54 LuisGomez: that will be fine ;) 19:50:07 * edwarnicke casts around for his ping clothes 19:52:52 Luis Gomez proposed a change to integration: Removing SNMP feature dues to integration issues. Will be back when resolved. https://git.opendaylight.org/gerrit/11145 19:53:23 ok let's see if the patch passes the tests 19:53:32 LuisGomez: :) 19:55:39 oh, i removed the feature repo instead of the feature itself, need to file a second patch 19:55:50 LuisGomez: OK :) 19:57:53 Luis Gomez proposed a change to integration: Removing SNMP feature due to integration issues. Will be back when resolved. https://git.opendaylight.org/gerrit/11145 21:09:35 tykeal: Still around? 21:09:41 edwarnicke: yes 21:12:42 edwarnicke: what can I do for you? 21:12:59 tykeal: So... I am in the process of making a change that will fix two issues 21:13:19 1) It will preclude things like the snmp4sdn bug we just hit in integration (by forcing the breakage to the snmp4sdn verify job) 21:13:44 2) It will make it easier to avoid non-heisenbugs where folks are getting stale artifacts at runtime from their .m2 cache 21:13:47 But there's a cost... 21:13:57 oh? 21:13:59 It means all the local karaf distros will be big like integration's 21:14:04 (not *as* big... but still big) 21:14:21 So it seemed to me I should *probably* at least mention that to you first ;) 21:14:31 umm... yeah, thanks for the info 21:14:41 It somehow felt unfriendly to *surprise* you with a sudden shift in disk usage 21:14:52 * edwarnicke tries to practice the principle of least surprise... 21:15:11 as an FYI in the last 2 weeks the nexus usage went from ~75G of data (for snapshots, releases and all proxied artifacts) to nearly 200G... 
21:15:33 * tykeal had to add more disk to the system yesterday because of it 22:36:53 edwarnicke 22:37:08 you can merge integration patch https://git.opendaylight.org/gerrit/#/c/11145/ 22:38:23 it takes 1.5 hours to build integration now 22:39:37 A change was merged to integration: Removing SNMP feature due to integration issues. Will be back when resolved. https://git.opendaylight.org/gerrit/11145 22:39:44 Merged 22:39:51 Do we know *why* its now taking 1.5 hours? 22:39:57 What was it taking before the migration to rackspace? 22:40:13 tykeal: Question... we we know what nature of disk IO we have? 22:40:23 tykeal: Was wondering if that might be slowing things down 22:40:38 just before rackspace was taking long as well but never figure out how long because jobs were timing out 22:41:44 [INFO] Reactor Summary: 22:41:44 [INFO] 22:41:46 [INFO] OpenDaylight Integration Project .................. SUCCESS [1.945s] 22:41:47 [INFO] OpenDaylight Distributions ........................ SUCCESS [0.347s] 22:41:49 [INFO] OpenDaylight Base Edition ......................... SUCCESS [1:43.872s] 22:41:50 [INFO] Opendaylight Virtualization Edition ............... SUCCESS [23.291s] 22:41:52 [INFO] OpenDaylight Service Provider Edition ............. SUCCESS [34.176s] 22:41:53 [INFO] OpenDaylight Toaster Edition ...................... SUCCESS [12.878s] 22:41:54 [INFO] features-integration .............................. SUCCESS [1:27:11.681s] 22:41:55 [INFO] distribution-karaf ................................ SUCCESS [1:02.077s] 22:42:10 feature test takes all the time i guess 23:08:46 edwarnicke: it was taking ~45 minutes before rackspace. It was taking ~10 minutes in rackspace before removing the snmp bits 23:09:01 as for the I/O the disks at rackspace are faster than what we have in LF 23:09:37 I don't have hard numbers for comparison, but I do know that we're getting better I/O out of them. 
In most cases we're on SSD 23:09:58 whereas at LF we were on SAS or SATA in worst case 23:14:55 FYI I'm seeing the following error in the karaf.log on the controller system: 23:14:56 2014-09-13 23:04:37,605 | WARN | Event Dispatcher | AetherBasedResolver | 5 - org.ops4j.pax.url.mvn - 1.6.0 | Error resolving artifactorg.opendaylight.integration:features-integration:xml:features:0.2.0-SNAPSHOT:Could not find artifact org.opendaylight.integration:features-integration:xml:features:0.2.0-SNAPSHOT in nexus-release-mirror (http://nexus01.dfw.opendaylight.org:8081/nexus/content/groups/public/) 23:14:56 org.sonatype.aether.resolution.ArtifactResolutionException: Could not find artifact org.opendaylight.integration:features-integration:xml:features:0.2.0-SNAPSHOT in nexus-release-mirror (http://nexus01.dfw.opendaylight.org:8081/nexus/content/groups/public/) 23:15:10 looks like it's trying to find something where it doesn't belong as well... 23:18:31 yes 23:18:36 i see this error too 23:18:50 that's why tests do not pass now 23:19:08 I'm going to hazard a guess it's why the controller doesn't actually start listening on the ports it's supposed to 23:19:16 sure 23:19:28 the controller does not start with this exception 23:20:04 the karaf log is way longer than what we are getting now 23:20:34 btw tykeal 23:20:54 it is possible that with karaf we only need 1 controller deploy job 23:21:32 only difference between jobs now is features to deploy + sleep time 23:21:33 * tykeal isn't the one that replicated the job everywhere ;) 23:22:11 before with old distro was more difficult but now we have a single distro 23:23:30 i will check how feasible it is when i get some time 23:53:46 tykeal, this is weird, i have no issues deploying the karaf edition that is failing, on my laptop 23:53:59 i do not get the above error 23:54:37 it's not failing? 
23:54:44 i will retry cleaning everything 23:54:59 .m2 and .karaf 23:55:09 just in case i missed that 23:57:17 LuisGomez: can you try with the following ~/.m2/settings.xml : http://pastebin.com/uZPV4p3r 23:57:38 that will mirror what we're doing in Rackspace, just using the master nexus since you can't reach the nexus proxy we're using 23:58:21 correction to it: http://pastebin.com/gkGePUgZ (I missed a couple of characters in my original paste) 23:58:22 ok 23:58:27 yes 00:06:03 correct it fails with these settings 00:06:31 tykeal, are these settings correct? 00:07:26 LuisGomez: a modified version that uses the private nexus repo is what every project in Rackspace has been using for over a month now 00:07:42 ok 00:08:40 what this file does is forces a hard repository separation between snapshot artifacts and release artifacts. If something is somehow misconfigured / misidentified as a release it causes maven to look in the release repo... which is what we're seeing 00:09:24 I could remove the configuration from the integration lab if you want, but things misidentifing themselves is a bug 00:18:41 no leave it like that then until edwanicke takes a look 00:25:43 Reading log 00:26:41 tykeal: That's not really an error from nexus repo issues 00:27:01 tykeal: Its the final error that occurs because that artifact was not installed in the local .m2 00:27:14 (or in our case ${WORKSPACE}/.m2repo ) 00:27:36 tykeal: Which is to say, its not *really* looking for it there 00:28:28 tykeal: LuisGomez Is this on *launching* the controller, or in line for *building* it? 00:28:44 edwarnicke: it's on the system that is launching the controller 00:29:03 tykeal: Cool... does anyone have a link the controller we are trying to run there? 00:29:06 the deploy job grabs the tarball from the build job, extracts it and then runs it 00:29:15 Got a link to that tarball or zip? 
00:29:27 i have link, hold 00:29:53 I'll download, look at it, and live blog^H^H^H^H chat what I look at and the process I use to look into it 00:30:12 wget https://jenkins.opendaylight.org/integration/view/Verify%20Jobs/job/integration-master-verify-distributions/lastSuccessfulBuild/artifact/distributions/extra/karaf/target/distribution-karaf-0.2.0-SNAPSHOT.zip 00:30:14 edwarnicke: the deploy process is scripted here: 00:30:31 and that ^ would be the current deploy package ;) 00:30:45 here's the job in question: https://jenkins.opendaylight.org/integration/view/Deploy%20Jobs/job/integration-master-deploy-controller-latest-karaf-compatible-all/configure 00:30:54 all the deploy jobs basically do this shell 00:31:01 Downloading 00:31:32 Glanced at the script 00:31:47 But this is going to be more about the zipfile 00:31:52 Because if your script were borked 00:32:01 It wouldn't get as far as that error :) 00:32:01 also edwarnicke if it isn't actually trying to do the download during the startup, why does it fail with for LuisGomez using our modified settings.xml but not without it? 00:32:07 tykeal: OK 00:32:16 So for the controller run, this is a bit different 00:32:34 tykeal: There is a prepackaged mvn repo in system/ 00:32:41 tykeal: It should have everything we need 00:33:15 ok, then the settings.xml shouldn't matter... that is of course, if it isn't trying to download something 00:33:16 tykeal: karaf only looks in your local .m2, in well known repos like central, and if you have something in settings.xml, it will look there 00:33:26 tykeal: apparently it is not respecting snapshots there ;) 00:33:31 ah 00:33:37 tykeal: So if things are *correct* 00:33:46 tykeal: It should have everything it needs in system/ 00:33:51 So let me look at that 00:33:54 ok 00:33:56 thanks :) 00:34:09 tykeal: So you are not crazy.. its just weird 00:34:25 tykeal: And this is, once again, revealing (probably) something really broken, not an infra issue 00:34:36 yep 00:34:41 ahh... 
you took that away from me? I was hoping that the tinfoil hat would look good ;) 00:35:30 I dare say, this move to rackspace for integration sure has uncovered some weird issues 00:35:33 * edwarnicke ponders where to find a really fashionable tinfoil hat 00:35:43 * edwarnicke thinks he knows the right artist 00:36:15 http://media-cache-ak0.pinimg.com/736x/e3/ba/86/e3ba8639b16e292922bf6df43c9de28c.jpg 00:36:20 nice tinfoil fedora ;) 00:36:27 all that is happenning today in integration is probably nothing to do with the move, i am sorry for tykeal :) 00:36:33 tykeal: Exactly :) 00:36:41 LuisGomez: We'll see :) 00:36:51 OK 00:36:53 I downloaded 00:36:54 unziped 00:37:03 did a quick find through system/ for features-integration 00:37:09 saw something that looked roughly right 00:37:16 rm -rf ~/.m2 00:37:18 ok 00:37:19 now running 00:37:21 cd bin/ 00:37:22 ./karaf 00:37:30 Runs locally 00:37:32 and… 00:37:39 Now checking for things in ~/.m2/ 00:37:40 hold a bit 00:37:51 see if you get the int feature installed 00:38:03 karaf starts but feature is not found 00:38:13 *oh* 00:38:13 with tykeal settings 00:38:16 Which feature should I install? 00:38:36 odl-integration-compatible-with-all 00:39:03 Installing 00:40:00 it also works locally for me but not when i use tykeal .m2/settings.xml 00:40:17 tykeal: Can you give me a laundered .m2/settings.xml to use? 00:40:25 http://pastebin.com/gkGePUgZ 00:40:39 we use a modified version of that for all systems in rackspace 00:40:53 modifed to point to the internal proxy that is ;) 00:41:03 and also have the deployment user info... 00:41:30 tykeal: OK... could you also pastebin me an example of the settings.xml you used to use in LF for integration? 
00:41:37 tykeal: Just so I can eyeball compare 00:43:03 Starting again with empty ~/.m2/repository, ~/.m2/settings.xml, and a fresh unpack of the zip file 00:43:11 edwarnicke: here http://pastebin.com/XfpvaQYJ 00:43:37 as I said, the only real difference is that we have a) user push info and b) we point to the local nexus proxy 00:44:32 Is this the one for the Rackspace: http://pastebin.com/gkGePUgZ ? 00:44:43 tykeal, repo issues as well in central job: 00:44:45 https://jenkins.opendaylight.org/integration/view/Integration%20jobs/job/integration-master-project-centralized-integration/2388/console 00:44:47 edwarnicke: the second one that I pasted is 00:45:14 Ah.. and the first one is from LF ? 00:45:29 for general consumption: http://pastebin.com/gkGePUgZ 00:45:29 redacted one for rackspace: http://pastebin.com/XfpvaQYJ 00:45:53 edwarnicke: we didn't do that when running inside LF, I started doing this for systems inside rackspace to force them to use the local proxy 00:45:55 LuisGomez: That error looks like network connectivity in infra 00:46:08 tykeal: ACK 00:46:40 doing this in rackspace saves ~20 minutes of artifact recovery times in a lot of cases 00:46:47 tykeal: LuisGomez I have reproduced the error with the LF settings.xml file 00:47:00 ok 00:47:28 edwarnicke: if you think that the settings.xml file is improperly formed I would like to know so I can fix it. This is in use on every build system in rackspace 00:47:44 tykeal: I have no conclusions yet 00:47:47 and has been since we started there over a month ago 00:48:08 correction 2.5 months ago ;) 00:48:34 tykeal: My current thinking is that its unlikely to be that the file is malformed... rather that something weird is happening, need to figure it out 00:48:52 ok 00:50:27 tykeal: Question... 
where is the settings.xml file on the controller running server at LF, and where is it at rackspace (Filesystem location) 00:50:51 edwarnicke is right, http://nexus01.dfw.opendaylight.org:8081/ is not reachable 00:51:01 ~/.m2/settings.xml in all cases 00:51:08 tykeal: Good to know 00:51:33 the file is owned by root but readable by jenkins. that way bright light doesn't try futzing with it via a jenkins job 00:51:39 * tykeal actually had a dev do that 00:51:57 before I set the root perms that is ;) 00:52:03 tykeal: *sigh* 00:52:05 silly dev 00:52:14 it was GiovanniMeo ;) 00:52:27 when the odlautorelease was first getting set up 00:52:45 tykeal: If I want you to change settings.xml, I will ask you (so you can tell me why my idea is questionable ;) ) 00:52:52 hehe 00:52:57 tykeal: I noticed he just brought his own at the end of the day 00:53:37 hrmm... he had asked us to make changes to the one on odlautorelease, I would expect that it would be all he was needing... 00:53:42 * edwarnicke notes his autorelease does not require a space station 00:55:00 tykeal: OK... here's what I've tried... 00:55:12 I used the LF settings.xml to begin with because I was confused: 00:55:20 https://www.irccloud.com/pastebin/om0r3fwc 00:55:27 I pastebin it above so we are all on the same page 00:55:43 *that* settings.xml produces the failure 00:55:57 right, that's the one I handed you 00:58:14 both central and karaf deploy errors seem related as both complain they cannot reach http://nexus01.dfw.opendaylight.org:8081/ 00:58:50 tykeal: So that's from LF, the *pre* migration settings.xml, correct? The one that was *working* before?
00:59:00 well, nobody not inside the rackspace network can reach nexus01.dfw.opendaylight.org 00:59:25 edwarnicke: we _didn't_ use something like that pre-migration; all the settings.xml had in it was user info 00:59:42 tykeal: *oh* 00:59:54 we've _been_ using this: http://pastebin.com/XfpvaQYJ for all systems in rackspace (~2.5 months) 00:59:58 So premigration we didn't have any pointer to repos? 01:00:21 edwarnicke: that's correct, because when the envs were all getting set up I didn't know enough about maven to do this sort of thing 01:00:32 tykeal: Could you pastebin me what was used in LF yesterday that was working (with appropriate REDACTIONS of credentials) ? 01:00:55 edwarnicke: remove the mirrors section of that and you'll have it exactly 01:01:11 tykeal: Ah... OK 01:01:56 tykeal: So... working theory (not tested) is that by having a settings.xml, karaf is trying there instead of the local system/ 01:02:04 tykeal: Let me poke at that for a moment 01:02:08 ok 01:02:24 as I've said, I _can_ remove the mirrors section from the integration lab if we need to 01:02:38 it's not needed, I'm just trying to save bandwidth and download time 01:04:21 tykeal: In a pinch, could you just remove it from the ones running the controller? 01:04:43 yes 01:05:22 yes, controller vm jobs 01:06:14 edwarnicke, is karaf pax-exam also impacted by this? 01:06:31 LuisGomez: No 01:06:31 or only when deploying karaf distro 01:06:33 ok 01:06:37 good to know 01:07:14 tykeal: OK...
so it looks like with a settings.xml, its deciding to pick that instead of the configured repos, one of which is the system/ directory 01:07:23 Now I just need to figure out why and how to fix it :0 01:07:46 ok, just let me know if I should pull it from the vms running the controller for the tests 01:08:19 i would say do it if it is not too much work tykeal 01:08:34 because now we are stopped at integration 01:08:54 we cannot deploy controller :( 01:10:10 unless edwarnicke thinks he can fix this very soon… 01:10:25 LuisGomez: I would concur 01:11:38 done 01:11:59 ok, lets try 01:15:09 yep, it is working now :) 01:15:28 :-/ silly karaf 01:15:49 tykeal: I actually have a bit better understanding of things as well 01:15:49 ok, that one _is_ an infra issue, but only because we don't know why karaf is being silly ;) 01:15:55 tykeal: So... you have a mirror config 01:16:19 And your mirror config says to try anything that is not a mirror of opendaylight.snapshots on that release mirror (overall, this is a *very* good thing) 01:16:35 karaf looks in common repos like central for things 01:16:37 it's the recommendation in the nexus administration guides 01:16:50 So your mirror config tells it to look for things in your release mirror it would otherwise look for in central 01:16:59 tykeal: Do not mistake me, what you are doing there is pure awesome 01:17:08 tykeal: I can't begin to tell you how much I approve 01:17:17 tykeal: Like wish we were doing that *everywhere* LF or not 01:17:28 (within ODL infra that is) 01:17:32 hehe 01:17:42 tykeal: So this is in no way saying you should have done something different 01:17:56 tykeal: Just explaining the mystery of WTF is it looking in the release repo for a snapshot 01:18:00 well, I can add in the mirror clause for controller and ovsdb to point to the master nexus (as it's the one close to them). They're only holdouts in the LF infra 01:18:13 tykeal: Yeah... 
that's not the root issue 01:18:14 edwarnicke: your controller dynamic VMs, since they are in rackspace are using the mirror 01:18:22 The root issue is that for reasons I'm still investigating 01:18:35 having a settings.xml is causing it to ignore the rest of the karaf config about such things 01:18:43 Which is a problem I have to solve anyway 01:18:52 ok 01:18:55 Which is to say, you found a bug that I'm happy we found instead of a customer 01:18:59 Good Job! :) 01:19:18 it would probably be something to get fixed since you never know what freaky admin decides to stick a settings.xml in their system... ;) 01:19:34 freaky admins ;) 01:19:48 I agree.. its one I'm happy we found first 01:20:07 ok, can you raise the appropriate bug about it then? I'm not exactly certain what should go in it ;) 01:20:26 tykeal: Yes, but I need to investigate it a bit more 01:20:30 ok 01:20:32 tykeal: It feels like a karaf bug 01:20:36 ah 01:20:39 tykeal: Which we may need to workaround 01:20:49 tykeal: the good news is, I know those guys 01:20:53 And can ask them :) 01:20:55 excellent 01:21:12 (they have fixed bugs for me before, and more often than that, they have pointed out the switch I flipped wrong ;) ) 01:21:22 in the mean time, I think I need to go track down some dinner. I'm a bit famished 01:21:24 tykeal: But the good news is: 01:21:31 a) I have a reasonable hypothesis 01:21:36 b) We have a reasonable workaround 01:21:52 tykeal: Were you going to remove the settings.xml from the controller VMs? 01:22:27 I commented out the mirrors section and already made that change. I left the user info in the file as we've had that since time began (so to speak) 01:22:54 tykeal: Cool 01:23:12 LuisGomez: Can you give it a whirl while tykeal insures we still have a corporeal freakishly awesome sysadmin? 
01:23:48 yes, deploy works, i just need to pass the test now 01:28:40 LuisGomez: tykeal after reading a bit more, and thinking I now understand 01:28:48 Let me explain 01:28:50 ok 01:29:10 in etc/org.ops4j.pax.url.mvn.cfg 01:29:18 karaf defines a list of repos: 01:29:26 One of which is: 01:29:34 file:${karaf.home}/${karaf.default.repository}@id=system.repository 01:29:37 which is the system repo 01:29:43 system/ 01:29:49 ok 01:29:50 The one we pre-populate 01:30:03 and should contain all 01:30:06 Yep 01:30:17 But these are being treated as simple *repos*, like any other maven repo 01:30:27 In comes tykeal 's settings.xml 01:30:41 And it say 'Whatever repo you have, if its not opendaylight.snapshots, use this mirror instead' 01:30:47 Since system/ is just another repo 01:30:57 *sigh* 01:30:59 karaf obediently asks the mirror instead 01:31:12 tykeal: LuisGomez Does that make sense? 01:31:18 sure 01:31:36 oh look, the controller actually bound to the ports it was supposed to 01:31:40 Now I can hack this 01:32:00 Because I *can* put a settings.xml into the karaf distro, and tell the config file to use that as the settings.xml 01:32:10 edwarnicke, is there a way to tell karaf ONLY use /system 01:32:12 I'm not positive that's the right thing to do yet, need to think about it 01:32:28 LuisGomez: Yes 01:32:33 LuisGomez: I am pretty sure I can do that 01:32:38 cool 01:32:41 LuisGomez: I am just not sure I *should* do that 01:32:56 now will break everything probably... 
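The mirror behaviour explained above can be modelled with a small Python sketch. It is a simplified stand-in for Maven's `<mirrorOf>` matching, not Maven's or karaf's actual code: it handles only `*`, explicit repository ids, and `!id` exclusions. The pattern `*,!opendaylight.snapshots` is an assumption inferred from the description "if its not opendaylight.snapshots, use this mirror instead"; the real value lives in the pastebinned settings.xml.

```python
def mirror_applies(mirror_of, repo_id):
    """Simplified model of <mirrorOf> matching.

    Supports '*', explicit repository ids, and '!id' exclusions.
    Other Maven forms such as 'external:*' are deliberately ignored.
    """
    patterns = [p.strip() for p in mirror_of.split(",") if p.strip()]
    # An explicit exclusion always wins over a wildcard.
    if "!" + repo_id in patterns:
        return False
    return "*" in patterns or repo_id in patterns


# Assumed pattern, inferred from the discussion (not copied from settings.xml).
MIRROR_OF = "*,!opendaylight.snapshots"

decisions = {
    repo_id: mirror_applies(MIRROR_OF, repo_id)
    for repo_id in ("central", "system.repository", "opendaylight.snapshots")
}
print(decisions)
```

Under this model `system.repository` is redirected to the mirror just like `central`, which matches the observed failure: the pre-populated system/ directory is never consulted, and the mirror host is unreachable from outside rackspace.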
01:33:02 but we need that for relase 01:33:06 :-/ the job still seems to be failing tests though 01:33:16 let me check 01:33:19 LuisGomez: I don't think we do 01:33:23 LuisGomez: Let me tell you why 01:33:35 still saying no route to host in the tests which seems strange to me 01:33:37 So karaf looks at system/ but also common public repos like central 01:33:56 you're a) using IPs b) I've verified those IPs are accessible by all the test lab systems 01:34:00 LuisGomez: Users have a very reasonable expectation, that if they do a bundle:install for a bundle that lives in central, it will work 01:34:08 If we *force* only using system/ it won't 01:34:42 LuisGomez: So I'm going to do this 01:34:55 LuisGomez: Let me check the settings.xml override 01:35:00 LuisGomez: To make sure it *works* 01:35:07 And then we can decide whether we *should* do that 01:35:29 At our leisure 01:36:08 tykeal: controller at 192.168.4.5 is not reachable for any reason 01:36:56 edwarnicke:ok 01:37:03 LuisGomez: hrmm... 01:37:25 tykeal: is not reachable from both robot and mininet vms 01:37:38 looking into it now, it should be 01:38:02 hah, I see the issue, one moment 01:40:00 LuisGomez: I can force use of a settings.xml inside our zip file if I need to 01:40:04 Just verified 01:40:17 Again, not sure if I should, would have to think about it 01:40:54 ok, lets think about before doing it 01:40:57 no rush 01:41:14 Yeah... as long as we know it *does* work (as opposed to *thinking* it *may* work ;) ) 01:41:26 And it makes sense now :0 01:41:28 :) 01:43:21 LuisGomez: I did discover investigating earlier today that there are about 50 jars that we probably want in system/ that are not there (stuff from outside of ODL). 
I am working on making sure they are there, which should speed up startup and feature install a lot 01:43:49 sure 01:44:33 the important thing is that we should not require internet to install a karaf distro with odl features 01:45:01 totally stand-alone distro for the features we define 01:45:24 tykeal: let me know when i should retry my test 01:46:28 LuisGomez: *yes* 01:46:37 hrmm... strange, I could have sworn I had fully tested the inter connect on the private network 01:46:41 LuisGomez: Which is what I am working on fixing :0 01:47:05 even so, it's still failing on me with some very wide open firewall rules 01:47:28 * tykeal considers just routing the traffic over the front side network as that _does_ work 01:48:24 then again, that doesn't seem to be wanting to work right now either 01:48:29 tykeal: all the vms have IP in same subnet so no need to route right? 01:48:55 LuisGomez: correct, no routing should actually be involved, just allowing traffic to pass the firewall which it is setup for already 01:49:08 are all interfaces up and connected to same bridge? 01:49:20 can you do arp -a in the vms? 01:49:44 LuisGomez: they're connected to the same network. I can't say they're on the same bridge. I don't know what the rackspace network looks like ;) 01:49:59 :) 01:50:20 I think I may actually need to give them a reboot. Something seems to be a bit off. I just had some weird connectivity issue to the VMs 01:50:31 ok, go ahead 01:53:26 tykeal: in rackspace you do not manage the vswitches for the VMs? how do you know then a VM interface is well connected to the network? 01:54:17 maybe you have a guy like in openstack but that is not the best to troubleshoot an issue :) 01:54:21 LuisGomez: it's all OpenStack. 
we define a subnet and say bring a VM up connected to these various networks and it allocates an IP address on the subnet and connects a NIC to the subnet 01:54:54 :) 01:55:06 and as of last night I _could_ ssh between the test lab systems accross the front side nics. But right now, even after the reboots I'm getting no route to hose 01:55:09 err.. host 01:55:09 so openstack is ready for production :) 01:55:57 well at least they support you in rackspace with this kind of issues right? 01:56:21 without access to the host, very difficult to troubleshoot i guess 01:56:28 yeah, I've also got a couple of the VMs I just rebooted not even coming up yet 01:56:38 which is weird 01:58:22 I also find it rather interesting that Jenkins seems to think that they systems may be up... 01:59:01 even though I can't get to them 02:00:57 yep 02:01:11 ok, i have to leave for a while now 02:01:24 yeah, something is seriously messed up here and I'm going to have to contact Rackspace about it 02:01:26 i let you do :) 02:02:15 also do not know if related but the issue with http://nexus01.dfw.opendaylight.org:8081/ 02:02:29 central job is blocked by this too 02:02:39 ? 02:03:14 https://jenkins.opendaylight.org/integration/view/Integration%20jobs/job/integration-master-project-centralized-integration/ 02:03:50 this job is failing because it cannot reach above repo 02:04:06 this is in the master jenkins vm 02:04:17 yeah, it's related 02:04:35 good 02:04:45 at least it is the same thing :) 02:05:38 need to go now, i will check later today or tomorrow morning 02:06:09 ok, I'm thinking that Rackspace is having a major network problem. a lot of our VMs are starting to have network issues all of a sudden 02:06:44 can be yes 02:07:39 I think I'm going to go find some dinner and come back to it. Hopefully it will have shaken itself out because if many of our systems are having an issue then it will likely be something they are already working on 02:07:49 tykeal: EAT! 
:) 02:08:06 * tykeal waves be back in a while 02:08:16 tykeal: Have fun storming the castle! 02:08:26 yes, have fun 02:08:59 oh, hey, looks like things may have just corrected themselves 02:09:06 Hooray! 02:09:49 and yes, I can communicate between systems on both front and back side networks, so things _should_ be working now 02:10:15 I'll come check again after dinner 02:11:58 ok, triggering jobs now 02:25:47 Good news :) 02:25:57 Autorelease with sdni and plugin2oc is working :) 12:46:01 David Goldberg proposed a change to integration: added sfclisp feature to integration https://git.opendaylight.org/gerrit/11153 12:46:59 David Goldberg proposed a change to integration: added sfclisp feature to integration https://git.opendaylight.org/gerrit/11154 19:00:49 David Goldberg proposed a change to integration: added sfclisp feature to integration https://git.opendaylight.org/gerrit/11166 20:46:58 A change was merged to integration: added sfclisp feature to integration https://git.opendaylight.org/gerrit/11166 20:51:06 David Goldberg proposed a change to integration: added sfcofl2 to integration https://git.opendaylight.org/gerrit/11170 05:53:34 #endmeeting