07:04:48 #startmeeting multisite
07:04:48 Meeting started Thu Aug 25 07:04:48 2016 UTC. The chair is joehuang. Information about MeetBot at http://wiki.debian.org/MeetBot.
07:04:48 Useful Commands: #action #agreed #help #info #idea #link #topic.
07:04:48 The meeting name has been set to 'multisite'
07:05:08 #topic rollcall
07:05:11 #info joehuang
07:05:31 #info Ashish
07:05:47 #info meimei
07:06:11 #topic Unstable CI job running
07:06:37 hello, the CI job for functest is not stable
07:06:49 #info dimitri
07:07:10 do you have any proposal? functest thought it may be an issue with the SUT
07:07:37 Meimei said there was a similar issue in compass, could you share the experience
07:07:52 ext-network | restart api server
07:08:11 what's ext-network?
07:08:53 joehuang: yes, you can see we reset all the api services
07:08:54 https://build.opnfv.org/ci/job/compass-deploy-baremetal-daily-master/466/console
07:09:17 Could dimitri help to check whether all services on the controllers (two sites) work normally
07:09:18 because of the unstable api
07:09:40 yes, It works properly. He replied to that mail
07:09:43 you mean the API server reboots now and then, or doesn't work?
07:10:38 functest assumes that SUT is a freshly installed system that will be removed after deployment
07:10:41 I saw the mail, but if you try openstack endpoint list many times, sometimes it doesn't work
07:10:46 this is not the case for multisite
07:11:00 as a result functest is not careful with resource allocation/deallocation
07:11:45 I’m cleaning up the opnfv images now
07:11:57 if you try openstack endpoint list several times, the error "No service with a type, name or ID of '054f26bc26a949e1aeaccf0a2b932903' exists" will occur
07:12:41 it seems there is some unrecognized service_type for some registered endpoint
07:13:12 how to log on to the controller node?
07:13:26 can you login to Jumphost?
07:13:33 I’ve tried it multiple times and it works
07:13:37 yes, I can now
07:13:39 there’s no such UUID in the list
07:13:51 Every 2.0s: openstack endpoint list Thu Aug 25 08:13:23 2016
07:14:01 strange, this occurred this morning
07:14:08 No service with a type, name or ID of 'cc89788680214876a070c1fce9703650' exists.
07:14:21 this has occurred now,
07:14:25 I had kept watch
07:14:29 watch openstack endpoint list
07:14:39 and got this response for one of the runs
07:15:54 maybe this is due to multiple registrations of the kb service
07:15:58 you mean you also met this error
07:16:02 yes
07:16:08 I also met it just now
07:16:29 how many times did you run openstack endpoint list?
07:16:29 anyhow only if the service is not there, we try to register it
07:16:52 I had kept watch on it, maybe after 5-6 times
07:17:23 seems regularly every 5-6 times
07:18:27 if the request goes to one of the API servers behind haproxy, then the issue occurs
07:19:45 Maybe there are some dead records in HAProxy that lead it to forward the request to a bad or non-existent api server
07:22:08 SAshish: you can try restarting the api service, I am not sure it will work on fuel
07:22:50 and service unavailable with nova list also
07:22:52 okay
07:23:33 keystone?
07:23:44 you mean sometimes nova list also failed?
07:23:53 yes
07:25:04 you can check the records configured in HAProxy, and make sure the APIs that need to be load balanced are alive
07:26:14 one better way is to use nova --debug list, so that we know which endpoint failed to respond
07:26:40 yeah, kept the same to check
07:26:51 or openstack --debug endpoint list to check which endpoint failed
07:26:59 great
07:27:22 can you tell me how to access the controller node, I can't ssh into the controller node
07:27:51 ssh root@10.20.0.2
07:27:59 from jumphost login to fuel
07:28:01 ssh root@10.20.0.2
07:28:10 password => r00tme
07:28:29 once you are into fuel node
07:28:32 login to controller
07:28:34 ssh 10.20.0.3
07:29:05 it doesn't need a password, you will land on the first controller
07:29:32 hi, find out 1
07:29:36 "Openstack Cloudformation Service", "name": "heat-cfn"}, {"id": "cc89788680214876a070c1fce9703650", "enabled": true, "type": "object-store", "description": "Openstack Object-Store Service", "name": "swift"}, {"id": "e07966459621474ab19231cc369e685a", "enabled": true, "type": "image", "description": "OpenStack Image Service", "name": "glance"}]}
07:29:45 No service with a type, name or ID of 'e07966459621474ab19231cc369e685a' exists.
07:29:56 it's swift
07:30:23 http://192.168.0.2:35357/v2.0/
07:30:39 the server gives feedback from http://192.168.0.2:35357/v2.0/
07:31:14 http://192.168.0.2:35357/v2.0/OS-KSADM/services
07:31:52 sorry not swift, but glance {"id":
07:31:52 "e07966459621474ab19231cc369e685a", "enabled": true, "type": "image", "description": "OpenStack Image Service",
07:31:55 "name": "glance"}]}
07:34:07 Could we re-install the environment? I think it has been running for several weeks, maybe it's not as clean as it was
07:34:54 this will also mean that the whole multi-region setup has to be reconfigured
07:35:37 your suggestion?
07:36:12 use retry in the long run
07:38:09 has compass done some workaround?
07:38:11 retry should be a mechanism for all commands to SUT in functest
07:38:32 but only this one is causing delay
07:38:54 functest also checks nova, neutron, cinder separately
07:38:58 the commands pass
07:40:06 sometimes in glance image-list
07:40:22 I have noticed with nova list also once
07:40:29 yes
07:41:01 there are many more commands to the SUT (system under test) than those in check_os.sh
07:41:26 or we can skip the check_os and health check job
07:41:59 I’ve restarted cinder and nova api
07:42:00 to Meimei, can we disable check_os and health check?
07:44:18 to Dimitri, restarted cinder/nova on both controller nodes or one?
07:44:49 restarting rabbit
07:44:50 yes
07:44:57 on both
07:45:22 ok
07:46:55 feels much faster now that i restarted rabbit
07:48:32 this issue is still there: openstack endpoint list
07:48:33 No service with a type, name or ID of '87d59a31161b40aeb966a02e03beaf6d' exists.
07:52:36 Need more time to find out why, let's work together to fix it
07:53:25 okay, So Joe, how did the release go?
07:53:27 restarted keystone
07:54:00 I checked the openstack bugs. apparently keystone responds slowly to the ‘openstack’ commands
07:55:15 release will be on Sept 22 for Colorado 1.0
07:55:33 we need to have a stable job running
07:55:49 it also helps our new feature development
07:56:54 even bug fixes need to make sure all test cases pass in the daily job
07:57:26 ok, time is up, let's work offline to fix it
07:57:56 some bugs are there which are targeted for next release
07:58:13 which should not have any effect on current release
07:58:39 we have a new set of jenkins jobs
07:58:49 which why is in development focus?
08:00:14 what's the new set of jenkins jobs, you mean in OpenStack?
08:00:18 or OPNFV
08:00:37 opnfv
08:01:04 don't understand
08:01:47 we have now this
08:01:50 after keystone restart
08:01:56 multisite-kingbird-daily-colorado
08:02:08 multisite-kingbird-daily-master
08:02:28 o, both failed
08:02:45 they run in parallel
08:02:52 and I guess, they’ve blocked each other
08:03:05 since they’re running for over 6hrs already
08:03:30 disable the colorado job, we only need to maintain the master daily job
08:03:55 then we need to have two deploy scripts?
08:04:05 parameterized deploy script
08:04:05 no, only master one
08:04:26 then what about colorado job
08:05:11 colorado job is launched from the releng colorado branch, but we don't need that to run
08:05:21 the master one can verify the code
08:06:50 i cannot modify the jobs
08:07:00 ok, i have to go now
08:07:05 talk offline
08:07:05 may need help from Meimei
08:07:09 ok
08:07:14 #endmeeting
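
Note on the retry idea raised at 07:36:12 and 07:38:11: the log does not show how functest would implement retries, so the following is only a minimal Python sketch under assumptions. `run_with_retry`, its parameters, and the attempt and delay defaults are hypothetical names chosen for illustration; the only command taken from the discussion is `openstack endpoint list`. The helper reruns a CLI command a few times and returns the first successful result. That would hide an intermittent failure like the one seen roughly every 5-6 calls, but it would not fix whatever causes it.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=2.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run `cmd` (a list of arguments) until it exits with status 0.

    Hypothetical helper, not part of functest. `runner` and `sleep` are
    injectable so the retry logic can be exercised without a real cloud.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(
        "command %r failed %d times; last stderr: %s"
        % (cmd, attempts, result.stderr)
    )
```

A call might look like `run_with_retry(["openstack", "endpoint", "list"], attempts=5)`, assuming the `openstack` client is installed and credentials are already loaded in the environment. Raising after the last attempt keeps a persistent failure visible rather than silently passing the check.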