08:00:51 #startmeeting Multisite Weekly Meeting 2015.07.09
08:00:51 Meeting started Thu Jul 9 08:00:51 2015 UTC. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:51 Useful Commands: #action #agreed #help #info #idea #link #topic.
08:00:51 The meeting name has been set to 'multisite_weekly_meeting_2015_07_09'
08:01:02 #topic Roll Call
08:01:08 hi guys
08:01:16 Hi all, IRC-only meeting, right?
08:01:19 card punching time :P
08:01:21 yep
08:01:48 thanks Zhipeng
08:01:58 hi
08:02:00 hey
08:02:03 no problem Malla
08:02:07 #info zhipeng
08:02:13 #info joehuang
08:02:18 #info dimitri
08:02:22 #info Hans Feldt
08:03:05 #info Malla
08:03:08 :P
08:03:16 #topic multisite identity service management
08:03:35 #topic multisite identity service management
08:03:42 sorry joe
08:03:43 #link https://etherpad.opnfv.org/p/multisite_identity_management
08:03:53 same idea :)
08:03:57 I just updated it
08:04:04 nice hafe
08:04:22 Thanks, I'll take a few minutes to read your update
08:05:11 I am making some statements there; it would be good to make sure that they are not wrong and we're aligned
08:05:56 "The token contains a list of endpoints": only the PKI token contains the endpoint list
08:06:10 is that so?
08:06:16 UUID / Fernet tokens don't
08:06:19 ok
08:06:50 no, because they have a fixed size?
08:07:04 UUID is random 32-byte data, fixed size
08:07:35 so depending on the token type, the next step could be to request the service catalog
08:08:46 No service catalog will be requested for UUID/Fernet; instead, the UUID/Fernet token is sent to Keystone for validation
08:09:56 with token type UUID/Fernet, the user requests the service catalog
08:11:06 The user will request it, so as to select a region or not
08:11:39 OK, hafe, I got what you mean
08:12:14 I found a new issue with PKI tokens
08:12:51 If the certs used in token validation are revoked, then you have to replicate new certs to all sites
08:14:24 yeah, PKI/Fernet brings in boring security issues
08:14:50 to hafe: could you explain "PKI are needed with a centralized Keystone to avoid inter region traffic."?
08:14:50 should add that
08:15:24 offline validation for PKI tokens
08:15:36 but revocation traffic instead
08:17:04 The Keystone service can be distributed across a few sites with DB replication to achieve high availability for PKI tokens
08:18:08 then the revocation events in the DB will also be replicated to these sites, so revocation-list retrieval can be done in more than one site
08:18:08 you mean like asymmetrical replication to a few sites, not all?
08:18:37 It looks like a centralized setup
08:19:20 cert revocation is not so frequent, is it?
08:19:28 I don't know
08:19:30 it can be sync and symmetrical
08:19:43 and I'm not even sure it works for PKI, judging from some video I saw
08:20:27 yes, xiaolong, cert revocation is not so frequent, and some say it often doesn't work
08:22:48 have you taken a look at the CERN use case (keystone federation)? https://blueprints.launchpad.net/keystone/+spec/keystone-to-keystone-federation
08:22:50 sorry, I mean token revocation lists
08:23:00 tokens can be revoked in the API
08:23:11 how about the maturity of their proposal?
08:24:13 my understanding is that federation is only for authentication
08:24:46 almost all the new features in Kilo (the latest version) of Keystone are about keystone federation :)
08:25:48 federation is mainly for two cloud providers to borrow/rent resources
08:26:11 and we are not in that business
08:26:12 but keystone federation is about authentication, right?
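For reference, the token-handling flow discussed above can be sketched in a few lines: a UUID/Fernet token carries no endpoint list, so it is sent back to Keystone for validation, and the service catalog is fetched in a separate call. This is a minimal sketch against the Keystone v3 API; the endpoint URL, token values, and region name are placeholders.

```python
# Minimal sketch of the flow discussed above. UUID/Fernet tokens carry no
# endpoint list, so the token is sent back to Keystone for validation and
# the service catalog is requested in a separate call.
import requests

KEYSTONE_URL = "http://keystone.example.com:5000/v3"  # placeholder endpoint
ADMIN_TOKEN = "gAAAAAB..."  # placeholder: a token allowed to validate others
USER_TOKEN = "gAAAAAB..."   # placeholder: the UUID/Fernet token to validate

# Validate the token: Keystone v3 GET /v3/auth/tokens with X-Subject-Token.
resp = requests.get(
    KEYSTONE_URL + "/auth/tokens",
    headers={"X-Auth-Token": ADMIN_TOKEN, "X-Subject-Token": USER_TOKEN},
)
resp.raise_for_status()
print("token is valid, expires at:", resp.json()["token"]["expires_at"])

# Fetch the service catalog separately; it is not embedded in the token.
catalog = requests.get(
    KEYSTONE_URL + "/auth/catalog",
    headers={"X-Auth-Token": USER_TOKEN},
).json()["catalog"]

# Select endpoints for one region, e.g. "RegionOne" (placeholder name).
for service in catalog:
    for ep in service["endpoints"]:
        if ep["region"] == "RegionOne" and ep["interface"] == "public":
            print(service["type"], "->", ep["url"])
```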
08:26:24 so a lot of role/user/domain/group mapping and configuration has to be done
08:27:04 what token type does CERN use?
08:27:19 joehuang: yes, that is my understanding
08:27:21 federated authentication: if you have authenticated in one cloud, you can access another partner cloud
08:28:10 yes, there are some differences between the two use cases, but maybe the technical solutions can be an inspiration
08:28:14 do you find anything wrong or something you don't agree with?
08:28:29 such as the format of the token
08:29:33 I don't know what type of token CERN is using
08:30:25 I'll try to find out what type of token CERN is using
08:30:32 maybe the use token
08:30:38 I found a blog
08:30:58 kerberos
08:31:39 http://openstack-in-production.blogspot.se/2014/10/kerberos-and-single-sign-on-with.html
08:31:39 To hafe: regarding "Multi master synchronous: Galera (others?), not very scalable", it's scalable
08:32:01 to some extent
08:32:34 joehuang: how would you phrase it?
08:33:04 Galera uses virtual synchrony as its protocol
08:33:09 #link http://indico.cern.ch/event/283833/contribution/0/attachments/523615/722236/summitHK2013_All_v2.pdf
08:33:19 CERN is using PKI tokens
08:33:48 from what I know, such a protocol requires pretty good, deterministic inter-node links
08:34:24 and configured timeouts etc
08:34:24 The guy from Galera confirmed that they have seen Galera used in practice with 15 nodes, distributed across 5 data centers
08:34:40 another point: personally, I am not a fan of database replication across multiple sites
08:35:15 Then the better choice would be PKI
08:36:28 at the big-picture level it is not clear where the architectural border line of OPNFV is
08:36:57 yes, agree
08:37:11 OPNFV needs a big picture
08:37:15 should OPNFV deliver the multisite requirement out of the box?
08:37:42 the multisite identity requirement
08:37:43 that would be the ideal goal, I think
08:37:49 But we have to confront this kind of multisite issue if we put OpenStack into production
08:39:22 the high-level goal I suggested: "a user should, using a single authentication point be able to manage virtual resources spread over multiple OpenStack regions"
08:39:31 do we agree on that?
08:40:01 I agree with this expression
08:40:13 agree
08:40:18 agree
08:40:29 #agreed "a user should, using a single authentication point be able to manage virtual resources spread over multiple OpenStack regions"
08:40:29 changed in the etherpad
08:40:52 so should we settle on solution 2 for this use case then?
08:41:48 you mean the async repl idea?
08:41:56 Which solution is better depends on the number of regions
08:42:17 and that becomes a pain for OPNFV to magically support
08:42:52 For solution 2, only if there are lots of sites (exceeding the cluster capability)
08:43:33 For PKI tokens, there are some constraints as the number of sites increases
08:43:44 if you have 2 and it works, why would you need 1?
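Since solution 2 hinges on async MySQL replication of the keystone database, a rough way to check whether a slave region is caught up is to read its replication status. A minimal sketch, assuming a MySQL slave reachable at a placeholder host with placeholder credentials and the PyMySQL driver:

```python
# Rough health check for the async-replication idea (solution 2): read the
# slave's replication status and lag for the replicated keystone database.
import pymysql

conn = pymysql.connect(
    host="keystone-db.slave-region.example.com",  # placeholder slave host
    user="monitor",                               # placeholder credentials
    password="secret",
    cursorclass=pymysql.cursors.DictCursor,
)
with conn.cursor() as cur:
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()

if status is None:
    print("this server is not configured as a replication slave")
else:
    # Seconds_Behind_Master bounds the window in which a change made at the
    # master (e.g. a token revocation) is not yet visible at this slave.
    print("IO thread running: ", status["Slave_IO_Running"])
    print("SQL thread running:", status["Slave_SQL_Running"])
    print("lag in seconds:    ", status["Seconds_Behind_Master"])
```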
08:44:23 the token size will become too large, so you have to limit a project to a limited set of regions, or use a scoped token each time
08:44:37 added to the pad
08:45:04 in solution 2 you restrict syncing to the keystone database only
08:45:19 not much data (without UUID tokens)
08:45:40 Solution 2 has not been tested, and we have not received confirmation from Galera on whether it's feasible
08:46:06 with Galera, in my understanding, you have to deploy the keystone database in its own database server instance
08:46:08 I would prefer solution 3 or 6
08:46:33 you cannot share the database server with other services, because Galera replicates everything
08:47:03 including e.g. Nova, which should not be replicated to other sites
08:48:00 In a multisite scenario, I think Keystone should be deployed with a separate database server
08:48:07 meaning if you want to use Galera replication for the keystone database, you have to change most existing deployers
08:48:55 joehuang: yes, that is required when using Galera, not with async repl
08:49:45 with async repl you can select which databases should be synced
08:50:52 xiaolong: federation only seems to handle authentication; authorization is the problem
08:51:10 to Xiaolong: the PKI token will become oversized as the number of regions and service endpoints grows; I'll try to find that number. It varies according to the length of your service access links
08:52:21 to hafe: how do you make the name red?
08:53:18 sorry, no clue
08:53:21 but for async repl, revocation will be a risk for the duration of the async replication lag
08:54:15 sorry, I made a wrong statement
08:54:35 I have prototyped solution 2 as promised
08:54:43 using docker
08:54:51 what's the conclusion?
08:55:02 well
08:55:28 first I was just using master-slave repl
08:55:33 you mean replication from one cluster to another cluster?
08:55:44 I did not have clusters
08:55:56 each "region" had a single db server
08:56:27 each slave region replicated the keystone db and did local token validation
08:56:47 what's the replication latency?
08:56:49 basically our high-level requirement was OK
08:57:00 since I used Fernet
08:57:15 data is basically only replicated at startup
08:57:17 did you try revocation?
08:57:20 no
08:57:39 how about adding a new user?
08:57:55 in LDAP
08:58:05 ahaa
08:58:28 Is the slave also serving online?
08:58:34 yes
08:58:40 in parallel with the master?
08:58:43 since it is supposed to be read-only
08:58:51 yes
08:58:55 scale-out for read-only
08:59:19 Xiaolong, what do you think of hafe's test?
08:59:28 It's good
08:59:47 but we can have the master as a cluster for higher availability
08:59:53 then I tried the same thing with a Galera cluster as the "master region"; it worked fine syncing to a single slave
09:00:08 how about multi-slave?
09:00:12 but when the "slave" is a Galera cluster itself
09:00:19 this is great if it works
09:00:29 I haven't got it to work yet
09:00:44 it should work according to blogs I have read
09:00:57 Let's continue next time. Hoping for a very good result next week
09:01:39 Time flies; let's keep the discussion on the m-l
09:02:00 sure, I can update with the prototype status
09:02:19 and I also posted some data about monitoring; I hope you can give some feedback.
09:02:36 Thanks a lot to all
09:02:52 and especially for the prototype from hafe
09:03:06 I like hafe's idea :)
09:03:06 :-)
09:03:24 something great :)
09:03:28 okay folks, let's keep the discussion on the mailing list
09:03:36 #endmeeting
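Following up on hafe's prototype report above (Fernet tokens, keystone db replicated to each region, local token validation), a minimal cross-region check might look like the sketch below. All URLs and credentials are placeholders; it assumes the caller is authorized to validate tokens, and that the Fernet key repository is synced across regions alongside the database.

```python
# In the spirit of hafe's prototype: obtain a Fernet token from the master
# region's Keystone, then validate it locally against each region. This only
# works once the keystone database is replicated and the Fernet key
# repository (/etc/keystone/fernet-keys by default) is synced as well.
import requests

REGIONS = {
    "master": "http://keystone-master.example.com:5000/v3",  # placeholders
    "slave1": "http://keystone-slave1.example.com:5000/v3",
}

# Keystone v3 password authentication against the master region.
auth_body = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": "admin",              # placeholder user
                    "domain": {"id": "default"},
                    "password": "secret",         # placeholder password
                }
            },
        }
    }
}
resp = requests.post(REGIONS["master"] + "/auth/tokens", json=auth_body)
resp.raise_for_status()
token = resp.headers["X-Subject-Token"]  # the Fernet token itself

# Validate the same token at every region's local Keystone; no call back
# to the master region should be needed.
for region, url in REGIONS.items():
    r = requests.get(
        url + "/auth/tokens",
        headers={"X-Auth-Token": token, "X-Subject-Token": token},
    )
    print(region, "validated locally:", r.status_code == 200)
```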