08:00:51 <zhipeng> #startmeeting Multisite Weekly Meeting 2015.07.09
08:00:51 <collabot> Meeting started Thu Jul 9 08:00:51 2015 UTC. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:51 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
08:00:51 <collabot> The meeting name has been set to 'multisite_weekly_meeting_2015_07_09'
08:01:02 <zhipeng> #topic Roll Call
08:01:08 <zhipeng> hi guys
08:01:16 <Malla> Hi all, IRC only meeting right..?
08:01:19 <zhipeng> card punching time :P
08:01:21 <zhipeng> yep
08:01:48 <Malla> thanks Zhipeng
08:01:58 <joehuang> hi
08:02:00 <sorantis> hey
08:02:03 <zhipeng> no problem Malla
08:02:07 <zhipeng> #info zhipeng
08:02:13 <joehuang> #info joehuang
08:02:18 <sorantis> #info dimitri
08:02:22 <hafe> #info Hans Feldt
08:03:05 <zhipeng> #info Malla
08:03:08 <zhipeng> :P
08:03:16 <joehuang> #topic multisite identity service management
08:03:35 <zhipeng> #topic multisite identity service management
08:03:42 <zhipeng> sorry joe
08:03:43 <joehuang> #link https://etherpad.opnfv.org/p/multisite_identity_management
08:03:53 <joehuang> same idea :)
08:03:57 <hafe> I just updated it
08:04:04 <zhipeng> nice hafe
08:04:22 <joehuang> Thanks, I'll take a few minutes to read your update
08:05:11 <hafe> I am making some statements there; it would be good to make sure that they are not wrong and we're aligned
08:05:56 <joehuang> "The token contains a list of endpoints" -- only the PKI token contains the endpoint list
08:06:10 <hafe> is that so?
08:06:16 <joehuang> UUID / Fernet tokens don't
08:06:19 <hafe> ok
08:06:50 <hafe> no, because they have a fixed size?
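joehuang's point above -- that a UUID token is fixed-size and opaque, unlike a PKI token that embeds the endpoint catalog -- can be illustrated with a short sketch. This mimics rather than calls Keystone; `issue_uuid_token` is a hypothetical helper, not Keystone code:

```python
# Sketch (assumption: mirrors Keystone's documented UUID-token behaviour):
# a UUID token is 32 random hex characters carrying no payload, so its
# size never grows with the number of regions or endpoints.
import uuid

def issue_uuid_token():
    """Return a Keystone-style UUID token: 32 hex chars, no embedded catalog."""
    return uuid.uuid4().hex

token = issue_uuid_token()
assert len(token) == 32  # fixed size regardless of catalog size
# Because the token is opaque, a service cannot validate it locally:
# it must be sent back to Keystone (online validation), and the service
# catalog is fetched in a separate request.
```

This is why the chat distinguishes the flows: PKI tokens allow offline validation (the catalog and signature travel with the token), while UUID/Fernet tokens stay small but require a round trip to Keystone.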
08:07:04 <joehuang> UUID is 32 bytes of random data, fixed size
08:07:35 <hafe> so depending on token type, the next step could be to request the service catalog
08:08:46 <joehuang> No service catalog will be requested for UUID/Fernet; instead the UUID/Fernet token is sent to Keystone for validation
08:09:56 <hafe> with token type UUID/Fernet, the user requests the service catalog
08:11:06 <joehuang> The user will request it, so as to select a region or not
08:11:39 <joehuang> OK, hafe, I got what you mean
08:12:14 <joehuang> I found a new issue for PKI tokens
08:12:51 <joehuang> If the certs used in token validation are revoked, then you have to replicate new certs to all sites
08:14:24 <hafe> yeah, PKI/Fernet brings in boring security issues
08:14:50 <joehuang> to hafe: could you explain "PKI are needed with a centralized Keystone to avoid inter region traffic."?
08:14:50 <hafe> should add that
08:15:24 <hafe> offline validation for PKI tokens
08:15:36 <hafe> but revocation traffic instead
08:17:04 <joehuang> The Keystone service can be distributed across a few sites with DB replication, to achieve high availability for PKI tokens
08:18:08 <joehuang> then the revocation events in the DB will also be replicated to these sites, so the revocation-list retrieval can be done in more than one site
08:18:08 <hafe> you mean like asymmetrical replication to a few sites, not all?
08:18:37 <joehuang> It looks like centralized
08:19:20 <xiaolong> cert revocation is not so frequent, is it?
08:19:28 <hafe> I don't know
08:19:30 <joehuang> can be sync and symmetrical
08:19:43 <hafe> and I'm not even sure it works for PKI, from some video I saw
08:20:27 <joehuang> yes, xiaolong, cert revocation is not so frequent, and some say it often doesn't work
08:22:48 <xiaolong> have you taken a look at the CERN use case (keystone federation)?
https://blueprints.launchpad.net/keystone/+spec/keystone-to-keystone-federation
08:22:50 <hafe> sorry, I meant token revocation lists
08:23:00 <hafe> tokens can be revoked in the API
08:23:11 <xiaolong> how about the maturity of their proposal?
08:24:13 <hafe> my understanding is that federation is only for authentication
08:24:46 <joehuang> almost all new features in Kilo (the last version) of Keystone are about keystone federation :)
08:25:48 <joehuang> federation is mainly for two cloud providers to borrow/rent resources
08:26:11 <hafe> and we are not in that business
08:26:12 <zhipeng> but keystone federation is about authentication, right?
08:26:24 <joehuang> so a lot of role/user/domain/group mapping and configuration has to be done
08:27:04 <hafe> what token type does CERN use?
08:27:19 <hafe> joehuang: yes, that is my understanding
08:27:21 <joehuang> federated authentication: if you authenticated in one cloud, you can access another partner cloud
08:28:10 <xiaolong> yes, there are some differences between the two use cases, but maybe the technical solutions can be an inspiration
08:28:14 <hafe> do you find anything wrong or something you don't agree with?
08:28:29 <xiaolong> such as the format of the token
08:29:33 <joehuang> I don't know what type of token CERN is using
08:30:25 <joehuang> I'll try to find out what type of token CERN is using
08:30:32 <hafe> maybe they use tokens
08:30:38 <hafe> I found a blog
08:30:58 <hafe> kerberos
08:31:39 <hafe> http://openstack-in-production.blogspot.se/2014/10/kerberos-and-single-sign-on-with.html
08:31:39 <joehuang> To hafe: "Multi master synchronous: Galera (others?), not very scalable" -- it is scalable
08:32:01 <hafe> to some extent
08:32:34 <hafe> joehuang: how would you phrase it?
08:33:04 <hafe> galera uses virtual synchrony as a protocol
08:33:09 <joehuang> #link http://indico.cern.ch/event/283833/contribution/0/attachments/523615/722236/summitHK2013_All_v2.pdf
08:33:19 <joehuang> CERN is using PKI tokens
08:33:48 <hafe> from what I know, such a protocol requires pretty good deterministic inter-node links
08:34:24 <hafe> and configured timeouts etc.
08:34:24 <joehuang> The guys from Galera confirmed that they have seen it used in practice with 15 nodes, distributed across 5 data centers
08:34:40 <xiaolong> another point: personally, I am not a fan of database replication across multiple sites
08:35:15 <joehuang> Then the better choice would be PKI
08:36:28 <hafe> in the big picture it is not clear where the architectural border line of OPNFV is
08:36:57 <joehuang> yes, agree
08:37:11 <joehuang> OPNFV needs a big picture
08:37:15 <hafe> should OPNFV deliver the multisite requirement out of the box?
08:37:42 <hafe> multisite identity requirement
08:37:43 <zhipeng> that would be the ideal goal, I think
08:37:49 <joehuang> But we have to confront this kind of multisite issue if we put OpenStack into production
08:39:22 <hafe> the high-level goal I suggested: "a user should, using a single authentication point be able to manage virtual resources spread over multiple OpenStack regions"
08:39:31 <hafe> do we agree on that?
08:40:01 <xiaolong> I agree with this expression
08:40:13 <zhipeng> agree
08:40:18 <joehuang> agree
08:40:29 <zhipeng> #agreed "a user should, using a single authentication point be able to manage virtual resources spread over multiple OpenStack regions"
08:40:29 <joehuang> changed in the etherpad
08:40:52 <zhipeng> so should we settle on solution 2 for this use case then?
08:41:48 <hafe> you mean the async repl idea?
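The "async repl idea" (solution 2) discussed here -- replicate only the keystone database asynchronously from a master region, leaving other services local -- can be sketched as a slave-side MySQL configuration fragment. The option names are standard MySQL replication settings; the values and the database name `keystone` (Keystone's conventional DB name) are illustrative assumptions, not a tested deployment:

```ini
# Slave-region my.cnf sketch for solution 2: asynchronously replicate only
# the keystone database from the master region. Other schemas (e.g. nova)
# are not replicated, unlike Galera, which replicates every database on
# the server.
[mysqld]
server-id       = 2            # unique per replica
relay-log       = relay-bin
replicate-do-db = keystone     # sync the keystone schema only
read_only       = ON           # slave regions serve read-only token validation
```

This is the contrast hafe draws below: Galera forces a dedicated database server for Keystone because it replicates everything, whereas async replication can be filtered per database.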
08:41:56 <joehuang> Which solution is better depends on the number of regions
08:42:17 <hafe> and that becomes a pain for OPNFV to magically support
08:42:37 <joehuang> For solution, only if there are lots of sites (exceeding the cluster capability)
08:42:52 <joehuang> For solution 2, only if there are lots of sites (exceeding the cluster capability)
08:43:33 <joehuang> For PKI tokens, there are some constraints as the number of sites increases
08:43:44 <hafe> if you have 2 and it works, why would you need 1?
08:44:23 <joehuang> the token size will become too large, so you have to limit a project to a limited set of regions, or use a scoped token each time
08:44:37 <hafe> add to pad
08:45:04 <hafe> in solution 2 you restrict syncing to the keystone database only
08:45:19 <hafe> not much data (without UUID tokens)
08:45:40 <joehuang> Solution 2 has not been tested, and we also haven't received confirmation from Galera on whether it's feasible
08:46:06 <hafe> with Galera, in my understanding, you have to deploy the keystone database in its own database server instance
08:46:08 <xiaolong> I would prefer solution 3 or 6
08:46:33 <hafe> you cannot share a database server with other services, because Galera replicates everything
08:47:03 <hafe> incl. e.g. Nova, which should not be replicated to other sites
08:48:00 <joehuang> In a multisite scenario, I think Keystone should be deployed with a separate database server
08:48:07 <hafe> meaning if you want to use Galera replication for the keystone database, you have to change most existing deployers
08:48:55 <hafe> joehuang: yes, required when using Galera, not with async repl
08:49:45 <hafe> with async repl you can select which databases should be synced
08:50:52 <hafe> xiaolong: federation only seems to handle authentication; authorization is the problem
08:51:10 <joehuang> to xiaolong: the PKI token size will be oversized with the number of regions and service endpoints; I'll try to find numbers for that --
it varies according to the length of your service access links
08:52:21 <joehuang> to hafe: how do you make the name red?
08:53:18 <hafe> sorry, no clue
08:53:21 <joehuang> but for async repl, revocation will be a risk for the duration of the async replication
08:54:15 <joehuang> sorry, I made a wrong statement
08:54:35 <hafe> I have prototyped on 2 as promised
08:54:43 <hafe> using docker
08:54:51 <joehuang> what's the conclusion?
08:55:02 <hafe> well
08:55:28 <hafe> first I was just using master-slave repl
08:55:33 <joehuang> you mean replication from one cluster to another cluster
08:55:44 <hafe> I did not have clusters
08:55:56 <hafe> each "region" had a single db server
08:56:27 <hafe> each slave region replicated the keystone db and did local token validation
08:56:47 <joehuang> what's the replication latency?
08:56:49 <hafe> basically our high-level req was OK
08:57:00 <hafe> since I used Fernet
08:57:15 <hafe> data is only replicated at startup, basically
08:57:17 <joehuang> did you try revoke?
08:57:20 <hafe> no
08:57:39 <joehuang> how about adding a new user?
08:57:55 <hafe> in LDAP
08:58:05 <joehuang> ahaa
08:58:28 <joehuang> The slave is also working online?
08:58:34 <hafe> yes
08:58:40 <joehuang> in parallel with the master
08:58:43 <hafe> since it is supposed to be read only
08:58:51 <joehuang> yes
08:58:55 <hafe> scale out for read only
08:59:19 <joehuang> Xiaolong, how about hafe's test?
08:59:28 <joehuang> It's good
08:59:47 <joehuang> but we can have the master as a cluster to get higher availability
08:59:53 <hafe> then I tried the same thing with a galera cluster as "master region"; it worked fine syncing to a single slave
09:00:08 <joehuang> how about multi-slave?
09:00:12 <hafe> but when the "slave" is a galera cluster itself
09:00:19 <joehuang> this is great if it works
09:00:29 <hafe> I haven't got it to work yet
09:00:44 <hafe> it should work according to blogs I have read
09:00:57 <joehuang> Let's continue next time.
Hope for a very good result next week
09:01:39 <joehuang> Time flies; let's keep the discussion on the mailing list
09:02:00 <hafe> sure, I can update with the prototype status
09:02:19 <joehuang> and I also posted some data about monitoring; hope you can give some feedback
09:02:36 <joehuang> Thanks a lot, all
09:02:52 <joehuang> and especially for the prototype from hafe
09:03:06 <zhipeng> I like hafe's idea :)
09:03:06 <hafe> :-)
09:03:24 <joehuang> something great :)
09:03:28 <zhipeng> okay folks, let's keep the discussion on the mailing list
09:03:36 <zhipeng> #endmeeting
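joehuang's concern earlier in the meeting (08:44:23, 08:51:10) -- that a PKI token grows with the number of regions and endpoints -- can be made concrete with a rough back-of-envelope sketch. All numbers here are illustrative assumptions, not measurements from any deployment:

```python
# Back-of-envelope for PKI token growth: a PKI token embeds the service
# catalog, so its size scales with regions x services x endpoints.
# Assumed figures (hypothetical): 3 endpoints per service (public/internal/
# admin), 10 services per region, ~100 bytes per endpoint entry.
ENDPOINTS_PER_SERVICE = 3
SERVICES_PER_REGION = 10
BYTES_PER_ENDPOINT = 100

def catalog_bytes(regions):
    """Approximate size of the catalog embedded in a PKI token."""
    return regions * SERVICES_PER_REGION * ENDPOINTS_PER_SERVICE * BYTES_PER_ENDPOINT

# Many HTTP servers default to an 8 KB header limit, and the token is
# carried in a header, so even a handful of regions gets uncomfortable.
print(catalog_bytes(3))   # 9000 bytes -- already past a 8192-byte header limit
```

Under these assumptions, three regions already push past a default 8 KB header limit, which is why the chat suggests either limiting a project to a few regions or always using region-scoped tokens.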