08:00:51 <zhipeng> #startmeeting Multisite Weekly Meeting 2015.07.09
08:00:51 <collabot> Meeting started Thu Jul  9 08:00:51 2015 UTC.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:51 <collabot> Useful Commands: #action #agreed #help #info #idea #link #topic.
08:00:51 <collabot> The meeting name has been set to 'multisite_weekly_meeting_2015_07_09'
08:01:02 <zhipeng> #topic Roll Call
08:01:08 <zhipeng> hi guys
08:01:16 <Malla> Hi all, IRC only meeting right..?
08:01:19 <zhipeng> card punching time :P
08:01:21 <zhipeng> yep
08:01:48 <Malla> thanks Zhipeng
08:01:58 <joehuang> hi
08:02:00 <sorantis> hey
08:02:03 <zhipeng> no problem Malla
08:02:07 <zhipeng> #info zhipeng
08:02:13 <joehuang> #info joehuang
08:02:18 <sorantis> #info dimitri
08:02:22 <hafe> #info Hans Feldt
08:03:05 <zhipeng> #info Malla
08:03:08 <zhipeng> :P
08:03:16 <joehuang> #topic  multisite identity service management
08:03:35 <zhipeng> #topic  multisite identity service management
08:03:42 <zhipeng> sorry joe
08:03:43 <joehuang> #link https://etherpad.opnfv.org/p/multisite_identity_management
08:03:53 <joehuang> same idea :)
08:03:57 <hafe> I just updated it
08:04:04 <zhipeng> nice hafe
08:04:22 <joehuang> Thanks, take few minutes to read your update
08:05:11 <hafe> I am making some statements there, would be good to make sure that they are not wrong and we're aligned
08:05:56 <joehuang> "The token contains a list of endpoints" only PKI token contains the endpoint-list
08:06:10 <hafe> is that so?
08:06:16 <joehuang> UUID / Fernet Token don't
08:06:19 <hafe> ok
08:06:50 <hafe> no because they have a fixed size?
08:07:04 <joehuang> UUID is random 32 bytes data, fixed size
08:07:35 <hafe> so depending on token type, the next step could be to request the service catalog
08:08:46 <joehuang> No service catalog will be requested for UUID/Fernet; instead the UUID/Fernet token is sent to KeyStone for validation
08:09:56 <hafe> with token type UUID/Fernet, the user requests the service catalog
08:11:06 <joehuang> The user will request it, so as to select a region or not
08:11:39 <joehuang> OK, hafe, I got what you mean
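The UUID-versus-PKI point above can be sketched in a few lines (illustrative Python only, not Keystone code; the token store and function names are made up):

```python
# Sketch: a UUID token is 32 random hex characters with no payload, so a
# service must go back to Keystone (here: a shared store) to validate it.
# A PKI token, by contrast, can embed data such as the endpoint catalog.
import uuid

token_store = {}  # stands in for Keystone's server-side token state

def issue_uuid_token(user, catalog):
    token = uuid.uuid4().hex            # 32 hex chars, fixed size, opaque
    token_store[token] = {"user": user, "catalog": catalog}
    return token

def validate_uuid_token(token):
    # Nothing is inside the token itself -- online lookup is required.
    return token_store.get(token)

tok = issue_uuid_token("alice", ["regionA", "regionB"])
assert len(tok) == 32
assert validate_uuid_token(tok)["user"] == "alice"
```

This is why UUID stays fixed-size while a PKI token grows with the deployment, as discussed below.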
08:12:14 <joehuang> I found a new issue for PKI token
08:12:51 <joehuang> If the certs used in the token validation is revoked, then you have to replicate new certs to all sites
08:14:24 <hafe> yeah PKI/Fernet brings in boring security issues
08:14:50 <joehuang> to hafe: could you explain "PKI are needed with a centralized Keystone to avoid inter region traffic."
08:14:50 <hafe> should add that
08:15:24 <hafe> offline validation for PKI tokens
08:15:36 <hafe> but revocation traffic instead
08:17:04 <joehuang> The KeyStone service can be distributed in a few sites with DB replication, to achieve high availability for PKI tokens
08:18:08 <joehuang> then the revocation events in the DB will also be replicated to these sites, so the revocation-list retrieval can be done in more than one site
08:18:08 <hafe> you mean like asymmetrical replication to a few sites, not all?
08:18:37 <joehuang> It looks like centralized
08:19:20 <xiaolong> cert revocation is not so frequent, isn't that?
08:19:28 <hafe> I don't know
08:19:30 <joehuang> can be sync and symmetrical
08:19:43 <hafe> and not even sure it works for PKI from some video I saw
08:20:27 <joehuang> yes, xiaolong, cert revocation is not so frequent, and some say it often doesn't work
08:22:48 <xiaolong> have you taken a look at the CERN use case (keystone federation)? https://blueprints.launchpad.net/keystone/+spec/keystone-to-keystone-federation
08:22:50 <hafe> sorry I mean token revocation lists
08:23:00 <hafe> tokens can be revoked in the API
08:23:11 <xiaolong> how about the maturity of their proposal?
08:24:13 <hafe> my understanding is that federation is only for authentication
08:24:46 <joehuang> almost all new features in Kilo (the latest version) of KeyStone are about keystone federation : )
08:25:48 <joehuang> federation is mainly for two cloud provider to borrow/rent resources
08:26:11 <hafe> and we are not in that business
08:26:12 <zhipeng> but keystone federation is about authentication right?
08:26:24 <joehuang> so a lot of  role/user/domain/group mapping and configuration has to be done
08:27:04 <hafe> what token type does CERN use?
08:27:19 <hafe> joehuang: yes that is my understanding
08:27:21 <joehuang> federated authentication: if you auth-ed in one cloud, you can access another partner cloud
08:28:10 <xiaolong> yes, there are some differences between the two use cases, but maybe the technical solutions can be an inspiration
08:28:14 <hafe> do you find anything wrong or something you don't agree with?
08:28:29 <xiaolong> such as the format of token
08:29:33 <joehuang> I don't know what type of token CERN is using
08:30:25 <joehuang> I'll try to find out what type token CERN is using
08:30:32 <hafe> maybe they use tokens
08:30:38 <hafe> I found a blog
08:30:58 <hafe> kerberos
08:31:39 <hafe> http://openstack-in-production.blogspot.se/2014/10/kerberos-and-single-sign-on-with.html
08:31:39 <joehuang> To hafe, "Multi master synchronous: Galera (others?), not very scalable" it's scalable
08:32:01 <hafe> to some extent
08:32:34 <hafe> joehuang: how would you phrase it?
08:33:04 <hafe> galera uses virtual synchrony as a protocol
08:33:09 <joehuang> #link http://indico.cern.ch/event/283833/contribution/0/attachments/523615/722236/summitHK2013_All_v2.pdf
08:33:19 <joehuang> CERN is using PKI token
08:33:48 <hafe> from what I know such protocol requires pretty good deterministic inter node links
08:34:24 <hafe> and configured timeouts etc
08:34:24 <joehuang> The guy from Galera confirmed that they have seen Galera used in practice with 15 nodes, distributed across 5 data centers
08:34:40 <xiaolong> another point, personally, I am not fan of database replication across multi-site
08:35:15 <joehuang> Then, the better choice would be PKI
08:36:28 <hafe> on the big picture it is not clear where the arch border line of opnfv is
08:36:57 <joehuang> yes, agree
08:37:11 <joehuang> OPNFV need a big picture
08:37:15 <hafe> should opnfv deliver the multisite requirement out of the box?
08:37:42 <hafe> multisite identity requirement
08:37:43 <zhipeng> that would be the ideal goal i think
08:37:49 <joehuang> But we have to confront such kind of multisite issue if we put OpenStack into production
08:39:22 <hafe> the high level goal I suggested: "a user should, using a single authentication point be able to manage virtual resources spread over  multiple OpenStack regions"
08:39:31 <hafe> do we agree on that?
08:40:01 <xiaolong> I agree with this expression
08:40:13 <zhipeng> agree
08:40:18 <joehuang> agree
08:40:29 <zhipeng> #agreed  "a user should, using a single authentication point be able to manage virtual resources spread over  multiple OpenStack regions"
08:40:29 <joehuang> changed in the etherpad
08:40:52 <zhipeng> so should we settle on solution 2, for this use case then ?
08:41:48 <hafe> you mean the async repl idea?
08:41:56 <joehuang> which solution is better depends on the number of regions
08:42:17 <hafe> and that becomes a pain for opnfv to magically support
08:42:52 <joehuang> For solution 2, only if there are lots of sites (exceeding the cluster capability)
08:43:33 <joehuang> For PKI tokens, there are some constraints as the number of sites increases
08:43:44 <hafe> if you have 2 and it works, why would you need 1?
08:44:23 <joehuang> the token size will become too large, so you have to limit a project to a limited set of regions, or use a scoped token each time
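The token-size concern can be illustrated with a rough sketch (hypothetical catalog layout and endpoint URLs; the absolute numbers are made up, only the growth trend matters):

```python
# Sketch: a PKI-style token embeds the service catalog, so its payload
# grows with the number of regions and endpoints (and with URL length).
import json

def catalog_payload(num_regions, endpoints_per_region=8):
    # Build a fake catalog and measure its serialized size in bytes.
    catalog = [
        {"region": f"region-{r}",
         "endpoints": [f"https://svc{e}.region-{r}.example.com:5000/v3"
                       for e in range(endpoints_per_region)]}
        for r in range(num_regions)
    ]
    return len(json.dumps(catalog).encode())

small = catalog_payload(2)
large = catalog_payload(50)
assert large > small  # the embedded catalog, hence the token, keeps growing
```

A scoped token shrinks this because only the relevant project/region slice of the catalog is embedded.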
08:44:37 <hafe> add to pad
08:45:04 <hafe> in solution 2 you restrict syncing to keystone database only
08:45:19 <hafe> not much data (without UUID tokens)
08:45:40 <joehuang> Solution two has not been tested, and we also haven't received confirmation from Galera on whether it's feasible
08:46:06 <hafe> with Galera, in my understanding, you have to deploy the keystone database in its own database server instance
08:46:08 <xiaolong> I would prefer the solution 3 or 6
08:46:33 <hafe> you cannot share database server with other services because Galera replicates everything
08:47:03 <hafe> incl e.g. Nova that should not be replicated to other sites
08:48:00 <joehuang> In multisite scenario, I think KeyStone should be deployed with separated database server
08:48:07 <hafe> meaning if you want to use Galera replication for the keystone database, you have to change most existing deployers
08:48:55 <hafe> joehuang: yes, that's required when using Galera, not with async repl
08:49:45 <hafe> with async repl you can select what databases should be synced
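A minimal sketch of what such selective async replication could look like on a MySQL replica (hypothetical my.cnf fragment; the option names are standard MySQL replication filters, the values are made up):

```ini
# Hypothetical replica config: replicate only the keystone schema and
# leave nova/neutron/etc. local. Galera cannot filter like this -- it
# replicates the whole database server, hence the separate-instance point.
[mysqld]
server-id       = 2
replicate-do-db = keystone
read_only       = ON
```

This is the contrast being made: async replication filters per database, Galera replicates everything on the server.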
08:50:52 <hafe> xiaolong: federation only seems to handle authentication, authorization is the problem
08:51:10 <joehuang> to Xiaolong, the PKI token will become oversized as the number of regions and service endpoints grows; I'll try to find the numbers. It varies according to the length of your service access links
08:52:21 <joehuang> to hafe, how do you make the name red?
08:53:18 <hafe> sorry no clue
08:53:21 <joehuang> but for async repl, revocation will be a risk for the duration of the async replication
08:54:15 <joehuang> sorry, I made a wrong statement
08:54:35 <hafe> I have prototyped on 2 as promised
08:54:43 <hafe> using docker
08:54:51 <joehuang> what's the conclusion
08:55:02 <hafe> well
08:55:28 <hafe> first I was just using master slave repl
08:55:33 <joehuang> you mean replication from one cluster to another cluster
08:55:44 <hafe> I did not have clusters
08:55:56 <hafe> each "region" had a single db server
08:56:27 <hafe> each slave region replicated keystone db and did local token validation
08:56:47 <joehuang> what's the replication latency
08:56:49 <hafe> basically our high level req was OK
08:57:00 <hafe> since I used Fernet
08:57:15 <hafe> data is only replicated at startup basically
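The reason Fernet-style tokens only need data replicated at startup can be sketched with a stdlib-only analogue (not the real Fernet format, which uses AES plus HMAC via the `cryptography` library; the key and field names here are made up):

```python
# Sketch: with a shared signing key replicated once (e.g. at startup),
# each region can validate tokens locally -- no per-token state to sync
# and no callback to the master Keystone on every request.
import base64
import hashlib
import hmac
import json

SHARED_KEY = b"replicated-to-every-region-at-startup"  # hypothetical key

def issue(payload: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def validate_locally(token: str):
    # Any node holding SHARED_KEY can check integrity offline.
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or signed with a different key
    return json.loads(base64.urlsafe_b64decode(body))

tok = issue({"user": "alice", "project": "demo"})
assert validate_locally(tok)["user"] == "alice"
```

The trade-off raised earlier still applies: revocations are server-side state, so they do need ongoing replication.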
08:57:17 <joehuang> did you try revoke?
08:57:20 <hafe> no
08:57:39 <joehuang> how about adding a new user
08:57:55 <hafe> in LDAP
08:58:05 <joehuang> ahaa
08:58:28 <joehuang> The slave is also working online ?
08:58:34 <hafe> yes
08:58:40 <joehuang> in parallel with the master
08:58:43 <hafe> since it is supposed to be read only
08:58:51 <joehuang> yes
08:58:55 <hafe> scale out for read only
08:59:19 <joehuang> Xiaolong, how about hafe's test?
08:59:28 <joehuang> It's good
08:59:47 <joehuang> but we can have the master as a cluster to have higher availability
08:59:53 <hafe> then I tried the same thing with a galera cluster as "master region", worked fine syncing to a single slave
09:00:08 <joehuang> how about multi-slave
09:00:12 <hafe> but when the "slave" is a galera cluster itself
09:00:19 <joehuang> this is great if it works
09:00:29 <hafe> I haven't got it to work yet
09:00:44 <hafe> it should work according to blogs I have read
09:00:57 <joehuang> Let's continue next time. Hope a very good result next week
09:01:39 <joehuang> Time flies, let's keep discussion in m-l
09:02:00 <hafe> sure I can update with prototype status
09:02:19 <joehuang> and I also posted some data about monitoring, hope you can give some feedback.
09:02:36 <joehuang> Thanks a lot for all
09:02:52 <joehuang> and especially the prototype from hafe
09:03:06 <zhipeng> I like hafe's idea :)
09:03:06 <hafe> :-)
09:03:24 <joehuang> something great :)
09:03:28 <zhipeng> okey folks let's keep discussion in the mailing list
09:03:36 <zhipeng> #endmeeting