14:05:55 <farheen_att> #startmeeting Architecture Committee
14:05:55 <collabot`> Meeting started Wed Sep 25 14:05:55 2019 UTC. The chair is farheen_att. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05:55 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:05:55 <collabot`> The meeting name has been set to 'architecture_committee'
14:07:23 <farheen_att> #topic Agenda
14:12:30 <farheen_att> #info ML Workbench, DNS compliance issue, Kafka pivot
14:13:05 <farheen_att> #topic microservice generation impact and the overall design
14:13:25 <farheen_att> #info Guy sharing his bullet points
14:14:53 <farheen_att> #info we want to use jenkins to build our core system because of the docker-in-docker issue. If your docker host is itself in a docker container, it leads to problems.
14:15:22 <farheen_att> #info we also want to merge Acumos with the ATT internal org, which uses jenkins.
14:16:01 <farheen_att> #link https://wiki.acumos.org/pages/viewpage.action?pageId=26640755
14:17:48 <farheen_att> #info This is planned for the Demeter release, not Clio. There will be code refactoring of ms generation.
14:17:58 <farheen_att> #info which other components will be impacted?
14:18:30 <farheen_att> #info onboarding and portal have to be modified to call the API in jenkins, but everything else should be the same.
14:19:10 <farheen_att> #info Tausif - The affected module needs to call the jenkins API. At what point in time does it need to be called?
14:20:17 <farheen_att> #info Guy - Right now both onboarding and portal call the ms-generation api at certain points. Instead of calling the ms-generation api, you call the jenkins api. Minor modifications. You have to know where the jenkins api is, and the ms-generation api will be turned off.
14:20:53 <farheen_att> #info outbound calls will continue to be handled by the runnable jars and will look the same.
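As a rough illustration of the change Guy describes (onboarding and portal call a Jenkins API instead of the ms-generation API, passing the solution id, revision id, and user id), the following is a minimal sketch. It assumes Jenkins' standard remote-access endpoint `/job/<name>/buildWithParameters` with HTTP Basic auth using a user name and API token. The job name `acumos-ms-generation` and the parameter names `solutionId`, `revisionId`, and `userId` are hypothetical, since the meeting did not define them.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical job name; the real one depends on how the Jenkins job is defined.
MS_GENERATION_JOB = "acumos-ms-generation"


def build_ms_generation_request(jenkins_url, solution_id, revision_id, user_id,
                                user, api_token, job_name=MS_GENERATION_JOB):
    """Build (without sending) a POST to Jenkins' buildWithParameters endpoint."""
    query = urllib.parse.urlencode({
        "solutionId": solution_id,   # hypothetical parameter name
        "revisionId": revision_id,   # hypothetical parameter name
        "userId": user_id,           # hypothetical parameter name
    })
    job_path = urllib.parse.quote(job_name, safe="")
    url = f"{jenkins_url.rstrip('/')}/job/{job_path}/buildWithParameters?{query}"
    req = urllib.request.Request(url, data=b"", method="POST")
    # Jenkins accepts HTTP Basic auth with a user name and an API token.
    creds = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    req.add_header("Authorization", f"Basic {creds}")
    return req


def trigger_ms_generation(req, timeout=30):
    """Send the request; Jenkins typically replies 201 with the queue item URL in Location."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Location")
```

The sketch keeps request building separate from sending so the caller (onboarding or portal) can log or inspect the call. Polling the queue item and re-triggering a failed job, both raised later in the discussion, are not covered here.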
14:21:23 <farheen_att> #info solution id, revision id, and user id will be the parameters to deploy or scan a model.
14:21:35 <farheen_att> #info Tausif - Will this be at the time of onboarding a model?
14:22:12 <farheen_att> #info Guy - yes, if you create a ms. The portal will call the ms-generation api, which will be changed to jenkins.
14:22:36 <farheen_att> #info Tausif - generation of ms is coupled with onboarding. Are we taking it away from onboarding and using jenkins?
14:23:19 <farheen_att> #info Guy - It's its own module and can be invoked from onboarding. You don't have to involve the portal at all. If you do use the portal for web onboarding, then you will have to call jenkins.
14:23:41 <farheen_att> #info Tausif - how will we get the information back?
14:23:57 <farheen_att> #info Exactly the same as today. There is no impact on that part.
14:24:13 <farheen_att> #info Will we get a performance gain?
14:24:43 <farheen_att> #info No, the speed of ms generation is bounded by the docker host for fetching.
14:24:50 <farheen_att> #info why are we doing this?
14:25:20 <farheen_att> #info docker and docker requires root-level access, and it doesn't work in Azure.
14:25:44 <farheen_att> #info docker in docker, not docker and docker
14:26:44 <farheen_att> #info scaling is an additional benefit.
14:27:14 <farheen_att> #info Is there failure handling? Jenkins stops working after some time for no reason, and we need to re-trigger the job.
14:28:12 <farheen_att> #info Guy - Logging for the build process is available. We have not addressed it, but it is something to consider. We create the logs and see what goes wrong.
14:28:40 <farheen_att> #info proper error handling should be put forward with the proper flow. When we write the user stories, we can add the right flows.
14:30:20 <farheen_att> #info Priya - can the docker image size be optimized by jenkins?
14:31:05 <farheen_att> #info we can brainstorm and see at the time of build. What is the target environment, and can it be built in that environment?
14:33:20 <farheen_att> #info it's not the size of the kernel that's killing us. The model does have.
14:34:40 <farheen_att> #info Sayee - Concerned about docker size
14:35:00 <farheen_att> #info Building a faster, smaller docker image is an orthogonal issue.
14:35:42 <farheen_att> #info irrespective of docker or not, should we optimize inside the ms generation? We can work on it in parallel.
14:36:32 <farheen_att> #topic ML Workbench modeler user experience
14:38:11 <farheen_att> #info Guy - first, if we can't keep the Acumos internal views in sync with jupyter notebooks, then integrating with Acumos won't add value. Changes to a notebook will be lost.
14:39:04 <farheen_att> #info If I noticed these notebooks were not in sync, I would ignore the models. So I would not find it valuable to have Acumos keep track of them.
14:39:40 <farheen_att> #info I would ignore the models in Acumos.
14:40:07 <farheen_att> #info Bryan - Use git as a back end and synchronize.
14:40:56 <farheen_att> #info Sayee agrees that there should be tighter integration.
14:41:45 <farheen_att> #info There is no reason that I would use the Notebook. Having modeler credentials pre-loaded would be helpful. I want to just do a push. It knows who I am.
14:42:25 <farheen_att> #info I would also like to see more client libraries. CMLP has shell access, which is good. You can't do that with the ML libs in jupyter.
14:42:46 <farheen_att> #info Bryan - URIs and credentials are easily available.
14:42:59 <farheen_att> #info Guy - this is an easy fix.
14:43:13 <farheen_att> #info Bryan - It is fixed. It is easy to do.
14:43:47 <farheen_att> #info Sayee - As we evolve, the advantage is providing GPUs as part of the platform, so a GPU can be attached on an as-needed basis.
14:44:24 <farheen_att> #info This is the system that I tested IST. Sharing of the code is there.
14:44:45 <farheen_att> #info Guy - we need user persistence so that when notebooks crash you don't lose everything.
14:45:13 <farheen_att> #info Guy - Bryan uses persistent volumes and git; these are good solutions.
14:45:57 <farheen_att> #info Guy - resource concern: we are pulling and running all the containers.
14:46:31 <farheen_att> #info Sharing is great, as when you look at RCloud to share code. It would be nice to have a community of coders.
14:47:21 <farheen_att> #info Bryan - It's easy to add packages inside the notebook using the Python command.
14:47:30 <farheen_att> #info A shell window would be nice.
14:47:38 <farheen_att> #info Jupyter does the same thing.
14:48:07 <farheen_att> #info Bryan - If we want to build our own customized jupyter stack images, we can do so.
14:50:01 <farheen_att> #info add this to the etherpad.
14:51:00 <farheen_att> #action Manoop - add this link to the etherpad.
14:51:36 <farheen_att> #link https://wiki.acumos.org/display/AR/Thoughts+on+ML+Workbench+from+a+Modeler%27s+Perspective
14:52:41 <farheen_att> #topic Bring your ideas to the face-to-face meeting.
14:53:36 <farheen_att> #link https://etherpad.acumos.org/p/DemeterPlanningWorkshop
14:54:14 <farheen_att> #topic High level overview of Acumos-2901
14:54:52 <farheen_att> Parag - models have to have a name that is DNS compliant
14:55:54 <farheen_att> #info Parag - when you try to deploy the model in a k8s environment, it fails because certain characters in the name are not accepted.
14:56:38 <farheen_att> #info it's the name of the container of the pod. Every pod is a domain name inside the cluster, and the model name may not be DNS compliant.
14:58:12 <farheen_att> #info we should not restrict the user to creating a name in DNS-acceptable format; instead we should change the model name.
14:59:03 <farheen_att> #info when the user is onboarding, we should check the model metadata, such as the name.
14:59:31 <farheen_att> #info rather than restricting the user, we should automatically generate the name.
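One way to automatically generate a DNS-compliant system name from the user's friendly name, as proposed for Acumos-2901, is sketched next. The meeting did not settle on a conversion rule. This sketch assumes the Kubernetes RFC 1123 label rules: lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters. The helper names `to_dns_label` and `is_dns_label` and the fallback name `model` are hypothetical.

```python
import re

DNS_LABEL_MAX = 63  # Kubernetes limit for RFC 1123 labels
_DNS_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def to_dns_label(friendly_name, fallback="model"):
    """Derive a DNS-compliant system name from a friendly model name (hypothetical helper)."""
    name = friendly_name.lower()
    name = re.sub(r"[^a-z0-9-]+", "-", name)  # replace runs of disallowed characters with '-'
    name = re.sub(r"-{2,}", "-", name)        # collapse repeated hyphens
    name = name.strip("-")                    # must start and end with an alphanumeric
    name = name[:DNS_LABEL_MAX].rstrip("-")   # enforce the length limit, then re-trim
    return name or fallback                   # nothing usable left: use a placeholder


def is_dns_label(name):
    """Check a name against the RFC 1123 label rules assumed here."""
    return len(name) <= DNS_LABEL_MAX and _DNS_LABEL_RE.fullmatch(name) is not None
```

For example, `to_dns_label("My Cool_Model v2.1")` gives `my-cool-model-v2-1`, while the friendly name is stored unchanged for display. Two different friendly names can map to the same system name, so a real implementation would still need a uniqueness check (for example, against existing solutions).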
15:00:49 <farheen_att> #info we need a friendly name and a DNS-compliant name. A short-term solution is to restrict the user until it has been fixed.
15:04:11 <farheen_att> #action Guy lead the effort to accept a DNS generated name.
15:05:19 <farheen_att> #info Guy - there is code in there already to do that.
15:05:54 <farheen_att> #info Priya - have a friendly name and a system name.
15:06:03 <farheen_att> #info everyone agrees
15:07:08 <farheen_att> #action Parag convert Acumos-2901 from an Issue into a User Story in Demeter in jira.
15:08:42 <farheen_att> #topic ML Workbench status
15:12:06 <farheen_att> #info SV and deployment will not have a final docker by the end of this week; the end of next week at the earliest.
15:13:05 <farheen_att> #info there may be a gap in how to deploy.
15:13:18 <farheen_att> #info we will figure it out in the integration cycle.
15:13:38 <farheen_att> #info if development is ready, then testing can start. If not, an issue should be raised.
15:14:07 <farheen_att> #endmeeting