=======================================
#acumos-meeting: Architecture Committee
=======================================


Meeting started by farheen_att at 14:05:55 UTC. The full logs are
available at
http://ircbot.wl.linuxfoundation.org/meetings/acumos-meeting/2019/acumos-meeting.2019-09-25-14.05.log.html
.


Meeting summary
---------------

* Agenda  (farheen_att, 14:07:23)
* ML Workbench, DNS compliance issue, Kafka pivot  (farheen_att, 14:12:30)
* Microservice generation impact and the overall design  (farheen_att, 14:13:05)
* Guy sharing his bullet points  (farheen_att, 14:13:25)
* We also want to merge Acumos with the ATT internal org; they use Jenkins.  (farheen_att, 14:15:22)
* LINK: https://wiki.acumos.org/pages/viewpage.action?pageId=26640755  (farheen_att, 14:16:01)
* This is planned for the Demeter release, not Clio. There will be code refactoring of ms generation.  (farheen_att, 14:17:48)
* Which other components will be impacted?  (farheen_att, 14:17:58)
* Onboarding and portal have to be modified to call the API in Jenkins, but everything else should be the same.  (farheen_att, 14:18:30)
* Tausif - The affected module needs to call the Jenkins API. At what point in time does it need to be called?  (farheen_att, 14:19:10)
* Guy - Right now in both onboarding and portal there are times when you call the ms-generation API. Instead of calling the ms-generation API you call the Jenkins API. Minor modifications. You have to know where the Jenkins API is, and the ms-generation API will be turned off.  (farheen_att, 14:20:17)
* Outbound calls will continue to be handled by the runnable jars and will look the same.  (farheen_att, 14:20:53)
* Solution id, revision id, and user id will be the parameters to deploy or scan a model.  (farheen_att, 14:21:23)
* Tausif - Will this be at the time of onboarding a model?  (farheen_att, 14:21:35)
* Guy - Yes, if you create an ms. The portal will call the ms-generation API, which will be changed to Jenkins.
  (farheen_att, 14:22:12)
* Tausif - Generation of the ms is coupled with onboarding. Are we taking it away from onboarding and using Jenkins?  (farheen_att, 14:22:36)
* Guy - It's its own module and can be invoked from onboarding. You don't have to involve the portal at all. If you do use the portal to do web onboarding then you will have to call Jenkins.  (farheen_att, 14:23:19)
* Tausif - How will we get the information back?  (farheen_att, 14:23:41)
* Exactly the same as today. There is no impact on that part.  (farheen_att, 14:23:57)
* Will we get a performance gain?  (farheen_att, 14:24:13)
* No, the speed of ms generation is bounded by the Docker host for fetching  (farheen_att, 14:24:43)
* Why are we doing this?  (farheen_att, 14:24:50)
* docker and docker requires root-level access and it doesn't work in Azure.  (farheen_att, 14:25:20)
* docker in docker, not docker and docker  (farheen_att, 14:25:44)
* Scaling is an additional benefit.  (farheen_att, 14:26:44)
* Is there failure handling? Jenkins stops working after some time for no reason. We need to re-trigger the job.  (farheen_att, 14:27:14)
* Guy - Logging for the build process is available. We have not addressed it, but it is something to consider. We create the logs and see what goes wrong.  (farheen_att, 14:28:12)
* Proper error handling should be put forward with the proper flow. At the time of user stories we can add the right flows.  (farheen_att, 14:28:40)
* Priya - Can the Docker image size be optimized by Jenkins?  (farheen_att, 14:30:20)
* We can brainstorm and see at the time of build. What is the target environment and can it be built in that environment?  (farheen_att, 14:31:05)
* It's not the size of the kernel that's killing us. The model does have.  (farheen_att, 14:33:20)
* Sayee - Concerned about Docker image size  (farheen_att, 14:34:40)
* Building a faster, smaller Docker image is an orthogonal issue.  (farheen_att, 14:35:00)
* Irrespective of Docker or not, should we optimize inside the ms generation or not?
  We can work on it in parallel.  (farheen_att, 14:35:42)
* ML Workbench modeler user experience  (farheen_att, 14:36:32)
* Guy - First thing is, if we can't keep the Acumos internal views in sync with Jupyter notebooks then it won't add value to integrate with Acumos. Changes to a notebook will be lost.  (farheen_att, 14:38:11)
* If I noticed these notebooks were not in sync I would ignore the models. So I would not find it valuable to have Acumos keep track of these.  (farheen_att, 14:39:04)
* I would ignore the models in Acumos.  (farheen_att, 14:39:40)
* Bryan - Use git as a back end and synchronize.  (farheen_att, 14:40:07)
* Sayee agrees that there should be tighter integration.  (farheen_att, 14:40:56)
* There is no reason that I would use the Notebook. Modeler credentials being pre-loaded would be helpful. I want to just do a push. It knows who I am.  (farheen_att, 14:41:45)
* I would also like to see more client libraries. CMLP has shell access, which is good. You can't do that with the ML libs in Jupyter.  (farheen_att, 14:42:25)
* Bryan - URIs and credentials are easily available.  (farheen_att, 14:42:46)
* Guy - This is an easy fix.  (farheen_att, 14:42:59)
* Bryan - It is fixed. It is easy to do.  (farheen_att, 14:43:13)
* Sayee - As we evolve, the advantage is providing GPUs as the platform, so a GPU can be attached on an as-needed basis.  (farheen_att, 14:43:47)
* This is the system that I tested IST. Sharing of the code is there.  (farheen_att, 14:44:24)
* Guy - We need persistence of user so that when notebooks crash you don't lose everything.  (farheen_att, 14:44:45)
* Guy - Bryan uses persistent volumes and git; these are good solutions.  (farheen_att, 14:45:13)
* Guy - Resource concern. We are pulling and running all the containers.  (farheen_att, 14:45:57)
* Sharing is great when you look at RCloud to share code. It would be nice to have a community of coders.  (farheen_att, 14:46:31)
* Bryan - It's easy to add packages inside the notebook using the Python command.
  (farheen_att, 14:47:21)
* A shell window would be nice.  (farheen_att, 14:47:30)
* Jupyter does the same thing.  (farheen_att, 14:47:38)
* Bryan - If we want to build our own customized Jupyter stack images we can do so.  (farheen_att, 14:48:07)
* Add this to the etherpad.  (farheen_att, 14:50:01)
* ACTION: Manoop - add this link to the etherpad.  (farheen_att, 14:51:00)
* LINK: https://wiki.acumos.org/display/AR/Thoughts+on+ML+Workbench+from+a+Modeler%27s+Perspective  (farheen_att, 14:51:36)
* Bring your ideas to the face-to-face meeting.  (farheen_att, 14:52:41)
* LINK: https://etherpad.acumos.org/p/DemeterPlanningWorkshop  (farheen_att, 14:53:36)
* High level overview of Acumos-2901  (farheen_att, 14:54:14)
* Parag - When you try to deploy the model in a k8s environment it will fail because it doesn't accept certain characters.  (farheen_att, 14:55:54)
* It's the name of the container of the pod. Every pod is a domain name inside the cluster. The name of the model has to be DNS compliant.  (farheen_att, 14:56:38)
* We should not restrict the user to creating a name in DNS-acceptable format, but change the model name instead.  (farheen_att, 14:58:12)
* When the user is onboarding we should check the model metadata, such as the name.  (farheen_att, 14:59:03)
* Rather than restricting the user we should automatically generate the name.  (farheen_att, 14:59:31)
* We need a friendly name and a DNS-compliant name. A short-term solution is to restrict the user until it has been fixed.  (farheen_att, 15:00:49)
* ACTION: Guy lead the effort to accept a DNS generated name.  (farheen_att, 15:04:11)
* Guy - There is code in there already to do that.  (farheen_att, 15:05:19)
* Priya - Have a friendly name and a system name.  (farheen_att, 15:05:54)
* Everyone agrees  (farheen_att, 15:06:03)
* ACTION: Parag convert Acumos-2901 from an Issue into a User Story in Demeter in Jira.
  (farheen_att, 15:07:08)
* ML Workbench status  (farheen_att, 15:08:42)
* SV and deployment will not have a final Docker image by the end of this week. At a minimum, the end of next week.  (farheen_att, 15:12:06)
* There may be a gap in how to deploy.  (farheen_att, 15:13:05)
* We will figure it out in the integration cycle.  (farheen_att, 15:13:18)
* If development is ready then testing can start. If not, then an issue should be raised.  (farheen_att, 15:13:38)


Meeting ended at 15:14:07 UTC.


People present (lines said)
---------------------------

* farheen_att (81)
* collabot` (3)


Generated by `MeetBot`_ 0.1.4
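
Note on Acumos-2901: the "friendly name plus system name" idea agreed in the meeting could look roughly like the sketch below. It is not the existing Acumos code that Guy mentioned, which is not shown in these minutes. The function name `to_dns_label` and the hash-based fallback are hypothetical. The rules applied are those of an RFC 1123 DNS label (lowercase letters, digits and hyphens, alphanumeric at both ends, at most 63 characters), which is an assumption about what the target cluster requires.

```python
import hashlib
import re

MAX_LABEL_LEN = 63  # length limit of an RFC 1123 DNS label


def to_dns_label(friendly_name: str, max_len: int = MAX_LABEL_LEN) -> str:
    """Derive a DNS-label-style system name from a free-form model name.

    Hypothetical helper: the user keeps the friendly name, and this
    derived name is what would be used for the container or pod.
    """
    label = friendly_name.lower()
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", label)
    # A label must start and end with an alphanumeric character.
    label = label.strip("-")
    if not label:
        # Nothing usable survived; fall back to a name built from a hash
        # of the original (an assumed policy, not one from the meeting).
        digest = hashlib.sha1(friendly_name.encode("utf-8")).hexdigest()[:8]
        return f"model-{digest}"
    # Truncate, then strip again in case the cut landed on a hyphen.
    return label[:max_len].rstrip("-")


if __name__ == "__main__":
    for name in ["My Model_v1.0!", "Face Detection (GPU)", "???"]:
        print(repr(name), "->", to_dns_label(name))
```

Two different friendly names can map to the same system name under this scheme, so a real implementation would also need a uniqueness step, for example appending part of the revision id.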