#acumos-meeting log

14:05:55 <farheen_att> #startmeeting Architecture Committee
14:05:55 <collabot`> Meeting started Wed Sep 25 14:05:55 2019 UTC.  The chair is farheen_att. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05:55 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:05:55 <collabot`> The meeting name has been set to 'architecture_committee'
14:07:23 <farheen_att> #topic Agenda
14:12:30 <farheen_att> #info ML Workbench, DNS compliance issue, Kafka pivot
14:13:05 <farheen_att> #topic microservice generation impact and the overall design
14:13:25 <farheen_att> #info Guy sharing his bullet points
14:14:53 <farheen_att> #info We want to use Jenkins to build our core system because of the docker-in-docker issue.  Running a docker host inside a docker container leads to problems.
14:15:22 <farheen_att> #info We also want to merge Acumos with the ATT internal org, which uses Jenkins.
14:16:01 <farheen_att> #link https://wiki.acumos.org/pages/viewpage.action?pageId=26640755
14:17:48 <farheen_att> #info This is planned for the Demeter release, not Clio.  There will be code refactoring of ms generation.
14:17:58 <farheen_att> #info which other components will be impacted?
14:18:30 <farheen_att> #info onboarding and portal have to be modified to call the API in jenkins but everything else should be the same.
14:19:10 <farheen_att> #info Tausif - The affected module needs to call the Jenkins API.  At what point in time does it need to be called?
14:20:17 <farheen_att> #info Guy - Right now in both onboarding and portal there are times when you call the ms generation api.  Instead of calling the ms-generation api you call jenkins api.  Minor modifications.  You have to know where the jenkins api is and the ms-generation api will be turned off.
14:20:53 <farheen_att> #info Outbound calls will continue to be handled by the runnable jars and will look the same.
14:21:23 <farheen_att> #info solution id, revision id, and the user id will be the parameters to deploy or scan a model.
14:21:35 <farheen_att> #info Tausif - Will this be at the time of onboarding a model?
14:22:12 <farheen_att> #info Guy - Yes, if you create an ms.  The portal calls the ms-generation api, which will be changed to jenkins.
14:22:36 <farheen_att> #info Tausif - Generation of ms is coupled with onboarding.  We are taking it away from onboarding and using jenkins?
14:23:19 <farheen_att> #info Guy - It's its own module and can be invoked from on-boarding.  You don't have to involve the portal at all.  If you do use the portal to do web onboarding then you will have to call jenkins.
14:23:41 <farheen_att> #info Tausif - How will we get the information back?
14:23:57 <farheen_att> #info Exactly the same as today.  There is no impact on that part.
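A minimal sketch of what the replacement call discussed above might look like, assuming the onboarding or portal code triggers a parameterized Jenkins job through Jenkins' `buildWithParameters` remote endpoint. The job name, the parameter names (`solutionId`, `revisionId`, `userId`) and the base URL are hypothetical; the log only states that solution id, revision id, and user id are the parameters. The sketch only builds the URL that a client would POST to and does not contact a server.

```python
from urllib.parse import quote, urlencode


def build_trigger_url(jenkins_base: str, job_name: str,
                      solution_id: str, revision_id: str, user_id: str) -> str:
    """Build the URL a client would POST to in order to start a parameterized job.

    The job name and parameter names are assumptions made for illustration.
    """
    # The three values named in the meeting become build parameters.
    params = urlencode({
        "solutionId": solution_id,
        "revisionId": revision_id,
        "userId": user_id,
    })
    # /job/<name>/buildWithParameters is Jenkins' remote trigger path
    # for jobs that declare parameters.
    return f"{jenkins_base.rstrip('/')}/job/{quote(job_name)}/buildWithParameters?{params}"


url = build_trigger_url("https://jenkins.example.org/", "ms-generation", "s1", "r2", "u3")
print(url)
```

A real caller would also need credentials or an API token, plus the retry and error handling raised later in this topic.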
14:24:13 <farheen_att> #info Will we get a performance gain?
14:24:43 <farheen_att> #info No, the speed of ms generation is bounded by the docker host for fetching
14:24:50 <farheen_att> #info why are we doing this?
14:25:20 <farheen_att> #info docker and docker requires root level access and it doesn't work in Azure.
14:25:44 <farheen_att> #info docker in docker not docker and docker
14:26:44 <farheen_att> #info scaling is an additional benefit.
14:27:14 <farheen_att> #info Is there failure handling?  Jenkins stops working after some time for no reason.  We need to re-trigger the job.
14:28:12 <farheen_att> #info Guy - Logging for the build process is available.  We have not addressed it but something to consider.  We create the logs and see what goes wrong.
14:28:40 <farheen_att> #info proper error handling should be put forward with the proper flow.  At the time of user stories we can add the right flows.
14:30:20 <farheen_att> #info Priya - Can the docker image size be optimized by jenkins?
14:31:05 <farheen_att> #info we can brainstorm and see at the time of build.  What is the target environment and can it be built in that environment?
14:33:20 <farheen_att> #info it's not the size of the kernel that's killing us.  The model does have.
14:34:40 <farheen_att> #info Sayee - Concerned about docker size
14:35:00 <farheen_att> #info Building a faster, smaller docker image is an orthogonal issue.
14:35:42 <farheen_att> #info irrespective of docker or not.  Inside the ms generation should we optimize or not?  We can work it in parallel.
14:36:32 <farheen_att> #topic ML Workbench modeler user experience
14:38:11 <farheen_att> #info Guy - First thing is if we can't keep the acumos internal views in sync with jupyter notebooks then it won't add value to integrate with acumos.  Changes to a notebook will be lost.
14:39:04 <farheen_att> #info If I noticed these notebooks were not in sync I would ignore the models.  So I would not find it valuable to have Acumos keep track of these.
14:39:40 <farheen_att> #info I would ignore the models in Acumos.
14:40:07 <farheen_att> #info Bryan - Use git as a back end and synchronize.
14:40:56 <farheen_att> #info Sayee agrees that there should be tighter integration.
14:41:45 <farheen_att> #info There is no reason that I would use the Notebook.  Modeler credentials being pre-loaded would be helpful.  I want to just do a push.  It knows who I am.
14:42:25 <farheen_att> #info I would also like to see more client libraries.  CMLP has shell access which is good.  You can't do that with the ML libs in jupyter.
14:42:46 <farheen_att> #info Bryan - URIs and credentials are easily available.
14:42:59 <farheen_att> #info Guy - This is an easy fix.
14:43:13 <farheen_att> #info Bryan - It is fixed.  It is easy to do.
14:43:47 <farheen_att> #info Sayee - As we evolve, the advantage is providing GPUs as the platform so the GPU can be attached on an as-needed basis.
14:44:24 <farheen_att> #info This is the system that I tested IST.  Sharing of the code is there.
14:44:45 <farheen_att> #info Guy - We need persistence of user data so that when notebooks crash you don't lose everything.
14:45:13 <farheen_att> #info Guy - Bryan uses persistent volumes and git; these are good solutions.
14:45:57 <farheen_att> #info Guy - Resource concern.  We are pulling and running all the containers.
14:46:31 <farheen_att> #info Sharing is great when you look at RCloud to share code.  It would be nice to have a community of coders.
14:47:21 <farheen_att> #info Bryan - It's easy to add packages inside the notebook using the Python command.
14:47:30 <farheen_att> #info A shell window would be nice.
14:47:38 <farheen_att> #info Jupyter does the same thing.
14:48:07 <farheen_att> #info Bryan - If we want to build our own customized jupyter stacked images we can do so.
14:50:01 <farheen_att> #info add this to the etherpad.
14:51:00 <farheen_att> #action Manoop - add this link to the etherpad.
14:51:36 <farheen_att> #link https://wiki.acumos.org/display/AR/Thoughts+on+ML+Workbench+from+a+Modeler%27s+Perspective
14:52:41 <farheen_att> #topic Bring your ideas to face to face meeting.
14:53:36 <farheen_att> #link https://etherpad.acumos.org/p/DemeterPlanningWorkshop
14:54:14 <farheen_att> #topic High level overview of Acumos-2901
14:54:52 <farheen_att> Parag - models have to have a name that is DNS compliant
14:55:54 <farheen_att> #info Parag - When you are trying to deploy the model in a k8s environment it will fail because it doesn't accept certain characters.
14:56:38 <farheen_att> #info It's the name of the container of the pod.  Every pod is a domain name inside the cluster.  The name of the model may not be DNS compliant.
14:58:12 <farheen_att> #info We should not restrict the user to creating a name in a DNS-acceptable format, but change the model name instead.
14:59:03 <farheen_att> #info When the user is onboarding we should check the model metadata such as name.
14:59:31 <farheen_att> #info rather than restricting the user we should automatically generate the name.
15:00:49 <farheen_att> #info We need a friendly name and a DNS compliant name.  A short term solution is to restrict the user until it has been fixed.
15:04:11 <farheen_att> #action Guy lead the effort to accept a DNS generated name.
15:05:19 <farheen_att> #info Guy - there is code in there already to do that.
15:05:54 <farheen_att> #info Priya - have a friendly name and a system name.
15:06:03 <farheen_att> #info everyone agrees
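A minimal sketch of the "friendly name plus system name" idea agreed above, assuming the system name should be usable as a DNS label (lowercase letters, digits and hyphens, at most 63 characters, starting and ending with an alphanumeric character). The helper `to_dns_label` and its hash-based fallback are hypothetical and are not the existing Acumos code Guy refers to.

```python
import hashlib
import re

MAX_LABEL_LEN = 63  # length limit for a single DNS label


def to_dns_label(friendly_name: str) -> str:
    """Derive a DNS-label-style system name from a free-form friendly name."""
    # Lowercase, then collapse each run of disallowed characters into one hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", friendly_name.lower()).strip("-")
    if not label:
        # Nothing usable survived; fall back to a short hash of the original name.
        digest = hashlib.sha1(friendly_name.encode("utf-8")).hexdigest()[:8]
        return f"model-{digest}"
    # Truncate to the limit and drop any hyphen left dangling by the cut.
    return label[:MAX_LABEL_LEN].rstrip("-")


print(to_dns_label("Face Detection (ResNet 50)"))  # face-detection-resnet-50
```

The friendly name stays untouched for display and only the derived label is used for the pod or container. Two different friendly names can map to the same label, so a real implementation would need a uniqueness check or a suffix.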
15:07:08 <farheen_att> #action Parag convert Acumos-2901 from an Issue into a User Story in Demeter in jira.
15:08:42 <farheen_att> #topic ML Workbench status
15:12:06 <farheen_att> #info SV and deployment will not have a final docker by the end of this week.  At the earliest, the end of next week.
15:13:05 <farheen_att> #info there may be a gap in how to deploy.
15:13:18 <farheen_att> #info we will figure it out in the integration cycle.
15:13:38 <farheen_att> #info If development is ready then testing can start.  If not, then an issue should be raised.
15:14:07 <farheen_att> #endmeeting