#acumos-meeting: Architecture Committee

Meeting started by farheen_att at 14:05:55 UTC (full logs).

Meeting summary

  1. Agenda (farheen_att, 14:07:23)
    1. ML Workbench, DNS compliance issue, Kafka pivot (farheen_att, 14:12:30)

  2. microservice generation impact and the overall design (farheen_att, 14:13:05)
    1. Guy sharing his bullet points (farheen_att, 14:13:25)
    2. We also want to merge Acumos with the ATT internal org, which uses Jenkins. (farheen_att, 14:15:22)
    3. https://wiki.acumos.org/pages/viewpage.action?pageId=26640755 (farheen_att, 14:16:01)
    4. This is planned for the Demeter release, not Clio. There will be code refactoring in ms generation. (farheen_att, 14:17:48)
    5. which other components will be impacted? (farheen_att, 14:17:58)
    6. Onboarding and portal have to be modified to call the Jenkins API, but everything else should stay the same. (farheen_att, 14:18:30)
    7. Tausif - The affected module needs to call the Jenkins API. At what point in time does it need to be called? (farheen_att, 14:19:10)
    8. Guy - Right now in both onboarding and portal there are times when you call the ms-generation API. Instead of calling the ms-generation API, you call the Jenkins API. Minor modifications: you have to know where the Jenkins API is, and the ms-generation API will be turned off. (farheen_att, 14:20:17)
    9. Outbound calls will continue to be handled by the runnable jars and will look the same. (farheen_att, 14:20:53)
    10. Solution id, revision id, and user id will be the parameters to deploy or scan a model. (farheen_att, 14:21:23)
    11. Tausif - Will this be at the time of onboarding a model? (farheen_att, 14:21:35)
    12. Guy - Yes, if you create a ms. The portal's call to the ms-generation API will be changed to a Jenkins call. (farheen_att, 14:22:12)
    13. Tausif - Generation of the ms is coupled with onboarding. Are we taking it away from onboarding and using Jenkins? (farheen_att, 14:22:36)
    14. Guy - It's its own module and can be invoked from onboarding. You don't have to involve the portal at all. If you do use the portal to do web onboarding then you will have to call Jenkins. (farheen_att, 14:23:19)
    15. Tausif - How will we get the information back? (farheen_att, 14:23:41)
    16. Exactly the same as today. There is no impact on that part. (farheen_att, 14:23:57)
    17. Will we get a performance gain? (farheen_att, 14:24:13)
    18. No, the speed of ms generation is bounded by the docker host for fetching (farheen_att, 14:24:43)
    19. why are we doing this? (farheen_att, 14:24:50)
    20. docker and docker requires root level access and it doesn't work in Azure. (farheen_att, 14:25:20)
    21. Correction to item 20: that should read "docker in docker", not "docker and docker". (farheen_att, 14:25:44)
    22. scaling is an additional benefit. (farheen_att, 14:26:44)
    23. Is there failure handling? Jenkins stops working after some time for no reason. We need to re-trigger the job. (farheen_att, 14:27:14)
    24. Guy - Logging for the build process is available. We have not addressed failure handling, but it is something to consider. We create the logs and can see what goes wrong. (farheen_att, 14:28:12)
    25. Proper error handling should be put forward with the proper flow. At the time of user stories we can add the right flows. (farheen_att, 14:28:40)
    26. Priya - Can the docker image size be optimized by Jenkins? (farheen_att, 14:30:20)
    27. We can brainstorm and see at the time of build. What is the target environment and can it be built in that environment? (farheen_att, 14:31:05)
    28. It's not the size of the kernel that's killing us. The model does have. (farheen_att, 14:33:20)
    29. Sayee - Concerned about docker size (farheen_att, 14:34:40)
    30. Building a faster, smaller docker image is an orthogonal issue. (farheen_att, 14:35:00)
    31. Irrespective of docker or not: should we optimize inside ms generation? We can work on it in parallel. (farheen_att, 14:35:42)
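
The switch described in items 8 to 12 (calling Jenkins with the solution id, revision id, and user id instead of calling the ms-generation API) could look roughly like the sketch below. It assumes the Jenkins job is a parameterized job triggered through Jenkins' `buildWithParameters` remote endpoint. The job name, the parameter names, and the host are hypothetical and were not stated in the meeting; authentication is left out.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_trigger_request(jenkins_url, job_name, solution_id, revision_id, user_id):
    """Build (but do not send) a POST request that triggers a parameterized Jenkins job."""
    # Hypothetical parameter names; a real job defines its own.
    params = urlencode({
        "SOLUTION_ID": solution_id,
        "REVISION_ID": revision_id,
        "USER_ID": user_id,
    })
    # /job/<name>/buildWithParameters is the Jenkins endpoint for parameterized builds.
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job_name, safe='')}/buildWithParameters"
    return Request(url, data=params.encode("utf-8"), method="POST")


if __name__ == "__main__":
    # Hypothetical host and job name, for illustration only.
    request = build_trigger_request(
        "https://jenkins.example.org", "ms-generation", "sol-1", "rev-1", "user-1"
    )
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the onboarding and portal changes small: both callers only need the Jenkins location and the three ids, and the sending code (plus any re-trigger logic raised in item 23) can be added in one place.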

  3. ML Workbench modeler user experience (farheen_att, 14:36:32)
    1. Guy - First thing: if we can't keep the Acumos internal views in sync with Jupyter notebooks, then integrating with Acumos won't add value. Changes to a notebook will be lost. (farheen_att, 14:38:11)
    2. If I noticed these notebooks were not in sync I would ignore the models, so I would not find it valuable to have Acumos keep track of them. (farheen_att, 14:39:04)
    3. I would ignore the models in Acumos. (farheen_att, 14:39:40)
    4. Bryan - Use git as a back end and synchronize. (farheen_att, 14:40:07)
    5. Sayee agrees that there should be tighter integration. (farheen_att, 14:40:56)
    6. There is no reason that I would use the Notebook. Modeler credentials being pre-loaded would be helpful. I want to just do a push. It knows who I am. (farheen_att, 14:41:45)
    7. I would also like to see more client libraries. CMLP has shell access which is good. You can't do that with the ML libs in jupyter. (farheen_att, 14:42:25)
    8. Bryan - URIs and credentials are easily available. (farheen_att, 14:42:46)
    9. Guy - This is an easy fix. (farheen_att, 14:42:59)
    10. Bryan - It is fixed. It is easy to do. (farheen_att, 14:43:13)
    11. Sayee - As we evolve, the advantage is providing GPUs as the platform, so a GPU can be attached on an as-needed basis. (farheen_att, 14:43:47)
    12. This is the system that I tested IST. Sharing of the code is there. (farheen_att, 14:44:24)
    13. Guy - we need persistence of user so that when notebooks crash you don't lose everything. (farheen_att, 14:44:45)
    14. Guy - Bryan uses persistent volumes and git; these are good solutions. (farheen_att, 14:45:13)
    15. Guy - Resource concern: we are pulling and running all the containers. (farheen_att, 14:45:57)
    16. Sharing is great when you look at RCloud to share code. It would be nice to have a community of coders. (farheen_att, 14:46:31)
    17. Bryan - It's easy to add packages inside the notebook using the Python command. (farheen_att, 14:47:21)
    18. A shell window would be nice. (farheen_att, 14:47:30)
    19. Jupyter does the same thing. (farheen_att, 14:47:38)
    20. Bryan - If we want to build our own customized Jupyter stacked images we can do so. (farheen_att, 14:48:07)
    21. add this to the etherpad. (farheen_att, 14:50:01)
    22. ACTION: Manoop - add this link to the etherpad. (farheen_att, 14:51:00)
    23. https://wiki.acumos.org/display/AR/Thoughts+on+ML+Workbench+from+a+Modeler%27s+Perspective (farheen_att, 14:51:36)

  4. Bring your ideas to face to face meeting. (farheen_att, 14:52:41)
    1. https://etherpad.acumos.org/p/DemeterPlanningWorkshop (farheen_att, 14:53:36)

  5. High level overview of Acumos-2901 (farheen_att, 14:54:14)
    1. Parag - When you try to deploy the model in a k8s environment, it will fail because it doesn't accept some characters. (farheen_att, 14:55:54)
    2. It's the name of the container of the pod. Every pod is a domain name inside the cluster. The model name is not necessarily DNS compliant. (farheen_att, 14:56:38)
    3. We should not restrict the user to creating a name in a DNS-acceptable format, but change the model name instead. (farheen_att, 14:58:12)
    4. When the user is onboarding, we should check the model metadata, such as the name. (farheen_att, 14:59:03)
    5. Rather than restricting the user, we should automatically generate the name. (farheen_att, 14:59:31)
    6. We need a friendly name and a DNS-compliant name. A short-term solution is to restrict the user until it has been fixed. (farheen_att, 15:00:49)
    7. ACTION: Guy lead the effort to accept a DNS generated name. (farheen_att, 15:04:11)
    8. Guy - there is code in there already to do that. (farheen_att, 15:05:19)
    9. Priya - have a friendly name and a system name. (farheen_att, 15:05:54)
    10. everyone agrees (farheen_att, 15:06:03)
    11. ACTION: Parag convert Acumos-2901 from an Issue into a User Story in Demeter in jira. (farheen_att, 15:07:08)
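
The "friendly name plus system name" idea agreed above can be sketched as a small function that derives a DNS-compliant system name from whatever the user typed. This is only an illustration, not the code Guy refers to in item 8. It assumes the target is an RFC 1123 label (lowercase letters, digits, and hyphens, alphanumeric at both ends, at most 63 characters), a form Kubernetes accepts for many object names. The function name and the fallback value are hypothetical.

```python
import re

MAX_LABEL_LENGTH = 63  # RFC 1123 label length limit


def to_dns_label(friendly_name, fallback="model"):
    """Derive a lowercase RFC 1123 label from a free-form model name."""
    label = friendly_name.lower()
    # Replace every run of characters outside [a-z0-9] with a single hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", label)
    # Labels must start and end with an alphanumeric character, so trim
    # hyphens before and after cutting to the maximum length.
    label = label.strip("-")[:MAX_LABEL_LENGTH].rstrip("-")
    # A name with no usable characters falls back to a placeholder.
    return label or fallback


if __name__ == "__main__":
    print(to_dns_label("My Model_v2!"))  # my-model-v2
```

Two different friendly names can map to the same label (for example "My Model" and "my_model"), so a real implementation would also need a uniqueness step, such as appending part of the solution id.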

  6. ML Workbench status (farheen_att, 15:08:42)
    1. SV and deployment will not have a final docker by the end of this week. At minimum it will be the end of next week. (farheen_att, 15:12:06)
    2. There may be a gap in how to deploy. (farheen_att, 15:13:05)
    3. We will figure it out in the integration cycle. (farheen_att, 15:13:18)
    4. If development is ready, testing can start. If not, an issue should be raised. (farheen_att, 15:13:38)


Meeting ended at 15:14:07 UTC (full logs).

Action items

  1. Manoop - add this link to the etherpad.
  2. Guy lead the effort to accept a DNS generated name.
  3. Parag convert Acumos-2901 from an Issue into a User Story in Demeter in jira.


People present (lines said)

  1. farheen_att (81)
  2. collabot` (3)


Generated by MeetBot 0.1.4.