#acumos-meeting log

14:15:35 <farheen> #startmeeting Architecture Committee
14:15:35 <collabot`> Meeting started Thu Oct 25 14:15:35 2018 UTC.  The chair is farheen. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:15:35 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:15:35 <collabot`> The meeting name has been set to 'architecture_committee'
14:15:51 <farheen> #topic E2E Project Requirements
14:16:33 <farheen> #info Anwar: Having an understanding of Kubeflow will ensure that we have the maximum reusability.
14:17:17 <farheen> #info Manoop:  The data pipeline does have a high-level pipeline; from there the developer would look at the existing components that we need to integrate.
14:17:53 <farheen> #info The first step for the tech document is finding reusable existing components.
14:18:30 <farheen> #info Pantellis: I need help from each of the impacted projects; CMLP is one that you mentioned.  That would be helpful to me.
14:19:04 <farheen> #info Anwar: Yes, we'll have a one day lockdown.
14:19:58 <farheen> #info Adi: I don't understand all the pipelines.  It is a different framework.  The pipeline effort is to take any flows, topics, publishers, to create a scoring pipeline and any type of pipeline that they want.
14:20:13 <farheen> #info Pantellis: It is semantics.
14:20:46 <farheen> #info Adi: If that is the case then we can break it up into 4, but to me the underlying framework is what allows me to create these different types of pipelines.
14:21:16 <farheen> #info Pantellis: OK, so you're saying the CMLP project already has a pipeline implemented?
14:22:07 <farheen> #info Adi: We have Kafka and Flink, but it is not ready.  My goal was to have the pipelines completely API driven.
14:22:34 <farheen> #info Adi: Even Airflow provides a capability. Now they give you an API.
14:23:02 <farheen> #info Pantellis: Do you know how it compares to Argo with respect to Kubernetes?
14:23:12 <farheen> #info Adi: I am not an expert on Airflow.
14:23:42 <farheen> #info Pantellis: Perhaps Adi and I can get together.
14:23:49 <farheen> #info Adi: Yes.
14:24:23 <farheen> #action Pantellis and Adi get together to further discuss.
14:24:44 <farheen> #info Anwar: Pantellis set up a lockdown with onboarding, Design Studio, CMLP.
14:25:05 <farheen> #info Pantellis: We will kick it off to a workshop setting.
14:25:15 <farheen> #action Pantellis set up a lockdown.
14:25:54 <farheen> #info Jessica: Please anticipate that our part, federated learning, is to be included.  Be sure that the E2E pipeline can accommodate it.
14:26:08 <farheen> #info Jessica: We are coming late and trying to catch up.
14:26:31 <farheen> #info Pantellis: Yes, I will include you in the federated learning.
14:26:57 <farheen> #info Can you please include us?
14:27:07 <farheen> #info Pantellis: What's a lockdown?
14:27:26 <farheen> #info Anwar: It's an uninterrupted meeting that lasts a whole or half day.
14:28:17 <farheen> #info Manoop: Before this release starts we plan a 3-day workshop and ask each PTL to propose what will be in the technical architecture.
14:31:46 <farheen> #action Farheen bring up the epics and exactly what the benefits will be on the TSC call on Monday.
14:33:09 <farheen> #topic AI/ML Target State Solution View
14:33:57 <farheen> #info Adi: Some context before getting into the ML workbench.
14:34:18 <farheen> #info When you deal with ML you have to go through standard patterns.
14:34:47 <farheen> #info You have systems of record and cooked data sets.
14:35:05 <farheen> #info They are going to give you data in raw or cooked form and put it in some sort of catalog.
14:35:17 <farheen> #info My main point is it's a lot of work.
14:35:35 <farheen> #info If you're doing ML in this ecosystem managing these 12 tasks is a real problem.
14:36:29 <farheen> #info Decisions are being taken automatically and models are entering and leaving the ecosystem.  It is an organizational idea.
14:36:37 <farheen> #info You get this out of the box.
14:36:54 <farheen> #info When I look at CMLP or Acumos I don't see the organizational construct.
14:37:04 <farheen> #info It changes the way you collaborate.
14:37:50 <farheen> #info Anwar: What we saw in Kubeflow was management of the training data sets.
14:38:02 <farheen> #info Adi: Kubeflow is one way to do a lot of these things.
14:38:28 <farheen> #info Data is one of those things you have to be careful about.  How do you manage and organize access to it.
14:39:00 <farheen> #info Pantellis: Given that Kubeflow is an extension on top of Kubernetes, we can manage it through Kubernetes.
14:39:48 <farheen> #info Adi: I'm looking at it from a distributed platform.  This management is not organized.
14:40:28 <farheen> #info Good questions.  I'm setting the context for the workbench because it will constantly evolve.  They need to be managed in a workbench.
14:40:43 <farheen> #info Kazi: Can we go through 1 - 12?
14:41:18 <farheen> #info Adi: Multiple data lakes, databases, flat files: any large ML system has to deal with many types of data.
14:41:51 <farheen> #info Adi: 3.  Data libraries: warehouses where you cook the data and create schemas, and then 4. Catalog.
14:42:29 <farheen> #info 5. I'm going to give it to somebody through a pipeline.  6. ML picks up the data.  7. Training, then puts it in the catalog.
14:42:52 <farheen> #info 8. You deploy into Kubernetes, and where do you get your data from?
14:44:22 <farheen> #info 9. Run time. 11. When you build and run your model, the application consuming it is number 11.
14:44:46 <farheen> #info Adi: This is just context.
14:45:41 <farheen> #info Adi: If you go to the Google, AWS, or IBM clouds the first thing you do is create a workbench.  An organizing construct.
14:46:50 <farheen> #info Bryan: I think your input on the Center data pipeline would be helpful.  Wherever you see CMLP is where I see CMLP and Acumos.
14:47:14 <farheen> #action Adi: Update the slide: remove CMLP and replace it with AI Acumos.
14:48:12 <farheen> #info Adi is demonstrating the CMLP workbench.  1. You create a project.  You can give them an option for storage requirements.
14:49:38 <farheen> #info You can have assets and do your ML in notebooks, etc.  Each asset is going to have an entry over here.  A project is made of a set of assets.  If I start competing with models they have to be associated with projects that have to work in run time.
14:49:58 <farheen> #info It's an organizational construct for assets.
14:50:17 <farheen> #info You can create a notebook and start adding assets to it.
14:50:34 <farheen> #info Do you give equal rights for data constructs?
14:51:00 <farheen> #info Yes, you can use Flink and Kafka; you can start your ETL flows.
14:51:14 <farheen> #info Anand: So the creator can give access rights?
14:51:18 <farheen> #info Yes.
14:53:05 <farheen> #info Adi: High-level view screenshot.  Cooked data sets and data lakes.  A workbench is a massive set of assets.  If you don't provide a workbench you have to provide management.  How do you track the model?  E2E view of what an ML workbench is.
14:56:31 <farheen> #info Anwar: Want to go through PTLs and their plans around architecture.
14:57:12 <farheen> #info Ken: We are finished with testing and ask to have the release built for all the components.  For release B, the test team is asking when the user stories will be ready.
14:57:26 <farheen> #info Anwar: No architecture impacts.
14:57:41 <farheen> #info Mukesh: We are creating the epics.
14:58:01 <farheen> #info Mukesh: Focus is on Boreas.
14:58:41 <farheen> #info Percentage complete?  In terms of scoping the stories we are around 35 - 40%.
14:59:49 <farheen> #info Phillippe: Regarding Boreas, we studied the ONNX format to onboard models.  We will certainly have some impact coming from the training project.  I have to coordinate with Pantellis to discuss the impact of the training project.
15:00:34 <farheen> #info Chris: So far the requirements I've reviewed do not require architecture changes.  As for Athena, we are fixing a bug for the maintenance release.
15:01:33 <farheen> #info We have put the code on the branch and documentation is ready.  For Boreas there is a lot of impact from the pipelines and from putting Jupyter into the architecture.  We are between 10 - 20%.
15:02:49 <farheen> #info Deployment: Still preparing to do the Jira items.  For Boreas we have to deprecate the existing.  Documentation is left for Athena.  For Boreas I wouldn't say any outlining has to take place.
15:03:45 <farheen> #action Anwar reach out to PTLs to get the remainder of the components.
15:04:17 <farheen> #info Michelle: From licensing we are getting good participation from Orange, etc.  We are meeting twice a week.
15:04:29 <farheen> #endmeeting