14:15:35 #startmeeting Architecture Committee
14:15:35 Meeting started Thu Oct 25 14:15:35 2018 UTC. The chair is farheen. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:15:35 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:15:35 The meeting name has been set to 'architecture_committee'
14:15:51 #topic E2E Project Requirements
14:16:33 #info Anwar: We have to have an understanding of Kubeflow; that will ensure that we have the maximum re-usability.
14:17:17 #info Manoop: The data pipeline does have a high-level pipeline; from there the developer would look at the already existing components that we need to integrate.
14:17:53 #info First start of the tech document is finding re-usable existing components.
14:18:30 #info Pantellis: I need help from each of the projects impacted; CMLP is one that you mentioned. That would be helpful to me.
14:19:04 #info Anwar: Yes, we'll have a one-day lockdown.
14:19:58 #info Adi: I don't understand all the pipelines. It is a different framework. The pipeline effort is to take any flows, topics, publishers, to create a scoring pipeline and any type of pipeline that they want.
14:20:13 #info Pantellis: It is semantics.
14:20:46 #info Adi: If that is the case then we can break it up into 4, but to me the underlying framework is what is allowing me to create these different types of pipelines.
14:21:16 #info Pantellis: OK, so you're saying the CMLP project already has a pipeline implemented?
14:22:07 #info Adi: We have Kafka and Flink; it is not ready. My goal was to have the pipelines completely API driven.
14:22:34 #info Adi: Even Airflow provides a capability. Now they give you an API.
14:23:02 #info Pantellis: Compared to ARGO, do you know how it is compared to Kubernetes?
14:23:12 #info Adi: I am not an expert on Airflow.
14:23:42 #info Pantellis: Perhaps Adi and I can get together.
14:23:49 #info Adi: Yes.
14:24:23 #action Pantellis and Adi get together to further discuss.
14:24:44 #info Anwar: Pantellis, set up a lockdown with onboarding, Design Studio, CMLP.
14:25:05 #info Pantellis: We will kick it off in a workshop setting.
14:25:15 #action Pantellis set up a lockdown.
14:25:54 #info Jessica: Please anticipate that our part of federated learning is to be included. Be sure that the E2E pipeline can accommodate it.
14:26:08 #info Jessica: We are coming late and trying to catch up.
14:26:31 #info Pantellis: Yes, I will include you in the federated learning.
14:26:57 #info Can you please include us?
14:27:07 #info Pantellis: What's a lockdown?
14:27:26 #info Anwar: It's an uninterrupted meeting that lasts a whole or half day.
14:28:17 #info Manoop: Before the release starts we plan for a 3-day workshop and ask each PTL to propose what will be in the technical architecture.
14:31:46 #action Farheen Bring up the epics, and exactly what the benefits will be, on the TSC call on Monday.
14:33:09 #topic AI/ML Target State Solution View
14:33:57 #info Adi: Before getting into ML workbench.
14:34:18 #info When you deal with ML you have to go through standard patterns.
14:34:47 #info You have systems of record and cooked data sets.
14:35:05 #info They are going to give you data in raw or cooked form and put it in some sort of catalog.
14:35:17 #info My main point is it's a lot of work.
14:35:35 #info If you're doing ML in this ecosystem, managing these 12 tasks is a real problem.
14:36:29 #info Decisions are being taken automatically and models are entering and leaving the ecosystem. It is an organizational idea.
14:36:37 #info You get this out of the box.
14:36:54 #info When I look at CMLP or Acumos I don't see the organizational construct.
14:37:04 #info It changes the way you collaborate.
14:37:50 #info Anwar: What we saw in Kubeflow was management of training the data sets.
14:38:02 #info Adi: Kubeflow is one way to do a lot of these things.
14:38:28 #info Data is one of those things you have to be careful about. How do you manage and organize access to it?
14:39:00 #info Pantellis: Given that Kubeflow is an extension on top of Kubernetes, we can manage it with Kubernetes.
14:39:48 #info Adi: I'm looking at it from a distributed platform. This management is not organized.
14:40:28 #info Good questions. I'm setting the context for the workbench because it will constantly evolve. They need to be managed in a workbench.
14:40:43 #info Kazi: Can we go through 1 - 12?
14:41:18 #info Adi: Multiple data lakes, databases, flat files; any large ML system has to deal with many types of data.
14:41:51 #info Adi: 3. Data libraries. Warehouses where you cook the data, create schemas, and then 4. Catalog.
14:42:29 #info 5. I'm going to give it to somebody through a pipeline. 6. ML picks up data. 7. Training then puts it in the catalog.
14:42:52 #info 8. You deploy into Kubernetes, and where do you get your data from?
14:44:22 #info 9. Run time. 11. When you build and run your model, the application consuming it is number 11.
14:44:46 #info Adi: This is just a context.
14:45:41 #info Adi: If you go to Google, AWS, IBM clouds, the first thing you do is create a workbench. An organizing construct.
14:46:50 #info Bryan: I think that Center data pipeline your input would be helpful. Wherever you see CMLP is where I see CMLP and Acumos.
14:47:14 #action Adi: Update slide, remove CMLP and replace with AI Acumos.
14:48:12 #info Adi demonstrating CMLP workbench. 1. You create a project. You can give them an option for storage requirements.
14:49:38 #info You can have assets and do your ML in notebooks, etc. Each entry is going to have an entry over here. A project is made of a set of assets. If I start competing with models they have to be associated with projects that have to work in run time.
14:49:58 #info It's an organizational construct for assets.
14:50:17 #info You can create a notebook and go and start adding assets to that.
14:50:34 #info Do you give equal rights for data constructs?
14:51:00 #info Yes, you can use Flink and Kafka; you can start your ETL flows.
14:51:14 #info Anand: So the creator can give access rights?
14:51:18 #info Yes.
14:53:05 #info Adi: High-level view screenshot. Cooked data sets and data lakes. Workbench is a massive set of assets. If you don't provide a workbench you have to provide management. How do you track the model? E2E view of what an ML workbench is.
14:56:31 #info Anwar: Want to go through PTLs and their plans around architecture.
14:57:12 #info Ken: We are finished with testing and ask to have the release built for all the components. For release B, the test team is asking when the user stories will be ready.
14:57:26 #info Anwar: No architecture impacts.
14:57:41 #info Mukesh: We are creating the epics.
14:58:01 #info Mukesh: Focus is on Boreas.
14:58:41 #info Percentage complete? In terms of scoping the stories we are around 35 - 40%.
14:59:49 #info Phillippe: Regarding Boreas, we studied the ONNX format to onboard models. And then we will certainly have some impact coming from the training project. I have to coordinate with Pantellis to discuss the impact of the training project.
15:00:34 #info Chris: So far the requirements I've reviewed do not require architecture changes. As far as Athena, we are fixing a bug for the maintenance release.
15:01:33 #info We have put the code on the branch and documentation is ready. For Boreas there is a lot of impact with the pipelines and putting Jupyter into the architecture. We are between 10 - 20%.
15:02:49 #info Deployment: Still preparing to do the Jira items. For Boreas we have to deprecate the existing. Documentation is left for the Athena. For Boreas I wouldn't say any outlining has to take place.
15:03:45 #action Anwar reach out to PTLs to get the remainder of the components.
15:04:17 #info Michelle: From licensing we are getting good participation from Orange, etc. We are meeting twice a week.
15:04:29 #endmeeting