14:03:19 #startmeeting Architecture Committee
14:03:19 Meeting started Wed Aug 28 14:03:19 2019 UTC. The chair is farheen_att. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:19 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:03:19 The meeting name has been set to 'architecture_committee'
14:07:35 #topic ML Workbench - Sayee and team
14:14:01 #info Sayee reviewing ML workbench. Using Angular. Advantage is that it is easy to build a UI with a plugin approach. Each module can run in a pod. Problem: today we deploy to a single VM. Enterprises have their own set of pipelines. Solution is to give an admin a way to configure their Acumos instance, called a config manager.
14:15:50 #info storing metadata for model mapping in couchdb. Similar content will be mapped in config management.
14:17:10 #info where do we have this manager? Is it an addendum to the existing Admin or a standalone config manager?
14:21:19 #info needs to be discussed. Support and guidelines for K8s are driven by what we co-host on the platform. It takes a long time to launch a pipeline or notebook. Advantage with NoSQL is that we are integrating with couchdb.
14:23:05 #info where will the config manager be? In the admin UI or as a tile in design studio?
14:23:58 #model association - We can configure how the models will talk to two instances through E5.
14:25:31 #info model association - config manager will be able to configure how the model is shared across two instances of Acumos through E5.
14:29:16 #info linking is done through Nifi. The association is not stored in Nifi. We have to train the data scientist to use it. Learning curve.
14:29:53 #info data set is a data set of metadata, such as attributes.
14:30:02 #info where is the history?
14:30:35 #info where is the record of the data association to the model?
14:30:49 #info Nifi gives you data as well as provenance.
14:31:05 #info Does Nifi give you provenance out of the box?
14:31:57 #action Sayee will talk through the gaps in the details around data sets and provenance.
14:32:59 #info Priya - you have a data set id and its physical location. As a part of onboarding I can specify the model and sample data set, then onboard it as a package.
14:33:51 #info Priya proposes to add the sample data set at the time of on-boarding.
14:35:06 #info This represents the ground zero of provenance. Modelers are making a claim about what they are onboarding. There is no verification system. Keep it in mind.
14:36:08 #info Sayee likes Priya's suggestion of adding the data set at the time of on-boarding.
14:37:30 #info We will continue to have a UI for create/modify/delete. When we onboard a model we can associate the data source through the initial UI. Optional at the time of onboarding and mandatory at the time of publishing.
14:38:19 #info Priya - Profile of 10,000 records.
14:39:11 #info Sayee - You can write an SQL query.
14:39:54 #info Tausif - Create/modify/delete. How will we migrate the old data to the new?
14:40:45 #info Sayee - a dataset is always associated with the model.
14:41:37 #info it is a many-to-many relationship. Once an association is made, can you change it later?
14:41:47 #info yes, they have to be able to go and edit.
14:42:04 #info What if I change the datasets related to one model?
14:43:06 #info I onboarded a model with dataset1. I want to change that association on day 2 because I made a mistake? Then I should have the ability to change it.
14:43:33 #info second case: I have the same model with a new dataset, then I have to onboard a new model.
14:46:34 #info essentially it's a definition of a dataset. Go to the UI where you define your dataset. You have a name, a description, and a data source definition: a URI pointing to the data. The result of completing this task is name 1. What if you change the URI?
14:46:45 #info call it name 2
14:47:06 #info what if you change the source of the model?
14:47:32 #info then you cannot change the URI. You have to create a new model.
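The dataset rules discussed above could be sketched roughly as follows. This is only an illustration of the discussion, not the Acumos implementation: the class names, field names, and URIs are hypothetical. It assumes a dataset definition is immutable (changing the URI produces a new definition under a new name, "name 2") and that model-to-dataset links are many-to-many and editable after onboarding.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class DatasetDefinition:
    """Immutable dataset definition (hypothetical field names)."""
    name: str
    description: str
    uri: str

    def with_uri(self, new_name: str, new_uri: str) -> "DatasetDefinition":
        # A changed URI yields a new definition under a new name;
        # the original definition is left untouched.
        return replace(self, name=new_name, uri=new_uri)


@dataclass
class AssociationRegistry:
    """Many-to-many links between model ids and dataset names."""
    links: set = field(default_factory=set)

    def associate(self, model_id: str, dataset_name: str) -> None:
        self.links.add((model_id, dataset_name))

    def reassociate(self, model_id: str, old: str, new: str) -> None:
        # Correct a mistaken association after onboarding.
        self.links.discard((model_id, old))
        self.links.add((model_id, new))

    def datasets_for(self, model_id: str) -> list:
        return sorted(d for m, d in self.links if m == model_id)

    def models_for(self, dataset_name: str) -> list:
        return sorted(m for m, d in self.links if d == dataset_name)
```

For example, a definition created as "name 1" stays unchanged when `with_uri("name 2", ...)` is called, and `reassociate` moves one model to the new definition without affecting other models linked to "name 1".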
14:48:31 #info in summary of this topic: we are working on associations and config manager.
14:49:33 #info Bryan can provide information on the cluster and a successful deployment. Anything beyond that you need to specify.
14:50:16 #info managing lifecycle is something that needs to be improved.
14:50:58 #info more issues. Do we need to dig deeper on couchdb?
14:51:38 #info issue with K8s? Is it still an issue?
14:52:07 #info We have a helm chart that installs couchdb, and it is a part of system integration just like mariadb.
14:52:20 #info Polymer?
14:53:53 #info Problem: it takes a long time to load components due to dependencies. Not scalable.
14:55:11 #info We need to bundle the dependencies into one file. Initially a little slow but the response time is much faster.
14:57:22 #info it's the number of files, not the size of the files. We are benefitting from the web component. Final product can be a web component to drop in view components.
14:58:22 #info it's a reliability concern. The files received can stack a number of threats. Packaging as a single file is more efficient. License compliant.
14:58:54 #info back end is not affected.
14:59:05 #info issue seen from UI team?
14:59:38 #info Yes, issue with our existing CSS
15:00:09 #info we will try to match the color scheme and set up a call with you all.
15:01:06 #action Sayee set up a call with Tausif, Farheen, and Vasu.
15:01:50 #info any other performance improvements?
15:02:24 #info lazy loads the pipeline and ML workbench
15:02:53 #info are minification tools used for the Polymer single file?
15:03:29 #info the tools are not integrated into the Polymer process. Our focus is on the feature.
15:03:35 #info any caching strategy?
15:04:11 #info we get caching automatically on load. Server side strategy with gzip when we serve the server type strategy.
15:05:53 #info brotli is a better compression tool
15:06:26 #info brotli is a part of server side compression. You can see it in your request header view.
15:07:06 #info caching can be optimized for the file name.
15:08:14 #action Manoop add the topics for ML workbench for the next call.
15:08:18 #endmeeting
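One common reading of "caching can be optimized for the file name" is content-hashed file names: the bundle's name changes only when its content changes, so browsers can cache it for a long time. The log does not say which technique the team had in mind, so the sketch below is an assumption, and `hashed_name` and `bundle.js` are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return the file name with a short content hash before the suffix."""
    # The digest depends only on the bytes, so an unchanged bundle keeps
    # its name and a changed bundle gets a new one.
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"
```

Calling `hashed_name("bundle.js", data)` gives a name of the form `bundle.<8 hex characters>.js`; rebuilding with identical content yields the same name, so a cached copy stays valid.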