#acumos-meeting log

14:03:19 <farheen_att> #startmeeting Architecture Committee
14:03:19 <collabot`> Meeting started Wed Aug 28 14:03:19 2019 UTC.  The chair is farheen_att. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:19 <collabot`> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:03:19 <collabot`> The meeting name has been set to 'architecture_committee'
14:07:35 <farheen_att> #topic ML Workbench - Sayee and team
14:14:01 <farheen_att> #info Sayee reviewing ML workbench.  Using Angular.  Advantage is that it is easy to build a UI with a plugin approach.  Each module can run in a pod.  Problem: today we deploy to a single VM.  Enterprises have their own set of pipelines.  Solution is a config manager through which an admin can configure their Acumos instance.
14:15:50 <farheen_att> #info Storing metadata for model mapping in CouchDB.  Similar content will be mapped in config management.
14:17:10 <farheen_att> #info Where do we have this manager?  Is it an addendum to the existing Admin or a standalone config manager?
14:21:19 <farheen_att> #info Needs to be discussed.  Support and guidelines for K8s are driven by what we co-host on the platform.  It takes a long time to launch a pipeline or notebook.  Advantage with NoSQL is that we are integrating with CouchDB.
14:23:05 <farheen_att> #info Where will the config manager be?  In the admin UI or as a tile in Design Studio?
14:23:58 <farheen_att> #model association - We can configure how the models will talk to two instances through E5.
14:25:31 <farheen_att> #info model association - config manager will be able to configure how the model is shared across two instances of Acumos through E5.
14:29:16 <farheen_att> #info Linking is done through Nifi.  The association is not stored in Nifi.  We have to train the data scientist to use it.  There is a learning curve.
14:29:53 <farheen_att> #info The data set is a data set of metadata, such as attributes.
14:30:02 <farheen_att> #info where is the history?
14:30:35 <farheen_att> #info Where is the record of the data association to the model?
14:30:49 <farheen_att> #info Nifi gives you data as well as provenance.
14:31:05 <farheen_att> #info Does Nifi give you provenance out of the box?
14:31:57 <farheen_att> #action Sayee will talk thru the gaps of the details around data sets and provenance.
14:32:59 <farheen_att> #info Priya - you have a data set id and its physical location.  As a part of onboarding I can specify the model and sample data set then onboard it as a package.
14:33:51 <farheen_att> #info Priya proposes to add the sample data set during the time of on-boarding.
14:35:06 <farheen_att> #info This represents the ground zero of provenance.  Modelers are making a claim about what they are onboarding.  There is no verification system.  Keep it in mind.
14:36:08 <farheen_att> #info Sayee likes Priya's suggestion of on-boarding data set during the time of on-boarding.
14:37:30 <farheen_att> #info We will continue to have a UI for create modify delete.  When we onboard a model we can associate the data source through the initial UI.  Optional at the time of onboarding and mandatory at the time of publishing.
14:38:19 <farheen_att> #info Priya - Profile of 10,000 records.
14:39:11 <farheen_att> #info Sayee - You can write an SQL query.
14:39:54 <farheen_att> #info Tausif - Create/modify/delete.  How will we migrate the old data to new?
14:40:45 <farheen_att> #info Sayee - a dataset is always associated with the model.
14:41:37 <farheen_att> #info it is a many to many relationship.  Once an association is made can you change it later?
14:41:47 <farheen_att> #info yes, they have to be able to go and edit.
14:42:04 <farheen_att> #info What if I change the datasets related to one model?
14:43:06 <farheen_att> #info I onboarded a model with dataset1.  What if I want to change that association on day 2 because I made a mistake?  Then I should have the ability to change it.
14:43:33 <farheen_att> #info Second case: if I have the same model with a new dataset, then I have to onboard a new model.
14:46:34 <farheen_att> #info Essentially it's a definition of a dataset.  Go to the UI where you define your dataset.  You have a name, description, and data source definition: a URI to where you can get the data.  Completion of this task is a name 1.  What if you change the URI?
14:46:45 <farheen_att> #info call it name 2
14:47:06 <farheen_att> #info what if you change the source of the model?
14:47:32 <farheen_att> #info then you cannot change the URI.  You have to create a new model.
14:48:31 <farheen_att> #info In summary of this topic: we are working on associations and the config manager.
14:49:33 <farheen_att> #info Bryan can provide information on the cluster and successful deployment.  Anything beyond that you need to specify.
14:50:16 <farheen_att> #info managing lifecycle is something that needs to be improved.
14:50:58 <farheen_att> #info More issues.  Do we need to dig deeper on CouchDB?
14:51:38 <farheen_att> #info Issue with K8s?  Is it still an issue?
14:52:07 <farheen_att> #info We have a Helm chart that installs CouchDB, and it is a part of system integration just like MariaDB.
14:52:20 <farheen_att> #info Polymer?
14:53:53 <farheen_att> #info Problem: it takes a long time to load components due to dependencies.  Not scalable.
14:55:11 <farheen_att> #info We need to bundle the dependencies into one file.  The initial load is a little slow, but the response time after that is much faster.
14:57:22 <farheen_att> #info it's the number of files not the size of the files.  We are benefitting from the web component.  Final product can be a web component to drop in view components.
14:58:22 <farheen_att> #info it's a reliability concern. The files received can stack a number of threats. Packaging as a single file is more efficient and license compliant.
14:58:54 <farheen_att> #info Back end is not affected.
14:59:05 <farheen_att> #info Any issue seen from the UI team?
14:59:38 <farheen_att> #info Yes, issue with our existing CSS
15:00:09 <farheen_att> #info we will try to match the color scheme and set up a call with you all.
15:01:06 <farheen_att> #action Sayee set up a call with Tausif, Farheen, and Vasu.
15:01:50 <farheen_att> #info any other performance improvements?
15:02:24 <farheen_att> #info Lazy loading of the pipeline and ML workbench.
15:02:53 <farheen_att> #info Are minimizing tools used for the Polymer single file?
15:03:29 <farheen_att> #info The tools are not integrated into the Polymer process.  Our focus is on the feature.
15:03:35 <farheen_att> #info Any caching strategy?
15:04:11 <farheen_att> #info We get caching automatically on load.  Server-side strategy is gzip compression when we serve the files.
15:05:53 <farheen_att> #info Brotli is a better compression tool.
15:06:26 <farheen_att> #info Brotli is a part of server-side compression.  You can see it in your request header view.
15:07:06 <farheen_att> #info Caching can be optimized for the file name.
15:08:14 <farheen_att> #action Manoop add the topics for ML workbench for the next call.
15:08:18 <farheen_att> #endmeeting