15:02:34 #startmeeting Architecture Committee
15:02:34 Meeting started Wed Jan 29 15:02:34 2020 UTC. The chair is farheen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:34 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:02:34 The meeting name has been set to 'architecture_committee'
15:03:44 #agenda Sayee: ML Workbench features for Demeter; Dataset management and others; https://bestpractices.coreinfrastructure.org/en/projects/3654 (Murali/Manoop) CII Best Practices; Arch scorecard and Demeter Epics/US updates (PTLs); Next week(s): SOAJS presentation - Antoine - ?
15:09:43 #topic Best practices
15:09:49 #link https://bestpractices.coreinfrastructure.org/en/projects/3654
15:10:03 #topic Sprint 1 status
15:12:07 #info Vaibhav wants to add RTU regarding ML Workbench and licensing.
15:12:36 #info Vaibhav has discussed integration with the licensing team. However, when we integrate, it will delay our response.
15:13:02 #info This is for model aggregation (composite models).
15:13:30 #info Sayee - We don't want to do a recursive library version. I hope the license will be for the model and not for each version.
15:13:47 #info Michelle - Have you talked to Alex recently?
15:15:55 #info Sayee - We need a more effective way of accessing. We will need portal, DS, and ML Workbench.
15:16:11 #info Add a topic for Jenkins installation.
15:16:31 #info Sayee - Federated models: how do you deploy? How do you scale the model?
15:17:15 #info Manoop - Who will be leading the deployment effort? Internally we had a brief discussion about acquiring resources.
15:18:10 #action All open community - We need resources to enhance the Acumos model deployment.
15:19:04 #info Nat - Santosh was not able to, but Justin was able to deploy a model using Bryan's scripts. We asked Justin to have a working session and guide the team.
15:19:48 #info Nat - I asked Michelle to check Justin's availability. He will be available after the first week of February. He's available Monday and Tuesday morning next week.
15:23:45 #topic Dataset management Sayee
15:24:47 #info Current plan: ML Workbench will have a dataset plugin, managed through data source integration metadata: Hadoop connectors and a query string that will be used to train the model. This will be used to tightly zip model training. Model validation can also be done.
15:25:32 #info You will be able to train a model using a dataset. To train the model you need data. This dataset will tell you how you did the training on models.
15:26:00 #info We have not worked with the training team.
15:26:35 #info Have you talked through the data pipeline?
15:27:25 #info You would have to massage, filter, and anonymize the data.
15:27:48 #info Initially we want to use NiFi to train the data.
15:28:12 #info Manoop - We will use NiFi pipelines unless there is an alternate solution. Wenting?
15:29:38 #info Wenting - We want federated learning. If we're talking ETE training, then you need to set up the emulator environment in terms of compute resources and kernel libraries when you deploy the model. For training resources, how do you activate or trigger the training process? Not much we can work on. My understanding is to support ETE. Emulating that environment is most important.
15:29:57 #info I can't share with the open source community right now.
15:30:23 #info Enabling bi-directional communication between the model and Acumos?
15:31:30 #info Bi-directional between two instances. It could be hyperparameters or weights to be passed. You can design a model to expose new APIs that accept new weights, or create a new API to return model platform. How that can be used can be very important in training.
15:32:16 #info Manoop - Interesting thoughts. Such bi-directional communication will help in ways that will directly impact licensing services that will be applied to models. We need to discuss how we can use it as a platform feature.
15:33:17 #info 1. A general platform feature that can take advantage of the APIs was our original vision.
15:34:04 #info Wenting - For sprint 1 we will be able to handle the outbound from supplier to supplier; then in the sprint we look at where to host this at the platform level.
15:34:36 #info Manoop - When you have architectural APIs for the open community, bring them to this call.
15:34:57 #info Wenting - We have outbound that we can share, but inbound is still a work in progress.
15:35:26 #info Sayee - For inbound, are you thinking the training will be done, or do you want to do it during run time?
15:35:46 #info Wenting - It makes sense for bi-directional federation.
15:36:13 #info Wenting - I can't share model-specific details.
15:36:59 #info Manoop - Let us share the outbound APIs.
15:37:37 #info Based on that we can discuss what the impact is.
15:38:15 #info Wenting - We have considered the versioning towards the license manager. We have to do this in a progressive manner because I can't share the design.
15:38:49 #info Sayee - If you are going to have a dataset in the platform, can it be consumed by training, or are we developing something outside?
15:39:09 #info Wenting - It may not be realistic to connect the data to a data platform.
15:39:41 #info Sayee - We are not hosting the data; we are hosting the metadata and providing a link to the data.
15:40:20 #info Wenting - You have to pipe it into a data lake.
15:40:47 #info Sayee - Agreed, the connection string is what allows you to train.
15:41:36 #action Manoop - Let's plan a focused discussion on data management and training with Wenting and Sayee.
15:43:42 #topic Glowroot Hosting and Performance Metrics
15:44:01 #info Vaibhav - It is in the Demeter release details page.
15:44:58 #info This fits in the OAM project because it is not providing a brand new feature. It is a tool to monitor our platform's performance. That is where the documentation should go. If it requires documentation for docs.acumos.org, we can move it there also.
15:47:04 #info Manoop - This is an optional component; they can choose to have it, but it is not mandatory. I would move this wiki to the OAM documentation.
15:48:10 #info Vaibhav - The steps we just saw are for each component. If you want to make it a centralized component, then you can do a centralized deployment. Should we install a Cassandra and try it out?
15:49:16 #info Manoop - For development purposes we can make decisions on our own. For example, AT&T can try it and provide feedback. However, it is up to the testing and development teams to decide.
15:50:19 #info Sayee - You can inform the users and give them some sample data. It is up to the testing and development teams, and up to the admins, whether or not they choose to install it.
15:51:04 #info If you claim that ML Workbench requires this monitoring service, then the Acumos platform depends on the service. Is there any such scenario?
15:51:34 #info No. Specifically for Java servers and services, it will intervene in the ms and not scale the ms. Model performance is another case.
15:52:02 #info Manoop - Then keep this topic internal to third-party services.
15:54:07 #info Ken - The test team doesn't want added issues for optional functions.
15:54:49 #info If it is not a part of the Acumos platform, then we should have an option to turn it off.
15:55:54 #topic Sayee and Vaibhav's concerns about RTU impacts.
15:56:14 #info Sayee - We have REST APIs we can call for the model and model version, correct?
15:56:57 #info No. Today the LUM only wants to know the software ID tag (solution ID, revision ID). It is not interested in the content of the model. LUM will know if it's open source and free to use.
15:57:18 #info Michelle - This is for the composite model. He's thinking of calling it with the revision ID vs. the solution ID.
15:57:46 #info Sayee - Instead of calling by revision, do you think we can do it at the solution level and not the revision level, for a faster response?
15:58:31 #info Alex - We have two calls to LUM. DS will have to call a new API, and LUM will return all the software tags that are currently allowed for the composition. So you will have a list.
15:58:43 #info The specifics of this call: we are giving you a list.
15:58:58 #info Sayee - That is the getList API; the solution ID and revision ID will come back. We agree.
15:59:18 #info Sayee - When the user is changing for aggregation, we have to check by asset and version.
15:59:58 #info The second call is important for the Acumos revision ID. That specific version of the model is a counter. We do it by counter.
16:02:39 #info Alex - The main problem is incrementing the usage count. It looks like a special-case action. If we're not going to pass the increment usage count then...
16:03:44 #info Sayee - Incrementing when you deploy the model may be one way.
16:04:57 #info LUM will ignore the usage count. Then we will never have to worry about incrementing the counts. If you decide to include the software, then it means two APIs: one with all the possible software, and the second with a list of revisions that are available, but we will not call the asset usage logic.
16:05:26 #action Michelle - Document the aggregate call; LUM will ignore the usage count restriction.
16:05:40 #info Michelle - Because it's more like a query than actual use.
16:06:43 #info We are not considering the model in the catalogs. It is not an asset and should not be counted towards usage. Then we can stay with the first API or have an API plus sub revision IDs.
16:07:42 #action Michelle - Have a focused discussion on Vaibhav's call.
16:07:47 #endmeeting
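The two-call LUM flow discussed under the RTU topic can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LUM interface: `FakeLum`, `list_allowed_for_composition`, `record_usage`, and `can_aggregate` are hypothetical names, software tags are modelled as (solution ID, revision ID) pairs, and counting usage only on actual use (for example at deployment) reflects one suggestion from the meeting rather than a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLum:
    """Hypothetical in-memory stand-in for LUM; not the real LUM API."""

    # Software tags, modelled here as (solution_id, revision_id) pairs,
    # that currently carry a right to use.
    allowed_tags: set
    usage_counts: dict = field(default_factory=dict)

    def list_allowed_for_composition(self):
        # First call: a pure query. It returns every allowed tag and
        # leaves the usage counters untouched.
        return sorted(self.allowed_tags)

    def record_usage(self, solution_id, revision_id):
        # Second call: counted per revision. Incrementing only on actual
        # use (for example at deployment) is an assumption of this sketch.
        tag = (solution_id, revision_id)
        if tag not in self.allowed_tags:
            raise PermissionError(f"no right to use for {tag}")
        self.usage_counts[tag] = self.usage_counts.get(tag, 0) + 1
        return self.usage_counts[tag]


def can_aggregate(lum, members):
    """Check composite-model members against the allowed list only."""
    allowed = set(lum.list_allowed_for_composition())
    return all(member in allowed for member in members)


lum = FakeLum(allowed_tags={("sol-1", "rev-1"), ("sol-2", "rev-3")})
composite = [("sol-1", "rev-1"), ("sol-2", "rev-3")]
print(can_aggregate(lum, composite))       # True
print(lum.usage_counts)                    # {}
print(lum.record_usage("sol-1", "rev-1"))  # 1
```

Keeping the aggregation check as a read-only query matches the point that composing a model is "more like a query than actual use", so the counters only move when a specific revision is actually used.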