#acumos-meeting: Architecture Committee
Meeting started by farheen at 15:02:34 UTC
(full logs).
Meeting summary
-
- https://bestpractices.coreinfrastructure.org/en/projects/3654
(farheen,
15:09:49)
- Sprint 1 status (farheen, 15:10:03)
- Vaibhav wants to add RTU regarding ML workbench
and licensing. (farheen,
15:12:07)
- Vaibhav has discussed integration with the
licensing team. However, when we integrate it will delay our
response. (farheen,
15:12:36)
- this is for model aggregation (composite
models) (farheen,
15:13:02)
- Sayee - We don't want to do recursive library
version. I hope the license will be for the model and not for each
version. (farheen,
15:13:30)
- Michelle - Have you talked to Alex
recently? (farheen,
15:13:47)
- Sayee - We need a more effective way of
accessing. We will need portal, DS, and ML Workbench. (farheen,
15:15:55)
- add a topic for Jenkins installation
(farheen,
15:16:11)
- Sayee - federated models, how do you deploy?
How do you scale the model? (farheen,
15:16:31)
- Manoop - Who will be leading the deployment
effort? Internally we had a brief discussion about acquiring
resources. (farheen,
15:17:15)
- ACTION: All open
community - we need resources to enhance the Acumos model
deployment. (farheen,
15:18:10)
- Nat - Santosh was not able to, but Justin was
able to deploy a model using Bryan's scripts. We asked Justin to
have a working session and guide the team. (farheen,
15:19:04)
- Nat - I asked Michelle to check on Justin's
availability. He will be available after the first week of February.
He's available Monday and Tuesday morning next week. (farheen,
15:19:48)
- Dataset management Sayee (farheen, 15:23:45)
- Current plan: ML Workbench will have a dataset
plugin, managed by data source integration metadata: Hadoop
connectors and a string query that will be used to train the model.
This will be used to tightly zip model training. Model validation
can also be done. (farheen,
15:24:47)
- You will be able to train a model using a
dataset. To train the model you need data. This dataset will tell
you how you did the training on models. (farheen,
15:25:32)
- we have not worked with the training
team. (farheen,
15:26:00)
- have you talked through the data
pipeline? (farheen,
15:26:35)
- You would have to massage, filter, and
anonymize the data. (farheen,
15:27:25)
- Initially we want to use NiFi to train the
data. (farheen,
15:27:48)
- Manoop - We will use NiFi pipelines unless
there is an alternate solution. Wenting? (farheen,
15:28:12)
- Wenting - We want federated learning. If we're
talking ETE training then you need to set up the emulator
environment in terms of compute resources and kernel libraries when
you deploy the model. Training resources: how do you activate or
trigger the training process? Not much we can work on. My
understanding is to support ETE. Emulate (farheen,
15:29:38)
- I can't share with the open source community
right now. (farheen,
15:29:57)
- enabling bi-directional communication between
the model and Acumos? (farheen,
15:30:23)
- Bi-directional between two instances. It could
be hyperparameters or weights to be passed. You can design a model
with new APIs to accept new weights, or create a new API to
return model platform. This can have very important
uses in training. (farheen,
15:31:30)
- Manoop - Interesting thoughts. Such
bi-directional communication will help in ways that will directly
impact licensing services that will be applied to models. We need
to discuss how we can use as a platform feature. (farheen,
15:32:16)
- 1. General platform feature that can take
advantage of the APIs was our original vision. (farheen,
15:33:17)
- Wenting - For sprint 1 we will be able to handle
the outbound from supplier to supplier, then in the sprint where to
host this at the platform level. (farheen,
15:34:04)
- Manoop - When you have architectural APIs for
open community then bring them to this call. (farheen,
15:34:36)
- Wenting - we have outbound that we can share
but inbound is still a work in progress. (farheen,
15:34:57)
- Sayee - For inbound, are you thinking the training
will be done or do you want to do it during run time? (farheen,
15:35:26)
- Wenting - it makes sense for bi-directional
federation. (farheen,
15:35:46)
- Wenting - I can't share model specific
details. (farheen,
15:36:13)
- Manoop - Let us share the outbound APIs.
(farheen,
15:36:59)
- based on that we can discuss what is the
impact. (farheen,
15:37:37)
- Wenting - we have considered the versioning
towards license manager. We have to do this in a progressive manner
because I can't share the design. (farheen,
15:38:15)
- Sayee - If you are going to have datasets in
the platform, can those be consumed by training or are we developing
something outside? (farheen,
15:38:49)
- Wenting - It may not be realistic to connect the
data to a data platform. (farheen,
15:39:09)
- Sayee - We are not hosting the data; we are
hosting the metadata and providing a link to the data. (farheen,
15:39:41)
- Wenting - you have to pipe it in a data
lake. (farheen,
15:40:20)
- Sayee - agree the connection string is what
allows you to train. (farheen,
15:40:47)
- ACTION: Manoop -
Let's plan a focused discussion on data management and training
with Wenting and Sayee. (farheen,
15:41:36)
- Glowroot Hosting and Performance Metrics (farheen, 15:43:42)
- Vaibhav - It is in the Demeter release details
page. (farheen,
15:44:01)
- This fits in the OAM project because this is
not providing a brand new feature. It is a tool to monitor our
platform's performance. That is where the documentation should go.
If it requires documentation for docs.acumos.org we can move it
there also. (farheen,
15:44:58)
- Manoop - This is an optional component: they can
choose to have it, but it is not mandatory. I would move this
wiki to OAM documentation. (farheen,
15:47:04)
- Vaibhav - The steps we just saw are for each
component. If you want to make it a centralized component then you
can do a centralized deployment. Should we install a Cassandra and
try it out? (farheen,
15:48:10)
- Manoop - For development purposes we can make
decisions on our own. For example AT&T can try it and provide
feedback. However it is up to testing and development teams to
decide. (farheen,
15:49:16)
- Sayee - You can inform the users and give them
some sample data. It is up to the testing and development teams and
up to the admins whether or not they choose to install. (farheen,
15:50:19)
- If you claim that ML Workbench requires this
monitoring service then the Acumos platform depends on the service.
Is there any such scenario? (farheen,
15:51:04)
- No, specifically for Java server and services
it will intervene the ms and not scale the ms. Model performance is
another case. (farheen,
15:51:34)
- Manoop - Then keep this topic internal to the
third-party service. (farheen,
15:52:02)
- Ken - Test team doesn't want added issues for
optional functions. (farheen,
15:54:07)
- If it is not a part of the Acumos platform then
we should have an option to turn it off. (farheen,
15:54:49)
- Sayee and Vaibhav's concerns about RTU impacts. (farheen, 15:55:54)
- Sayee - We have REST APIs we can call for the
model and model version, correct? (farheen,
15:56:14)
- No, today the LUM only wants to know the
software ID tag (solution ID, revision ID). It is not interested in
the content of the model. LUM will know if it's open
source and free to use. (farheen,
15:56:57)
- Michelle - This is for the composite model. He's
thinking of calling it with the revision ID vs. solution ID.
(farheen,
15:57:18)
- Sayee - Instead of calling by revision, do you
think we can call at the solution level and not the revision level
for a faster response? (farheen,
15:57:46)
- Alex - We have two calls to LUM. DS will have
to call a new API and LUM will return all the software tags that are
currently allowed for the composition. So you will have a
list. (farheen,
15:58:31)
- The specifics of this call: we are giving you a
list. (farheen,
15:58:43)
- Sayee - That is the getList API; the solution ID
and revision ID will come. We agree. (farheen,
15:58:58)
- Sayee - When the user is changing for
aggregation we have to check by asset and version. (farheen,
15:59:18)
- the second call is important for the Acumos
revision id. That specific version of the model is a counter. We
do it by counter. (farheen,
15:59:58)
- Alex - Main problem is incrementing the usage
count. It looks like a special case action. If we're not going to
pass the increment usage count then... (farheen,
16:02:39)
- Sayee - Incrementing when deploying
the model may be one way. (farheen,
16:03:44)
- LUM will ignore the usage count. Then we will
not have to ever worry about incrementing the counts. If you decide
to include the software then it means two APIs: one with all the
possible software and the second with a list of revisions that are
available, but we will not call asset usage logic. (farheen,
16:04:57)
- ACTION: Michelle -
document the aggregate call; LUM will ignore the usage count
restriction. (farheen,
16:05:26)
- Michelle - Because it's more like a query than
actual use. (farheen,
16:05:40)
- We are not considering the model in the
catalogs. It is not an asset and should not be counted towards
usage. Then we can stay with the first API or have an API plus sub
revision IDs. (farheen,
16:06:43)
- ACTION: Michelle -
have a focused discussion on Vaibhav's call. (farheen,
16:07:42)
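The two LUM calls discussed under the RTU topic can be pictured with a minimal sketch. Everything here is an assumption for illustration only: LicenseStore, SoftwareTag, list_allowed_for_composition, and record_usage are hypothetical names, the store is a plain in-memory object, and none of it is the actual LUM API. It only shows the distinction raised in the discussion: the composition query returns a list and leaves usage counts alone, while a separate usage call increments the counter for one specific revision.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SoftwareTag:
    """Hypothetical stand-in for a software ID tag (solution ID + revision ID)."""
    solution_id: str
    revision_id: str


@dataclass
class LicenseStore:
    """In-memory stand-in for a license manager; NOT the real LUM interface."""
    allowed: set = field(default_factory=set)
    usage_counts: dict = field(default_factory=dict)

    def list_allowed_for_composition(self, candidates):
        """Query-only call: returns the allowed tags and never touches usage counts."""
        return [tag for tag in candidates if tag in self.allowed]

    def record_usage(self, tag):
        """Usage call: increments the counter for one specific revision."""
        if tag not in self.allowed:
            raise PermissionError(f"no right to use {tag}")
        self.usage_counts[tag] = self.usage_counts.get(tag, 0) + 1
        return self.usage_counts[tag]


# Example data (made up for this sketch).
tag_a = SoftwareTag("sol-1", "rev-1")
tag_b = SoftwareTag("sol-9", "rev-1")
tag_c = SoftwareTag("sol-2", "rev-3")

store = LicenseStore(allowed={tag_a, tag_c})

# Design-time query for a composite model: a list comes back, no counts change.
usable = store.list_allowed_for_composition([tag_a, tag_b, tag_c])
counts_after_query = dict(store.usage_counts)

# A later, separate usage event (for example at deployment) increments the counter.
first_use = store.record_usage(tag_a)
```

Keeping the query free of side effects matches the point made in the minutes that building a composite is more like a query than actual use.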
Meeting ended at 16:07:47 UTC
Action items
- All open community - we need resources to enhance the Acumos model deployment.
- Manoop - Let's plan a focused discussion on data management and training with Wenting and Sayee.
- Michelle - document the aggregate call; LUM will ignore the usage count restriction.
- Michelle - have a focused discussion on Vaibhav's call.
People present (lines said)
- farheen (77)
- collabot` (3)
Generated by MeetBot 0.1.4.