#acumos-meeting: Architecture Committee

Meeting started by farheen at 15:02:34 UTC (full logs).

Meeting summary

    1. https://bestpractices.coreinfrastructure.org/en/projects/3654 (farheen, 15:09:49)

  1. Sprint 1 status (farheen, 15:10:03)
    1. Vaibhav wants to add RTU regarding ML workbench and licensing. (farheen, 15:12:07)
    2. Vaibhav has discussed integration with the licensing team. However, when we integrate, it will delay our response. (farheen, 15:12:36)
    3. this is for model aggregation (composite models) (farheen, 15:13:02)
    4. Sayee - We don't want to do recursive library version. I hope the license will be for the model and not for each version. (farheen, 15:13:30)
    5. Michelle - Have you talked to Alex recently? (farheen, 15:13:47)
    6. Sayee - We need a more effective way of accessing. We will need Portal, DS, and ML Workbench. (farheen, 15:15:55)
    7. add a topic for Jenkins installation (farheen, 15:16:11)
    8. Sayee - federated models, how do you deploy? How do you scale the model? (farheen, 15:16:31)
    9. Manoop - Who will be leading the deployment effort? Internally we had a brief discussion about acquiring resources. (farheen, 15:17:15)
    10. ACTION: All open community - we need resources to enhance the Acumos model deployment. (farheen, 15:18:10)
    11. Nat - Santosh is not able to but Justin was able to deploy a model using Bryan's scripts. We asked Justin to have a working session and guide the team. (farheen, 15:19:04)
    12. Nat - I asked Michelle to check Justin's availability. He will be available after the first week of February. He's available Monday and Tuesday morning next week. (farheen, 15:19:48)

  2. Dataset management Sayee (farheen, 15:23:45)
    1. Current plan: ML Workbench will have a dataset plugin, managed through data source integration metadata: Hadoop connectors and a string query that will be used to train the model. This will be used to tightly zip model training. Model validation can also be done. (farheen, 15:24:47)
    2. you will be able to train a model using a data set. To train the model you need data. This dataset will tell you how you did the training on models. (farheen, 15:25:32)
    3. we have not worked with the training team. (farheen, 15:26:00)
    4. have you talked through the data pipeline? (farheen, 15:26:35)
    5. You would have to do massaging, filtering, and anonymizing of the data. (farheen, 15:27:25)
    6. Initially we want to use NiFi to train the data. (farheen, 15:27:48)
    7. Manoop - We will use NiFi pipelines unless there is an alternate solution. Wenting? (farheen, 15:28:12)
    8. Wenting - We want federated learning. If we're talking about ETE (end-to-end) training, then you need to set up the emulator environment in terms of compute resources and kernel libraries when you deploy the model. Training resources: how do you activate or trigger the training process? Not much we can work on. My understanding is to support ETE. Emulate (farheen, 15:29:38)
    9. i can't share with the open source community right now. (farheen, 15:29:57)
    10. enabling bi-directional communication between the model and Acumos? (farheen, 15:30:23)
    11. Bi-directional between two instances. Hyperparameters or weights could be passed. You can design a model to expose new APIs that accept new weights, or create a new API to return model platform. How that is used can be very important in training. (farheen, 15:31:30)
    12. Manoop - Interesting thoughts. Such bi-directional communication will help in ways that will directly impact licensing services that will be applied to models. We need to discuss how we can use as a platform feature. (farheen, 15:32:16)
    13. A general platform feature that can take advantage of the APIs was our original vision. (farheen, 15:33:17)
    14. Wenting - For sprint 1 we will be able to handle the outbound from supplier to supplier; then, in the sprint, where to host this at the platform level. (farheen, 15:34:04)
    15. Manoop - When you have architectural APIs for open community then bring them to this call. (farheen, 15:34:36)
    16. Wenting - we have outbound that we can share but inbound is still a work in progress. (farheen, 15:34:57)
    17. Sayee - Inbound are you thinking the training will be done or you want to do it during run time? (farheen, 15:35:26)
    18. Wenting - it makes sense for bi-directional federation. (farheen, 15:35:46)
    19. Wenting - I can't share model specific details. (farheen, 15:36:13)
    20. Manoop - Let us share the outbound APIs. (farheen, 15:36:59)
    21. based on that we can discuss what is the impact. (farheen, 15:37:37)
    22. Wenting - we have considered the versioning towards license manager. We have to do this in a progressive manner because I can't share the design. (farheen, 15:38:15)
    23. Sayee - If you are going to have a dataset in the platform, can it be consumed by training, or are we developing something outside? (farheen, 15:38:49)
    24. Wenting - It may not be realistic to connect the data to a data platform. (farheen, 15:39:09)
    25. Sayee - We are not hosting the data; we are hosting the metadata and providing a link to the data. (farheen, 15:39:41)
    26. Wenting - you have to pipe it in a data lake. (farheen, 15:40:20)
    27. Sayee - agree the connection string is what allows you to train. (farheen, 15:40:47)
    28. ACTION: Manoop - Let's plan a focused discussion on data management and training with Wenting and Sayee. (farheen, 15:41:36)
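
  The metadata-only approach discussed above (the platform hosts a connector, a connection string, and a query, not the data itself) could be sketched roughly as follows. This is a minimal illustration, not the ML Workbench design: the `DatasetRecord` class, its field names, the `to_training_reference` helper, and the example host and query are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata entry: the platform stores this, not the data."""
    name: str
    connector: str          # assumed connector label, e.g. "hadoop"
    connection_string: str  # link to where the data actually lives
    query: str              # string query selecting the training rows


def to_training_reference(record: DatasetRecord) -> str:
    """Serialize the record so a training job could be pointed at the data."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only; none of these come from the meeting.
record = DatasetRecord(
    name="example-dataset",
    connector="hadoop",
    connection_string="hdfs://example-host:8020/data/example",
    query="SELECT * FROM example_table",
)
reference = to_training_reference(record)
```

  Keeping the record immutable means the same reference can be stored alongside a trained model to show which data and query were used for training.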

  3. Glowroot Hosting and Performance Metrics (farheen, 15:43:42)
    1. Vaibhav - It is in the Demeter release details page. (farheen, 15:44:01)
    2. This fits in the OAM project because it is not providing a brand new feature. It is a tool to monitor our platform's performance. That is where the documentation should go. If it requires documentation for docs.acumos.org we can move it there also. (farheen, 15:44:58)
    3. Manoop - This is an optional component: users may choose to have it, but it is not mandatory. I would move this wiki to OAM documentation. (farheen, 15:47:04)
    4. Vaibhav - The steps we just saw are for each component. If you want to make it a centralized component then you can do a centralized deployment. Should we install a Cassandra and try it out? (farheen, 15:48:10)
    5. Manoop - For development purposes we can make decisions on our own. For example, AT&T can try it and provide feedback. However, it is up to the testing and development teams to decide. (farheen, 15:49:16)
    6. Sayee - You can inform the users and give them some sample data. It is up to the testing and development teams, and up to the admins, whether or not they choose to install. (farheen, 15:50:19)
    7. If you claim that ML Workbench requires this monitoring service, then the Acumos platform depends on the service. Is there any such scenario? (farheen, 15:51:04)
    8. No, specifically for Java server and services it will intervene the ms and not scale the ms. Model performance is another case. (farheen, 15:51:34)
    9. Manoop - then keep this topic internal to third party service. (farheen, 15:52:02)
    10. Ken - Test team doesn't want added issues for optional functions. (farheen, 15:54:07)
    11. If it is not a part of the Acumos platform then we should have an option to turn it off. (farheen, 15:54:49)

  4. Sayee and Vaibhav's concerns about RTU impacts. (farheen, 15:55:54)
    1. Sayee - We have REST APIs we can call for the model and model version, correct? (farheen, 15:56:14)
    2. No. Today the LUM only wants to know the software ID tag; it is not interested in the content of the model. Solution ID, revision ID. LUM will know if it's open source and free to use. (farheen, 15:56:57)
    3. Michelle - This is for the composite model. He's thinking of calling it with the revision ID vs. solution ID. (farheen, 15:57:18)
    4. Sayee - Instead of calling by revision, do you think we can do it at the solution level and not the revision level, for a faster response? (farheen, 15:57:46)
    5. Alex - We have two calls to LUM. DS will have to call a new API and LUM will return all the software tags that are currently allowed for the composition. So you will have a list. (farheen, 15:58:31)
    6. The specifics of this call: we are giving you a list. (farheen, 15:58:43)
    7. Sayee - That is the getList API; the solution ID and revision ID will come. We agree. (farheen, 15:58:58)
    8. Sayee - When the user is changing for aggregation we have to check by asset and version. (farheen, 15:59:18)
    9. the second call is important for the Acumos revision id. That specific version of the model is a counter. We do it by counter. (farheen, 15:59:58)
    10. Alex - Main problem is incrementing the usage count. It looks like a special case action. If we're not going to pass the increment usage count then... (farheen, 16:02:39)
    11. Sayee - Incrementing when the model is deployed may be one way. (farheen, 16:03:44)
    12. LUM will ignore the usage count. Then we will never have to worry about incrementing the counts. If you decide to include the software then it means two APIs: one with all the possible software and the second with a list of revisions that are available, but we will not call asset usage logic. (farheen, 16:04:57)
    13. ACTION: Michelle document call aggregate, LUM will ignore the usage count restriction. (farheen, 16:05:26)
    14. Michelle - Because it's more like a query than actual use. (farheen, 16:05:40)
    15. We are not considering the model in the catalogs. It is not an asset and should not be counted towards usage. Then we can stay with the first API or have an API plus sub revision IDs. (farheen, 16:06:43)
    16. ACTION: Michelle - have a focused discussion on Vaibhav's call. (farheen, 16:07:42)
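
  The distinction discussed above, between a list-style query that does not touch usage counts and an actual use that increments a per-revision counter, could be sketched as below. This is only an illustration under assumptions: `FakeLum` is an in-memory stand-in and not the real LUM API, and the method names and the solution/revision ID values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLum:
    """In-memory stand-in for LUM; NOT the real LUM API."""
    allowed: set  # (solution_id, revision_id) pairs that carry a right to use
    usage_counts: dict = field(default_factory=dict)

    def list_allowed(self, candidates):
        """Query only: return the allowed candidates, leaving counts untouched."""
        return [c for c in candidates if c in self.allowed]

    def record_usage(self, solution_id, revision_id):
        """Actual use (for example on deployment): bump the revision's counter."""
        key = (solution_id, revision_id)
        if key not in self.allowed:
            raise PermissionError(f"no right to use {key}")
        self.usage_counts[key] = self.usage_counts.get(key, 0) + 1
        return self.usage_counts[key]


# Illustrative IDs only; none of these come from the meeting.
lum = FakeLum(allowed={("sol-1", "rev-1"), ("sol-2", "rev-3")})
candidates = [("sol-1", "rev-1"), ("sol-2", "rev-2"), ("sol-2", "rev-3")]
usable = lum.list_allowed(candidates)
```

  In this sketch, composing a model would only ever call `list_allowed`, so browsing candidates for aggregation never consumes a usage count; `record_usage` is reserved for a separate step such as deployment.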


Meeting ended at 16:07:47 UTC (full logs).

Action items

  1. All open community - we need resources to enhance the Acumos model deployment.
  2. Manoop - Let's plan a focused discussion on data management and training with Wenting and Sayee.
  3. Michelle document call aggregate, LUM will ignore the usage count restriction.
  4. Michelle - have a focused discussion on Vaibhav's call.


People present (lines said)

  1. farheen (77)
  2. collabot` (3)


Generated by MeetBot 0.1.4.