#acumos-meeting: Acumos TSC Architecture Committee

Meeting started by bryan_att at 14:04:29 UTC (full logs).

Meeting summary

  1. Agenda (bryan_att, 14:04:41)
    1. New epics added in the last few days from the AT&T CMLP team (bryan_att, 14:05:27)
    2. present: Bryan, Aimee, Anand, Chris, Chuxin, Farheen, Nat, Ofer, Pantelis (bryan_att, 14:07:21)

  2. New AT&T CMLP team epics (bryan_att, 14:07:46)
    1. https://jira.acumos.org/browse/ACUMOS-1181 "DataSet Management Capabilities" (bryan_att, 14:09:38)
    2. Anwar: CMLP team has some code implementing the use case "I want to : be able to create and manage different datasets (training, testing & scoring) that contains the metadata of the datasets with optionally linked with the source of the dataset, so that: I can use it building the model and scoring the predictions using a model" (bryan_att, 14:10:47)
    3. Pantelis: this relates to functions being considered for the Training project; the code is welcome; AT&T has been present, looking forward to CMLP team joining (bryan_att, 14:11:41)
    4. Pantelis: the training project has assumed that a PoC will be developed but not released code yet; some reqs will be deferred to Release B. (bryan_att, 14:12:29)
    5. Anwar: the CMLP team is not here today so we will ask them to join the next call (Community committee on Tue), give a demo etc. (bryan_att, 14:13:22)
    6. Anwar: this will relate to the data model and to the Training project components (bryan_att, 14:15:37)
    7. Pantelis: we are working on drafting the component set for Training and should have some proposals next week (bryan_att, 14:16:05)
    8. Chris: we need to understand what type of dataset(s) are intended to be supported (bryan_att, 14:16:25)
    9. https://jira.acumos.org/browse/ACUMOS-1182 "Onboarding Datasets" "As an user, I want to : be able to have a guided workflow on creating different datasets that contains metadata of the dataset along with datasource information of the dataset so that: my experience will be seamless and intuitive" (bryan_att, 14:17:15)
    10. Anwar: datasets should be reusable across models (bryan_att, 14:17:48)
    11. Pantelis: need to understand what this means, e.g. the output of training, or input to training (bryan_att, 14:18:53)
    12. Bryan: this could be both, a trained model dataset (as used in the training), or a dataset available for use in training or testing (bryan_att, 14:20:31)
    13. Chuxin: does this also relate to sample data? (bryan_att, 14:20:54)
    14. Anwar: the onboarding may provide references to the data but not the actual data itself; the sample data capability e.g. as used in Dev Challenge expects the user to upload an actual data file, vs a reference (bryan_att, 14:22:20)
    15. Anwar: the result would be entries in a dataset catalog, but not local hosting of the data (bryan_att, 14:23:49)
    16. Anwar: does the team agree this should be in scope? (bryan_att, 14:24:18)
    17. Pantelis: seems implicitly in scope (bryan_att, 14:24:32)
    18. Chuxin: how are these two Jira items related? (bryan_att, 14:24:53)
    19. Pantelis: we could merge 1182 into 1181 (bryan_att, 14:25:20)
    20. Anwar: or we could leave them as distinct capabilities supporting the overall training goal, since the work needed may not overlap entirely (bryan_att, 14:26:06)
    21. https://jira.acumos.org/browse/ACUMOS-1183 "Guided workflow for onboarding and operationalizing models" "As an user: I want to : have an guided workflow based for each of my use cases on onboarding and operationalizing the model so that : my user experience is intuitive and completely guided" (bryan_att, 14:27:03)
    22. Chris: I think this means there is some wizard-type system that graphically/dialog-wise guides the user through the process of onboarding/"operationalizing" (bryan_att, 14:28:03)
    23. Bryan: "operationalizing" is new (bryan_att, 14:28:20)
    24. Pantelis: behind this is some workflow engine (bryan_att, 14:28:56)
    25. present: Vasu Kallepalli (bryan_att, 14:30:53)
    26. Vasu: the entire end-to-end flow of a model is what's meant here; we have some code/UX that we can share as a demo (bryan_att, 14:31:51)
    27. Anwar: on the dataset capability, are we talking only about the metadata? (bryan_att, 14:32:28)
    28. present: Adi Mishra (AT&T) (bryan_att, 14:33:17)
    29. Adi: CMLP team has a set of self-service portals allowing Hadoop access; we can abstract that platform for Acumos (bryan_att, 14:34:07)
    30. Anwar: mapping an abstract capability to a specific implementation, e.g. for a dataset, is what we are after (bryan_att, 14:34:40)
    31. Chuxin: CMLP team uses a tool called RapidMiner (bryan_att, 14:35:40)
    32. https://rapidminer.com/ (aimeeu, 14:35:44)
    33. Adi: this is AGPL based (bryan_att, 14:36:10)
    34. Bryan: it's OK as long as we do not tightly integrate (bryan_att, 14:37:54)
    35. Manoop: we still need to do a legal review on the contribution of that as a platform component (bryan_att, 14:38:27)
    36. https://github.com/rapidminer/rapidminer-studio (aimeeu, 14:38:29)
    37. Adi: With RapidMiner we have end-to-end workflow support for model creation, training, etc. (bryan_att, 14:39:48)
    38. Pantelis: we need abstraction, so the functions implementing model development etc can be decoupled (bryan_att, 14:40:33)
    39. Anwar: is this covered by the epics we have so far? (bryan_att, 14:40:54)
    40. Adi: no this is additional (bryan_att, 14:41:16)
    41. Anwar: we need to create an epic for this if intended for the release (bryan_att, 14:41:40)
    42. Adi: will follow up (bryan_att, 14:41:50)
    43. Adi: re 1182 and 1181; these are closely related and could be combined; they relate to onboarding metadata and then guided creation of uses for the referenced data (bryan_att, 14:44:01)
    44. Adi: will enter a new feature request for streaming datasets (bryan_att, 14:45:00)
    45. Adi: also entered https://jira.acumos.org/browse/ACUMOS-1179 "Ability to perform Exploratory Data Analysis + Visualization + Sampling" (bryan_att, 14:46:13)
    46. Anwar: we will dive a little deeper into this now so Ofer can discuss further in the Community call (bryan_att, 14:47:30)
    47. Adi: 1179 relates to understanding the data and how a business user looks at the dataset, prior to starting training etc (bryan_att, 14:48:54)
    48. Adi: e.g. analyzing application data requires understanding the application, what datasets it creates, what semantics apply, etc (bryan_att, 14:49:40)
    49. Anwar: do you have support for that in CMLP? (bryan_att, 14:49:58)
    50. Adi: we currently use Zeppelin, which can help do rudimentary data analysis (link?) (bryan_att, 14:50:30)
    51. https://zeppelin.apache.org/ (aimeeu, 14:50:56)
    52. Chuxin: not sure this capability fits with the scope of Acumos; it goes way beyond what Acumos is designed to support (bryan_att, 14:51:38)
    53. Anwar: we are at discussion level on items that might be in scope, where tools exist that we can reuse (bryan_att, 14:52:07)
    54. Adi: the existing CMLP tool does not manage the data (bryan_att, 14:52:50)
    55. Ofer: this seems highly related to training of models (bryan_att, 14:53:15)
    56. Ofer: the initial training will be a demo but not part of the release (bryan_att, 14:53:31)
    57. Ofer: so we will develop a spec and a small demo; if the team can do more it will be discussed in those meetings (bryan_att, 14:54:09)
    58. Bryan: the visualization capability seems key to the end-to-end UX, and as long as we abstract it we should be able to include this to support the E2E experience (bryan_att, 14:57:20)
    59. Chuxin: concerned that this would require the model to run under the platform (bryan_att, 14:57:42)
    60. Anwar: we all agree this is a design time tool (the Acumos platform), not a runtime tool (bryan_att, 14:58:07)
    61. Anwar: is the onboarding of the datasets more than putting them into a catalog? (bryan_att, 14:59:05)
    62. Anwar: e.g. validating them etc without bringing the data into a running model environment (bryan_att, 14:59:30)
    63. Manoop: want to clarify that the dataset will be external (bryan_att, 15:01:20)
    64. Adi: agree, the data is pulled into the visualization tool as needed, and not locally stored (bryan_att, 15:01:39)
    65. Chuxin: at design time, the design studio does not consider the data and does not have features (bryan_att, 15:02:26)
    66. Anwar: feedback so far is this seems related to Training, so recommend that the CMLP team work with Pantelis and the training team for deeper dives; the main purpose for today was intro to the CMLP team and their proposals (bryan_att, 15:04:18)


Meeting ended at 15:05:33 UTC (full logs).

Action items

  1. (none)


People present (lines said)

  1. bryan_att (72)
  2. collabot_ (4)
  3. aimeeu (3)


Generated by MeetBot 0.1.4.