Wed, 12/11 – 12-1pm ET | 11am-12pm CT | 10-11am MT | 9-10am PT
Executive Summary of Meeting
|Time (EST)||Duration||Item w/ Notes (presentation materials linked)||Lead|
|12:00 PM EST|
Approval of October XAB Meeting Summary
- Ops requested a minor change: replace "Challenges are in staffing..." with "Most significant changes are in staffing..." Leslie sent the new version to the XAB email list.
- Ron noted that the summary did not include any specific recommendations for the project. This is fine, but board members should be aware that any specific recommendations they want to make to the project need to be explicitly called out in the meeting summaries.
- Past recommendations are getting addressed now following receipt of clarification from the XAB on several of the recommendations.
- ACTION: Leslie/Ron: In the future the project should provide a status update on recommendations addressed to the XAB.
- Emre/Randy motion to accept the summary with the change noted.
Acknowledgment of 2020 XAB meeting schedule
- 2020 Feb Call: Tue 2/4 – 12-1pm ET | 11am-12pm CT | 10-11am MT | 9-10am PT
- 2020 April F2F Mtg: Mon 4/20-Tue 4/21 – Dinner the night before and 8am-4pm CT meeting in Chicago area
- If anyone is not able to travel to Chicago we will provide Zoom coordinates (note that this is not ideal for remote participants as it is typically difficult to hear)
- 2020 June Call: Tue 6/23 – 11am-12pm ET | 10-11am CT | 9-10am MT | 8-9am PT
- 2020 Aug Call: Fri 8/28 – 2-3pm ET | 1-2pm CT | 12-1pm MT | 11am-12pm PT
- 2020 Oct Webinar: Fri 10/23 – 9am-1pm ET | 8am-12pm CT | 7-11am MT | 6-10am PT
- 2020 Dec Call: Tue 12/15 – 2-3pm ET | 1-2pm CT | 12-1pm MT | 11am-12pm PT
|12:10 PM EST||45 mins|
XSEDE Response to NSF Blueprint documents and NSF 20-015 RFI
- NSF DCL/RFI 20-015 (Data-Focused Cyberinfrastructure) DUE Monday 12/16
- OAC Blueprint: Coordination Services
- OAC Blueprint: Overview and Computational Ecosystem
- OAC Blueprint: International Research & Education Network Connections
- John: All are out for comment by the community. NSF encourages feedback but provides no instructions for how to submit it. XSEDE plans to provide a response to each of these. The Blueprint doc for coordination services is the most important to XSEDE; we believe it will be the guiding doc for whatever solicitation NSF puts out for the follow-on to XSEDE. XSEDE's lessons learned & collective experiences are key to this. John pointed NSF to the final TeraGrid and final XSEDE1 reports.
- Staff is actively trying to compose a response to the RFI doc.
- Discussion about RFI: 2 primary questions and 1 open-ended question.
- Q1: Data-Intensive Research Question(s) and Challenge(s). Describe current or emerging data-intensive/data-driven S&E research challenge(s), providing context in terms of recent research activities and standing questions in the field. NSF is particularly interested in cross-disciplinary challenges that will drive requirements for cross-disciplinary and disciplinary-agnostic data-related CI.
- Scientific challenges that are emerging that are dependent on data-intensive technologies. Many fields of science are finding common services needed. NSF hasn't made as much of an investment here.
- Ken: Large Hadron Collider. As we look at luminosity in the future, data volumes will be huge; need to store the data, make it available to a wide community of users, and understand how this data interacts with processors. Long time scale, but need to be working on this now.
- Solutions: R&D funding mechanism: CSSI. Some of biggest challenges are processing. Collision events are busy, non-linear dependence of time to reconstruct events. Can we take advantage of novel CPU architectures? Optimized for x86. Possibly use GPUs? Don't want to keep rewriting algorithms. AI solutions for pattern recognition of events.
- David: Shared infrastructure workshop: Geo & Bio primarily. Survey of needs & white paper. Volumes aren't as large as LHC, but projects are still around 15PB needed for storage. Licensing aspects. Looking to take advantage of new methods, cloud pilots. Related discussion at AGU. Was to be sent to NSF program officers a few weeks ago. David to see if he can share this doc.
- Randy: NIH has funded 3 cloud computing data centers for cancer research. Huge genomic database (Atlas) sitting there. Paid attention to reproducibility issues. Will depend on domain.
- Model that could be extended to other domains? Yes but would need to come from those domain scientists. Challenging for NSF to do this on a large scale. 3 centers hosted on Amazon. Not that much computation, but access to right resources for this community.
- John: Satellite observations need to be shared with large community. NASA moving earth science data to earth data cloud hosted in AWS. Sense there are multiple domains that could benefit from something like this. Investigation into this could be worthwhile. We should facilitate sharing across domain boundaries when we can.
- Set of software, ability to reproduce results
- Tom: Data sets in biological spaces: Across all fields. Growth everywhere from economics to genomics etc. Data movement, cost issue. Too big for NSF to tackle. Response has to be focused on prototype infrastructure that can help the larger community, like what was proposed for storage across TeraGrid. CASC response: preservation/metadata: boiling the ocean & impossible. Worried about mandates from govt. Need for reproducibility. Response should not try to fix the whole problem, but say what NSF can do to help.
- JT: Can we identify pieces of CI that would be common across multiple domains that NSF could invest in. Concerned that Open Storage Network is NSF's only
- Can the response be if we redo DIBBs, how would we redirect that $. Good to have archive that can handle data currently in XSEDE resources.
- Want to deal with metadata, but problem is where to put data.
- Suggested that data sharing be done via cloud. Advantages to this, but expensive. NASA has been experimenting with data cloud and has run into challenges including cost. Also concerned that cloud is inadequate to support large-scale, complex science/analyses being done. If you have to pull data down from cloud to do analysis, that becomes very expensive.
- Had training with AWS yesterday & very hard to use. Expensive to access data.
- Emre: small angle scattering: AWS cost was prohibitive. 1TB/mo. Having somewhere we can store this data economically in the cloud would be helpful. If NSF could provide intermediate solutions would be beneficial to the project.
- Tom: NSF embracing cloud but hasn't done financial analysis.
- JT: Earth sciences directorate is pushing hard to move data to the cloud. Working with a researcher at IL to determine what a typical analysis of the data NASA has made available costs. The environments are difficult to use, but used correctly one would pay only for computation, not for data movement. The researcher's test resulted in a $1300 bill. Contracts with AWS are complicated, and it is difficult to figure out how to avoid incurring such costs. Concerned that if agencies make huge commitments in this direction, it will become more complicated & costly to do science.
- Fermilab has done some work bursting into cloud.
- Machine learning infrastructures in Amazon, Azure will dwarf anything done by any academic or national lab. Question is when is transition point, and can we convince cloud providers to do something good. AI/deep learning tools are more advanced & will be hard to do in the cloud because not well supported. People can help you bridge to technology. If they had the people they would have done this already. Humans involved in making this work well are more important when cloud is involved.
- Cloud Bank (SDSC) will be an interesting project to watch as it develops to support academic community use of cloud at scale. A lot of work could move to cloud, but lack of support makes it difficult. Pilot for something larger downstream.
- David: Princeton–use of exclusively cloud resources for NIH project. Time to analyze & put machine learning onto slices of brain images. Didn't share slides. Worked heavily with Google to help with personnel resources. Data blows up by a factor of 10. Can recompress. To make sustainable for future: keep all data on prem outside of cloud & transfer in where there are not fees to do computation and then remove. 2PB data set. Presented at NSF workshop on next gen cloud infrastructure. David to follow up and see if he can find more details.
Blueprint Coordination Services Doc is next in line:
- NSF provided no deadline & no instructions for how to provide input. Earlier is better and will send response to Manish.
- Tom: Very generic & not aware of current state of community. Community engagement not in there. Workforce dev is there, but not about professional dev. World is getting broader, with new areas: need people to help researchers use these things. This wasn't addressed.
- John: concerned there are elements XSEDE provides today that are lacking recognition in this doc. Lack of addressing outreach efforts, how we bring new communities in...
Hope to submit responses to blueprint docs in January. Happy to share our responses with the board. Responses are written collectively across the project. Try to draw on project staff as much as possible.
If XAB members are interested in commenting on the XSEDE response docs let John know.
|1:00 PM EST|| |
Next meeting: Tuesday, February 4; 12pm ET (11CT/10MT/9PT)