


Action Items:

Summary | Description | Responsible | Due Date


Ops is investigating the best way to manage the enterprise services it provides. Options under study: (1) connect the current distinct machines and VM clusters at the various XSEDE sites and allow anyone with XSEDE admin privileges to manage services on any XSEDE-owned machine; (2) migrate all of the services to a cloud provider (AWS specifically is being examined); or (3) migrate some services to the cloud while others remain on XSEDE hardware at XSEDE sites.

Current status: Ops currently hosts 49 services on 82 hosts. Some are web-based, but most run on machines at XSEDE sites. Some services have automatic failover; others do not, and planned or emergency outages of those require manual intervention by the primary responsible staff member or a backup. Admins at one site cannot manage services hosted at another.

The final decision will weigh fiscal responsibility, security, ease of administration, and the other factors listed below.

In-house option:

Connect the distinct VM cluster "islands" at XSEDE sites to allow easy migration of services during an outage (planned or emergency), and give everyone with XSEDE admin privileges access to any XSEDE enterprise service, regardless of where it is located.

This may require purchasing additional hardware dedicated to XSEDE; however, one machine can host several moderate-size services. It will also require networking.

One advantage is that if a service needs to be pulled offline, we have console access to stop/migrate/start.

Currently available mechanisms could allow very cloud-like operations.

This could streamline some processes. For example, a group that needs a new service would not have to request it from multiple sites, since a NICS admin could spin up a VM at TACC.

Cloud services option:

Migrate some or all services to a cloud provider; AWS is being evaluated. Testing is underway with Amy Schuele's group to migrate the XDCDB to Amazon.

Questions/concerns on this option:

  • Stability: current uptime for SysOps is in the 99.8 to 99.9% range, and we do not want this to degrade.
  • What to use
    • Reserve an entire node 24/7/365?
    • Reserve part of a node? In this case, there are security concerns about who else would be on the same node.
    • The Amazon proxy model is still in flux and being investigated.
  • Ease of administration
  • Security: would this expose us to additional security issues? Jim Marsteller reports that there is a document outlining a security baseline that cloud services would have to support. In the cloud, we would lose visibility into network traffic and would need to find a way to regain the visibility we have on local networks.
    JP suggests talking to Steve at Globus or to IU's CTSC, as they have investigated the security issues of using Amazon and we can benefit from their research.
  • Migration: if we choose AWS now and want to move to Google in the future, what would that transition look like?
  • Cost: weigh against the savings from reduced XSEDE support staff costs. Moving only some services to the cloud may forfeit economy-of-scale savings, since we would still need XSEDE hardware for some services while also paying AWS for computation and storage.
  • Burst usage: the cloud allows bursting capacity if and when needed, e.g., for a large training class.
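As a rough yardstick for the stability concern, the 99.8 to 99.9% uptime range can be translated into a downtime budget that any cloud option would have to match. This is a minimal sketch assuming a 365-day year and downtime measured in wall-clock hours; `allowed_downtime_hours` is a hypothetical helper for illustration, not part of any XSEDE or AWS tooling.

```python
# Hypothetical helper: convert an uptime percentage into a downtime budget.
# Assumes a 365-day year (leap years ignored).
HOURS_PER_YEAR = 24 * 365  # 8760 hours


def allowed_downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted over period_hours at the given uptime."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Compare the two ends of the current SysOps uptime range.
    for target in (99.8, 99.9):
        yearly = allowed_downtime_hours(target)
        monthly = allowed_downtime_hours(target, period_hours=HOURS_PER_YEAR / 12)
        print(f"{target}% uptime: {yearly:.2f} h/year, {monthly * 60:.0f} min/month")
```

Under these assumptions, 99.8% uptime allows about 17.5 hours of downtime per year and 99.9% about 8.8 hours, so moving to the cloud should not push unplanned outages past that budget.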


