UCSD Layoff from Career Appointment: Apply by 05/18/21 for consideration with preference for rehire. All layoff applicants should contact their Employment Advisor.
Special Selection Applicants: Apply by 05/27/21. Eligible Special Selection clients should contact their Disability Counselor for assistance.
UC San Diego Policy will not allow this position to receive work visa sponsorship.
The San Diego Supercomputer Center (SDSC) is a world leader in using, innovating and providing cyberinfrastructure to enable advances and new discovery in science and engineering. Focusing on data-oriented and computational science and engineering applications, SDSC serves as an international resource for data cyberinfrastructure through the provision of software, hardware and human resources in multidisciplinary science and engineering, and is a leading national cyberinfrastructure center to the National Science Foundation (NSF) and broader community.
SDSC’s High-Performance Systems Group is responsible for and operates SDSC’s high-performance computing clusters and related systems. The group operates large-scale compute and storage systems funded by the National Science Foundation (currently the XSEDE program), the UC San Diego campus (e.g., the Triton Shared Computing Cluster), and other entities; these systems support users from campus, national, and international communities across a broad range of scientific disciplines. The group is part of SDSC’s Data-Enabled Scientific Computing (DESC) Division.
The HPC Systems Integration Engineer 4 will apply advanced systems and software integration concepts, along with location or institutional objectives, to resolve highly complex issues and implement medium to large projects of broad scope and complexity, where analysis of systems and software requires an in-depth evaluation of variable factors. The incumbent will regularly resolve highly complex issues involving business processes, system functionality, implementation, and system and software integration, where analysis of situations or data requires an in-depth evaluation of variable factors. The incumbent will select tools, methods, techniques, and evaluation criteria to obtain results; give technical presentations to the associated team, other technical units, and management; evaluate new technologies, including performing moderate to complex cost/benefit analyses; and may lead a team of systems/infrastructure professionals.
This position has primary responsibility for the Triton Shared Computing Cluster (TSCC), a UC San Diego computing resource operated on behalf of the UC San Diego research community. TSCC comprises approximately 300 general computing and GPU nodes and is designed to grow as more researchers participate. In this capacity, the incumbent interacts with condo owners to provide access to the computational resources and software required for their research, develops policies to ensure reliable and efficient cluster operation, and works with user support staff to provide effective support. This includes balancing competing and sometimes incompatible needs from different laboratories to ensure equitable treatment of different projects.
The incumbent also provides project support, system administration support, and on-call coverage for other resources at SDSC, including but not limited to: Expanse, an NSF-funded supercomputer operated on behalf of the national research community; Popeye, an HPC resource managed for the Simons Foundation; and Voyager, an AI supercomputer that will be deployed in 2021.
The incumbent works extensively with members of the SDSC HPC systems group to coordinate operations between TSCC and SDSC's other HPC systems and storage. The position involves researching existing cluster operations, monitoring, and reporting tools, as well as designing, implementing, and documenting new ones. In this role, the incumbent may lead the design and implementation of new high-performance cluster resources, determining the best architecture solutions using state-of-the-art computational, storage, and network technologies. The incumbent also oversees multiple vendor proposals, evaluating the relative strengths and weaknesses of each to determine the best solution for continued cluster operations and expansion. TSCC and other cluster operations rely on various cluster management systems, including Rocks and Bright. The position requires detailed knowledge of these or similar cluster management tools to maintain the configuration state of all managed systems. Knowledge of version control systems such as Git is required to track system changes.
For more information, please visit www.sdsc.edu.
Bachelor's degree in Computer Science or a related area, and/or equivalent experience/training.
Advanced knowledge of HPC and cyberinfrastructure.
Highly advanced skills and demonstrated experience associated with one or more of the following: HPC hardware power and performance analysis; software performance analysis; research, design, modification, implementation, and deployment of HPC/data science applications and tools of large-scale scope.
Experience researching and evaluating new technology and solutions for complex environments. Proven record of integrating cutting-edge hardware and software resources into complex system solutions.
Advanced knowledge of the HPC middleware stack, including cluster management tools, job schedulers, and resource managers. Examples include Slurm, PBS, Maui, Rocks, and Bright Cluster Manager.
In-depth experience in cluster management tasks including deployment, configuration, and troubleshooting of compute nodes, management nodes, network switches, and file servers.
Must pass a background check.