-
Lustre parallel file system. NCCS serves multiple agencies including DOE, NOAA, and the Air Force. The NCCS also supports the center’s Quantum Computing User Program (QCUP) which provides access to state
-
and clustered computing services to researchers who process large data sets and/or develop code as a part of their project. Ensure the availability, performance, scalability, and security of production
-
within a multi-disciplinary research environment consisting of computational scientists, applied mathematicians, and computer scientists to link models and algorithms with high-performance computing
-
, and parallel computing, with a proven ability to work within highly secure and regulated environments. This role involves close collaboration with security teams, scientists, and IT leadership to ensure
-
, Integrity, Teamwork, Safety, and Service. Promote equal opportunity by fostering a respectful workplace – in how we treat one another, work together, and measure success. Basic Qualifications: A PhD in
-
frameworks to maintain secure and compliant environments. Document system architectures, processes, and best practices, and contribute to internal knowledge sharing. Participate in on-call rotations and off
-
strategic management and strict adherence to security protocols. We are looking for candidates with extensive experience in either classified HPC data center operations, architecture, parallel computing
-
, finite volume, and machine learning to solve challenging real-world problems related to structural materials and advanced manufacturing processes. The successful candidate will have experience with
-
for Science @ Scale: Pretraining, instruction tuning, continued pretraining, Mixture-of-Experts; distributed training/inference (FSDP, DeepSpeed, Megatron-LM, tensor/sequence parallelism); scalable evaluation