-
, and parallel computing, with a proven ability to work within highly secure and regulated environments. This role involves close collaboration with security teams, scientists, and IT leadership to ensure
-
frameworks to maintain secure and compliant environments. Document system architectures, processes, and best practices, and contribute to internal knowledge sharing. Participate in on-call rotations and off
-
for Science @ Scale: Pretraining, instruction tuning, continued pretraining, Mixture-of-Experts; distributed training/inference (FSDP, DeepSpeed, Megatron-LM, tensor/sequence parallelism); scalable evaluation
-
workplace – in how we treat one another, work together, and measure success. Basic Qualifications: A BS degree in computer science, computer engineering, information technology, information systems, science
-
strategic management and strict adherence to security protocols. We are looking for candidates with extensive experience in either classified HPC data center operations, architecture, parallel computing
-
manage large-scale HPC storage systems, including parallel file systems such as Lustre, GPFS/Spectrum Scale, BeeGFS and WEKA. Design, implement, and operate large-scale Ceph storage clusters for HPC and
-
challenges facing the nation. We are seeking a Signal Processing Engineer who will develop and deploy advanced sensing and communications systems for critical Energy and National Security applications. This
-
and clustered computing services to researchers who process large data sets and/or develop code as a part of their project. Ensure the availability, performance, scalability, and security of production