-
, and parallel computing, with a proven ability to work within highly secure and regulated environments. This role involves close collaboration with security teams, scientists, and IT leadership to ensure
-
systems. Expertise with batch schedulers (SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale). Proven ability to lead technical projects from concept through implementation, balancing
-
batch schedulers (e.g., SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale). Experience implementing and managing automation and configuration management frameworks (Ansible, Puppet
-
for Science @ Scale: Pretraining, instruction tuning, continued pretraining, Mixture-of-Experts; distributed training/inference (FSDP, DeepSpeed, Megatron-LM, tensor/sequence parallelism); scalable evaluation
-
user support. Familiarity with scientific software, Linux systems, and parallel computing frameworks. Special Requirements: Visa sponsorship is not available for this position. This position requires
-
strategic management and strict adherence to security protocols. We are looking for candidates with extensive experience in either classified HPC data center operations, architecture, parallel computing
-
manage large-scale HPC storage systems, including parallel file systems such as Lustre, GPFS/Spectrum Scale, BeeGFS and WEKA. Design, implement, and operate large-scale Ceph storage clusters for HPC and
-
software engineering practices. Experience with GPU computing (e.g., CUDA, HIP), parallel computing (e.g., MPI, Actor Model). Familiarity with containerization (e.g., Docker, Podman, Apptainer), networking
-
-year project with several subcomponents that will be developed in parallel. This role will play a crucial role in collating requirements from the program managers for the project subcomponents and