- for Science @ Scale: Pretraining, instruction tuning, continued pretraining, Mixture-of-Experts; distributed training/inference (e.g. FSDP, DeepSpeed, Megatron-LM, tensor/sequence parallelism); scalable evaluation pipelines for reasoning and agents (see the FSDP and Mixture-of-Experts sketches after this list). Federated & Collaborative
- Demonstrated experience developing and running computational tools for high-performance computing environments, including distributed parallelism for GPUs (see the collective-communication sketch after this list). Demonstrated experience in common scientific programming
- of relevant experience in Linux systems administration or HPC systems engineering. Preferred Qualifications: Demonstrated experience leading the design and deployment of HPC or large-scale distributed computing
- distributed systems techniques. Proficiency in programming languages such as Python, C++, or similar, as well as experience with HPC environments and parallel computing. Demonstrated hands-on experience and
- distributed intelligence across the computing continuum. In this role, you will have the opportunity to lead and contribute to cutting-edge research aimed at transforming scientific data management and
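A minimal sketch of what the FSDP requirement looks like in practice, using PyTorch's `FullyShardedDataParallel` and a `torchrun` launch; the toy encoder layer, sizes, filename, and dummy loss are illustrative placeholders, not any posting's actual workload:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR/PORT for us.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Hypothetical toy model; real pretraining would wrap a full transformer.
    model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).cuda()
    model = FSDP(model)  # parameters (and hence grads/optimizer state) are sharded

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 16, 512, device="cuda")  # (batch, seq, d_model)
    loss = model(x).pow(2).mean()  # dummy loss, stands in for an LM objective
    loss.backward()                # FSDP reduce-scatters gradients here
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched as e.g. `torchrun --nproc_per_node=4 fsdp_demo.py`; each rank holds only a 1/world_size shard of the parameters outside its own forward/backward work.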
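For the "distributed parallelism for GPUs" requirement, a minimal sketch of the collective-communication primitive (an NCCL `all_reduce` via `torch.distributed`) that data-parallel training on HPC clusters builds on; the tensor contents and launch command are assumptions for illustration:

```python
import os

import torch
import torch.distributed as dist

# Assumes launch with `torchrun --nproc_per_node=<ngpus> allreduce_demo.py`.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank starts with a different tensor; all_reduce sums them in place,
# so every rank ends up with the same global result (here: sum of all ranks).
t = torch.full((4,), float(dist.get_rank()), device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()}: {t.tolist()}")

dist.destroy_process_group()
```

The same call is what gradient averaging in data-parallel training reduces to, with gradients in place of the toy tensor.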
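And a minimal single-device sketch of the token-choice, top-k gated routing that Mixture-of-Experts layers commonly use; the expert count, hidden sizes, and plain Python loop over experts are simplifying assumptions (production MoE shards experts across devices and adds load-balancing losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # learned gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                     # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # each token picks its k experts
        weights = F.softmax(weights, dim=-1)        # renormalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed here
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out


x = torch.randn(10, 512)
print(MoELayer()(x).shape)  # torch.Size([10, 512])
```

Only k of the n_experts feed-forward blocks run per token, which is what lets MoE models grow parameter count much faster than per-token compute.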