-
Language Model (LLM) GPU cluster to ensure stable and reliable operation of training tasks; (b) handle GPU node failures, IB network anomalies, CUDA/NCCL errors and Kubernetes scheduling failures, perform
-
skills, with the ability to translate complex AI concepts into accessible solutions for diverse stakeholders; (f) possess knowledge of Graphics Processing Units (GPUs) and Neural Processing Units (NPUs
-
Language Model (LLM) training platform, developing unified capabilities for GPU resource pooling, training job scheduling, inference acceleration and the Machine Learning Operations (MLOps) platform