Senior HPC Systems Engineer

Job Id:	976869
Job Publisher:	External Source
Job Location:	Abu Dhabi, UAE
Job Salary:	As per source; to apply, follow the link at the end of the details
Job Category:	Technology
Job Visits:	1000
Residence:	Unspecified
Job Start Date:	08-02-2021
Job End Date:	10-03-2021

Job Requirements
We are seeking a Senior HPC Systems Engineer to maintain G42's state-of-the-art computational and data science infrastructure.
As a member of our HPC Team, you will lead and participate in the deployment, management, and optimization of systems and processes. You will work with G42's community to identify and provide solutions and technical support that enable our cloud customers to develop and deploy their AI applications at scale.

Responsibilities and Duties:
• Provide tier-3 in-depth technical O&M support and administration of a 24x7x365 always-available production environment
• Configure, install, maintain and upgrade HPC clusters (compute, storage, and network) and applications in support of research computing environments
• Lead and collaborate on projects to maintain and enhance system functionality in areas such as systems monitoring, scheduling and resource management, configuration management, backups, HPC system management utilities/tools, HPC cluster performance and resiliency
• Diagnose, isolate and resolve complex application and system technical problems (hardware, software, network)
• Develop scripts and automation to enhance operational services and service quality
• Perform system tuning based upon proactive performance analysis
• Build, install, and support scientific software (Commercial and Open Source)
• Develop and maintain technical documentation for customer use and contribute to the internal knowledge base.

Work Experience
• Solid experience in configuring, managing, and optimizing large Linux clusters and servers
• Expert-level experience with workload management tools (e.g. PBS, SLURM, Moab, TORQUE)
• Experience configuring, managing, and optimizing distributed and parallel file systems such as Lustre, GPFS, NFS, and Ceph, and storage protocols such as FC, iSCSI, NFS, and CIFS
• Knowledge of networks, routers, switches, and firewalls, and familiarity with high-performance networks such as InfiniBand
• Strong scripting/programming capabilities (e.g. Python, Bash, Perl)
• Experience managing virtualization platforms (VMware, KVM, oVirt)
• Extensive knowledge of Red Hat or Debian-based distributions and strong experience with maintaining, upgrading, and tuning the Linux kernel
• Experience with system configuration management tools such as Puppet, Ansible, Chef, Cobbler
• Experience with monitoring/alerting tools (e.g. Ganglia, Nagios, Zabbix, Grafana)
• Strong experience with package compilation and build tools (e.g. Spack, Conda, EasyBuild)
• Strong experience using containerized workflows based on Docker, Singularity, and Kubernetes
• Solid experience configuring, installing, and troubleshooting MPI
• Demonstrated ability to research, quickly identify and correct problems (debug) using system utilities and diagnostics
• Demonstrated ability to perform complex performance analysis including system processes, I/O subsystems, networks and other related components.

Desired skills
• Experience with performance benchmarking using profilers and debuggers to recommend code improvements for scalability and performance
• Experience with NVIDIA DGX servers and NVIDIA tools
• Experience with Linux kernel development and the Linux development community
• Experience with on-prem cloud technologies such as OpenStack
• Working knowledge of one or more programming languages such as C, C++.

https://www.naukrigulf.com/senior-hpc-systems-engineer-jobs-in-abu-dhabi-uae-in-group-42-2-to-5-years-n-cd-10008188-jid-050221500124
