Job Description and Requirements
Job Description
The HPC cluster systems engineer is responsible for managing and supporting all HPC and Grid systems for the University data center and distributed locations.
* Solves HPC and Grid related problems on a daily basis.
* Provides the CSC with information about the HPC systems in support of change management within the data center.
* Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
* Analyzes solution components, understands systems integration challenges, and identifies technology gaps.
* Resolves or proposes solutions to these gaps to reach future performance targets and functionality requirements.
* Prototypes features, performs integration checkout of various software components, and collaborates with component developers and solutions architects.
* Develops and drives validation test content and evaluates systems components.
* Engages with industry partners as required to identify and investigate best-known methods used in the HPC community, and applies those methods.
* Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
* Responsible for system integration and validation of UAEU HPC clusters.
* Responsible for monitoring all HPC and Grid services.
* Coordinates work with vendors for support.
* Tests and deploys HPC systems.
* Knowledge of IT Service Management frameworks.
* Maintains accurate and comprehensive documentation and diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
* Other duties as assigned.
Minimum Qualification
* Bachelor's degree required in Computer Engineering/Science
* 3-6 years of experience
* HPC Cluster Administration
* Advanced Red Hat Linux Administration
Preferred Qualification
* Knowledge of server hardware components, diagnostics, and replacing defective items.
* Good communication and report writing skills.
* Must be able to work under pressure in a fast-paced work environment.
* Must be able to work flexible hours including evenings, weekends, holidays, and overtime as required, and be available on call 24/7 in case of a major service outage.
* Strong problem solving, testing, and network troubleshooting skills
* Cluster solutions integration and administration
* Linux operating systems and OS components for HPC clusters
* Cluster provisioning, systems management, resource management middleware
* Cluster interconnect fabrics and software stack
* HPC Cluster storage solutions
* Parallel programming models for HPC clusters
*** Apply at the following link ***
https://www.akhtaboot.com/en/uae/jobs/al-ain/119416-Senior-HPC-Cluster-System—Eng-at-United-Arab-Emirates-University