Job Summary
Description
A client based in Sandton is hiring! We are looking for a Senior Machine Learning Operations Engineer (ML Ops Engineer) to join the Private Bank Technical Business Intelligence Team. The successful candidate will be responsible for deploying, maintaining, and monitoring machine learning models. The ideal candidate has a background in cloud infrastructure, Kubernetes, and deployment pipelines, and a deep understanding of machine learning.
Responsibilities and skills include:
- Delivering strategic goals and business objectives
- Maintaining platform stability
- Designing and building solutions focused on efficiency
- Strong team dynamics, people skills and relationship/network building
- Ensuring that the strategy and the team work within the principles and practices of MLOps and engineering as defined by group engineering and best practices
- Solid grasp of DevOps/SRE methodologies and practices
- Providing technical guidance and support throughout the release process, including strong troubleshooting abilities across the platform and channel
- Strong design and solutioning experience across multiple technologies, and an understanding of Cloud DevOps services and hosting
- Understanding of Git and CI/CD
- Understanding of cloud-native, hybrid-cloud, and on-prem design principles
- Developing and maintaining deployment pipelines for machine learning models on Microsoft Azure
- Monitoring and optimizing the performance of machine learning models in production
- Collaborating with data scientists for seamless deployment of models
- Ensuring high availability and reliability of the machine learning infrastructure on Microsoft Azure
- Providing technical support for machine learning models in production
- Conducting regular security assessments and ensuring compliance with industry standards and best practices
- Keeping up to date with new Azure ML offerings and technologies to continuously improve our MLOps processes
Requirements:
- Minimum of a BSc in Computer Science, Engineering, or a related field
- At least 5 years of experience in ML Operations or a similar role
- Extensive experience with Microsoft Azure, including Azure Pipelines, Azure Functions, and Azure ML offerings
- Knowledge of containerization technologies (Docker, Kubernetes, Rancher)
- Strong programming skills in Python and SQL, with experience using FastAPI and Redis
- Strong understanding of Software Engineering concepts
- Strong experience writing unit tests
- Knowledge of machine learning frameworks such as TensorFlow and PyTorch
- Experience with monitoring and logging tools (e.g. Grafana, Kibana)
- Excellent problem-solving skills and attention to detail
- Knowledge of design patterns