Netherlands
1 day ago
OCI GPU Specialist

Oracle is seeking an OCI GPU Black Belt to drive customer success in designing, deploying, and optimizing large-scale AI and HPC workloads on Oracle Cloud Infrastructure (OCI). This role combines deep technical expertise in NVIDIA GPUs, distributed training and inference frameworks, benchmarking and performance tuning, RLHF pipelines, and end-to-end solution delivery.

The OCI GPU Black Belt will work in close collaboration with our sales, marketing, and technical teams to drive revenue growth and accelerate market penetration for our NVIDIA GPU compute services.

The OCI GPU Black Belt will also play a critical role in progressing business opportunities, delivering technical workshops and demonstrations, and supporting proofs-of-concept to drive cloud consumption and overall revenue growth.

What You’ll Do

• You will engage directly with the customer to understand their requirements for NVIDIA GPUs on Oracle Cloud to run their AI infrastructure, graphics, and HPC workloads. With your understanding of the customer's requirements, you will build a comprehensive and effective solution design.
• You will lead the solution design within a highly collaborative virtual team that includes engineering/product management, capacity planning, and external partners. With the team, you will map the requirements onto Oracle Cloud services, define a solution design, and, if requested by the customer, offer hands-on support during the proof-of-concept phase to deploy and test the proposed solution. After the PoC, you will advise and guide the customer/partner on running and maintaining their NVIDIA GPU workload on Oracle Cloud according to our best practices.
• Deliver technical workshops, proofs-of-concept (PoCs), and demos, collaborating closely with sales, engineering, and customer teams to validate end-to-end solutions and accelerate cloud adoption.
• Optimize end-to-end AI workloads by analyzing hardware bottlenecks (GPU utilization, memory bandwidth, network latency), applying NVLink/InfiniBand interconnects and RDMA storage solutions, and tuning parallel libraries (MPI, CUDA) for peak efficiency.
• Deploy and scale HPC clusters for engineering, scientific, and financial simulations, configuring compute nodes, high-speed networking, and shared file systems to meet performance SLAs.
• Lead the architecture and deployment of scalable inference platforms, leveraging containerized microservices on Kubernetes and OCI GPU instances to meet low-latency, high-throughput requirements.
• Design and implement distributed training pipelines using frameworks such as DeepSpeed and Fully Sharded Data Parallel (FSDP) to accelerate model convergence at scale.
• Develop benchmarking and profiling solutions to measure training and inference performance under mixed-precision, model-parallel, and data-parallel strategies, and generate actionable insights through dashboards and automated reports.
• Guide customers in model selection and evaluation, comparing architectures (e.g., Transformers, CNNs) against workload requirements and resource constraints to optimize cost and performance.
• Contribute to Oracle’s internal expert community, documenting best practices, co-authoring solution blueprints, and mentoring peers on AI infrastructure design.
• Stay current with emerging AI infrastructure technologies, present at industry events, and represent Oracle as a technology evangelist at conferences and in customer forums.

Required Skills & Experience

• Excellent communication and presentation skills, with a high degree of comfort speaking across all levels of management (e.g., IT management, architects, administrators, and executives) and a proven ability to engage technical and executive audiences, lead virtual teams, and influence senior stakeholders.
• 5+ years of hands-on experience in AI/ML infrastructure or HPC, architecting and operating large-scale GPU-accelerated environments for training and inference.
• Deep proficiency with NVIDIA GPU technologies (CUDA, cuDNN), GPU interconnects and RDMA networking (NVLink, InfiniBand), and cluster orchestration tools.
• Expertise in distributed training and inference frameworks: PyTorch, TensorFlow, DeepSpeed, FSDP, and model-parallel toolkits.
• Strong background in performance optimization techniques: mixed-precision training, gradient compression, asynchronous updates, and communication overlap to maximize throughput.
• Familiarity with cloud-native practices: Docker, Kubernetes, Terraform, monitoring stacks (Prometheus, Grafana), and CI/CD for infrastructure.
• Solid understanding of cloud architecture principles (networking, security, resilience, and cost optimization) on OCI or comparable public clouds.
• Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field.

This role offers the opportunity to shape Oracle’s AI/ML portfolio, drive revenue growth through technical leadership, and collaborate with customers to unlock the full potential of GPU-accelerated AI and HPC solutions.

 

At Oracle, we don’t just respect differences; we celebrate them. We believe that innovation starts with inclusion, and that to create the future we need people with diverse backgrounds, perspectives, and abilities. That’s why we’re committed to creating a workplace where all kinds of people can do their best work. It’s when everyone’s voice is heard and valued that we’re inspired to go beyond what’s been done before.

We expressly encourage disabled candidates to apply for this position. Please feel free to voluntarily inform us in your application about any severe disability (degree of disability of at least 50%) or any equal status (degree of disability of at least 30% together with an official decision on equality) in accordance with the German SGB IX.

 
