Storage & Server Test Engineer (Austin)

Job Description

The Senior Lead Storage and Server Test Engineer will play a pivotal role in the design, development, and execution of comprehensive test strategies for our AI data center's storage and server infrastructure. This leadership position requires deep expertise in enterprise storage systems, server architectures, networking, and a strong understanding of the unique performance and reliability demands of AI/ML workloads. The ideal candidate will be a hands-on technical leader, capable of mentoring junior engineers, driving test automation, and collaborating across engineering teams to deliver robust and high-performing solutions.

Required Qualifications
• Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
• 3+ years of experience in hardware and/or software testing, with a focus on enterprise-level storage and server systems.
• Proven experience in a lead or senior technical role, mentoring and guiding other engineers.
• Deep expertise in various storage technologies including NVMe, SAS/SATA SSDs/HDDs, RAID, distributed file systems (e.g., Ceph, Lustre, GPFS), SAN, and NAS.
• Strong understanding of server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, power management, and Baseboard Management Controller (BMC) functionality.
• Proficiency in scripting languages (e.g., Python, Bash) for test automation and data analysis.
• Experience with Linux operating systems (e.g., Ubuntu, CentOS, RHEL) and command-line tools.
• Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies.
• Experience with test methodologies such as performance testing, reliability testing, stress testing, and fault injection.
• Excellent problem-solving, analytical, and debugging skills.
• Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse teams.

Preferred Qualifications
• Familiarity with the Open Compute Project (OCP).
• Experience with cloud environments (AWS, Azure, GCP) and virtualization technologies.
• Knowledge of containerization technologies (Docker, Kubernetes).
• Familiarity with AI/ML frameworks (e.g., TensorFlow, PyTorch) and their infrastructure requirements.
• Experience with performance benchmarking and profiling tools (e.g., fio, Iometer, perf, VTune).
• Contributions to open-source projects related to storage, servers, or testing.
• Certifications in relevant technologies (e.g., NetApp, Dell EMC, HPE, NVIDIA).

Responsibilities

• Define, develop, and implement comprehensive test plans and strategies for all storage and server hardware, firmware, and software components within the AI data center environment.
• Lead the test team in designing, executing, and analyzing complex test cases, including functional, performance, reliability, stress, and endurance testing.
• Mentor and provide technical guidance to junior test engineers, fostering a culture of technical excellence and continuous improvement.
• Design and implement automated test frameworks and scripts using languages like Python, Go, or similar, to improve efficiency and coverage of testing.
• Conduct in-depth performance analysis and bottleneck identification for storage systems (e.g., NVMe, SSD, HDD arrays, distributed storage, SAN/NAS), server platforms (e.g., CPU, GPU, memory, PCIe, networking), and OpenBMC interfaces and features, including debugging issues related to BMC functionality and its interaction with server hardware.
• Develop and maintain robust testbeds and infrastructure for continuous integration and validation.
• Utilize open-source and commercial test tools relevant to storage, server, and OpenBMC validation.
• Collaborate closely with hardware design, software development, infrastructure, and AI/ML engineering teams to understand requirements and integrate testing throughout the product lifecycle.
• Communicate test progress, results, and critical issues effectively to stakeholders, including executive leadership.
• Develop specialized test methodologies to validate performance and reliability under heavy AI/ML workloads (e.g., large model training, inference at scale, data ingestion).
• Understand and test the interactions between GPU-accelerated computing, high-speed networking, and storage systems.

3 years of experience required

About us

What we do

At Celestica, we enable the world's best brands. We build trusted relationships and solve complex technology challenges to help our customers realize greater value, potential and outcomes. We are a leader in high-reliability design, manufacturing and supply chain solutions that brings global expertise at every stage of product development – from the drawing board to full-scale production and after-market services. With talented teams across North America, Europe and Asia, we imagine, develop and deliver a better future with our customers.

Living Our Values

At Celestica, we foster a motivated, high-integrity work environment based on a strong set of corporate Values. These Values empower our employees to provide you with superior service.

Relentless Curiosity
We are obsessed with uncovering the information and insights that allow us to anticipate and overcome the challenges of the future.

Bold Conviction
We dare to envision new solutions, new technologies, and new ways of working, and we invest to make them a reality.

Unwavering Dedication
We exemplify teamwork and commitment in every decision and every action to be the best partners to our customers and our colleagues.

Whether you’re a recent graduate or an experienced professional, joining us means working with some of the brightest minds and most talented people in the industry. You can leave your personal stamp on projects and impact others like never before.

Join an Award-winning Team

At Celestica, we recognize that our employees play an important role in our company’s success and we strive to create a collaborative environment that fosters innovation, empowers people and leverages individual expertise. By joining Celestica, you'll discover that working for a global company creates endless career opportunities for you.