【Company Highlights】
🌟 Specializes in AI-driven customer service solutions and virtual assistants, using natural language processing and machine learning to automate interactions
🌟 Aims to enhance customer experience and streamline business operations through AI technology
🌟 Fully remote, with a competitive package and benefits
【Responsibilities】
- Design and maintain the On-Premise GPU Cloud’s infrastructure, including server, network, and storage systems and the software stack (e.g., hypervisors and orchestration)
- Manage individual and team performance to ensure effective and efficient operations
- Act as the call leader during outages and manage them according to severity level
- Ensure high availability and performance of the infrastructure
- Provide technical support to customers through a ticketing system
- Coordinate the team to perform regular maintenance on the existing infrastructure, such as OS patching, platform upgrades, and storage management
- Implement best-in-class DevOps practices, including continuous integration, continuous deployment, and infrastructure as code
- Ensure effective management of data center operations, including the ticket queue, 24/7 shift arrangements, and hardware logistics
- Inspire and guide the team to identify and implement process improvements, technology innovations, and automation initiatives
- Ensure all operational KPIs and metrics are measured and met
- Demonstrate passion for the quality and quantity of services provided, and continuously strive to improve customer experience