Google welcomes people with disabilities.
Minimum qualifications:
Bachelor's degree or equivalent practical experience.
8 years of experience in software development.
7 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
5 years of experience with design and architecture, and with testing/launching software products.
Preferred qualifications:
Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
Experience in developing software that interacts with hardware (e.g., firmware, embedded systems, system software).
Experience in production monitoring, logging, and observability tools.
Familiarity with networking protocols and technologies.
Familiarity with machine learning concepts.
About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.
With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.
TPUs are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads, and they are critical to Google’s success in the ultra-engaged AI space today. In this role, you will drive end-to-end TPU software development and innovation across the full stack, from hardware-software interactions to large-scale distributed systems and networking protocols. You will contribute directly to technology that enables Google’s AI goal, and work with many cross-functional teams (e.g., hardware, system, data center deployment) to enable AI applications for Google and Cloud customers.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide. We're the driving team behind Google's groundbreaking innovations, empowering the development of our AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
Responsibilities
Drive the technical roadmap across a diverse hardware, data center, and cloud infrastructure portfolio while leading next-generation TPU product introductions.
Set and communicate team priorities, support the organization's goals, and develop the mid-term technical goal and roadmap. Align strategy, processes, and decision-making across teams.
Develop, test, and help deploy and debug the lower-level software for TPU systems, including firmware, driver, user space libraries, Linux kernel, power, thermal, and test development.
Design and implement superpod software to control and manage TPU AI hypercomputers containing thousands of TPU machines, constructing and connecting TPU slices with the shapes requested by users.
Build and evolve the TPU hypercomputer health ecosystem, integrating hardware and networking quality assurance, repair, and monitoring. Partner with cross-functional infrastructure, engineering, and external teams to plan and execute end-to-end programs, from product development to productivity gains.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.