Software Engineer Intern – ML Infrastructure
Waymo | Google
Worked on ML Infrastructure at Waymo, focusing on model debugging tools and training pipeline optimization. Contributed to Google's internal libraries and built automation tools for converting models between training infrastructures.
- Developed a model surgery toolkit for Orbax checkpoints, automating tensor debugging (mismatches, shape errors, module renames), preventing silent restoration failures, and reducing debugging time from days to hours.
- Extended the toolkit to automate model conversion and migration between the Waymo and Google DeepMind (Gemini) training infrastructures for Waymo Foundational Models.
- Profiled and benchmarked Waymo's training pipelines, identifying execution patterns and bottlenecks.
- Contributed to the Google-wide codebase by resolving issues in the internal pyvis library and integrating pyecharts to enhance visualization capabilities across Google teams.