August 2024 - Present
- Built scalable, cloud-native data pipelines in Python and SQL and managed 100+ pipelines.
- Processed structured and semi-structured data and stored it in Snowflake, MongoDB, and Azure SQL, supporting data workflows across 5+ business units and enabling a 30% improvement in reporting efficiency.
- Designed and built data pipelines for unstructured data such as images, covering preprocessing, storage, and integration into downstream systems.
- Designed ingestion pipelines integrating various APIs and queue systems to support high-volume asynchronous data ingestion with minimal latency.
- Deployed ETL workflows on AWS (Lambda, S3, EC2) and Azure (Data Factory, Functions) using CI/CD pipelines, improving deployment speed and reducing downtime.
- Developed geospatial pipelines by integrating GIS shapefiles and geocoding APIs, improving address accuracy across more than 1,000,000 property records.
- Automated data scraping workflows using Playwright and Python for public data collection, supporting near real-time updates and cloud integration.
- Collaborated in agile teams, continuously improving data validation and quality control processes using source control and automated testing.