
Shyam Ahuja

Senior Data Engineer
Azure Architect with 14 years of experience in IT services and consulting, specializing in the design and implementation of data-intensive applications leveraging Azure Cloud and Big Data technologies. Proficient in architecting scalable solutions using Azure services such as Azure Data Lake, Azure Synapse Analytics, and Azure Databricks. Skilled in Spark analytics, AWS, Scala, Kafka, Hive, SQL, and Python, with a strong focus on data orchestration and integration. Proven track record in leading agile scrum teams, managing project lifecycles, and driving successful sprint grooming, planning, and coordination to deliver high-quality solutions that meet business objectives.
Deutsche Post DHL
Jaypee Institute of Information Technology
Berlin, Germany

Skills

Programming - Scala
SQL
Python
Java
Data - Spark
Spark Streaming
Kafka
Databricks
Hive
ETL
MapReduce
Sqoop
Cloud - Azure
AWS
Cloudera Enterprise
DevOps - Azure DevOps Pipelines
Git
Docker
Kubernetes
Jenkins
Tools - IntelliJ
Eclipse
IBM Cognos
Microsoft SQL Server
Oracle
Visual Studio
Orchestration - Airflow
Oozie
Development Methodologies - AGILE
Data Engineering and ETL Tools - SSIS
PySpark
Databricks SQL
Microsoft Fabric

Languages

English
Professional

Work experiences


Senior Data Engineer

Deutsche Post DHL
Full-time

Jul 2022 ~ Present
Berlin, Germany
Engaged in a pivotal role on the E-commerce US Data Lake project, building a centralized data lake platform on Azure. The primary goal is to ingest data from diverse sources, both batch and real-time, and empower business stakeholders across departments to construct data warehouses. Responsibilities on the ECS US Data Lake project include:
- Communicating with business stakeholders across departments to understand requirements and orchestrating design discussions with the team.
- Setting up infrastructure platform components and implementing security measures using automation tools, specifically Terraform.
- Designing reusable Azure Data Factory pipeline templates to streamline the ingestion and consolidation of data from numerous batch and real-time source systems.
- Designing and implementing consolidation strategies, along with business models and reports, in Databricks using PySpark.
- Holding design discussions with platform and DevOps engineers to develop CI/CD processes and pipelines.
- Working on a Microsoft Fabric evaluation and migration impact analysis POC covering Fabric Data Factory, Lakehouse, and OneLake/Direct Lake.

Data Engineering Specialist

Accenture Latvia
Full-time

Jul 2021 ~ Jul 2022
1 yr 1 mo
Riga, Latvia
Engaged with a mining industry client, spearheading efforts to efficiently handle and refine data from diverse source systems using Azure Cloud technologies, including Data Factory pipelines and Azure Databricks.
Project/Product: Centralized Data Provisioning
Key Responsibilities:
- Led, mentored, and provided technical guidance to a team of 5 data engineers.
- Designed a streaming ingestion and curation pattern utilizing Azure Event Hubs.
- Developed an ingestion and curation data pipeline for building a BI analytics-focused data lake.
- Engineered an event-based system for consuming and curating data from Event Grid topics using Azure cloud services.

Senior Data Engineer

S&P Global Market Intelligence
Full-time

Jan 2018 ~ Jul 2021
3 yrs 7 mos
Gurgaon, Haryana, India
Contributed to the modernization of a platform product by leveraging Spark, Scala, and an AWS EC2 cluster for aggregations and transformations, standardizing terabytes of incremental Market Ownership data for millions of companies.
Project/Product: Ownership Incremental Standardization
Key Achievements:
- Implemented a Spark-based parallel ingestion and processing framework, significantly improving performance over legacy systems.
- Led the end-to-end delivery of a complex system involving multiple frameworks for ingestion, initialization, standardization, and aggregation.
- Optimized performance for complex problems, such as hierarchical queries and aggregates, using techniques like caching, repartitioning, and broadcasting.
Project/Product: Modernization of Loaders using Kafka
Key Contributions:
- Implemented a real-time Kafka pipeline handling millions of transactional CDC messages, applying business logic before persisting them to the target database.
- Led the end-to-end pipeline delivery, designing and developing the Kafka consumer in line with best practices and guidelines.
- Optimized performance and addressed message-consumption latency through Kafka tuning techniques and Docker in production.

Big Data Consultant

Deloitte Consulting
Full-time

Dec 2013 ~ Jan 2018
4 yrs 2 mos
Gurgaon, Haryana, India
Project/Product: CHIEE (Clinical Health Information Engagement Enablement) - Anthem (Nov 2016 - Jan 2018)
- Led the development of a data lake storing clinical healthcare data from wearable devices through a multi-layer framework. Responsibilities included designing a real-time data ingestion and processing framework using Kafka and Spark Streaming, building data models for MongoDB, and integrating layers for an end-to-end data lake solution.
Project/Product: D-rive Telematics (Usage-Based Insurance) (Jan 2015 - Oct 2016)
- Contributed to a data analytics telematics solution capturing user data through sensors on a mobile app. Responsibilities involved designing and implementing rules and algorithms using Java MapReduce and Hive scripts, as well as developing Oozie workflows.
Project/Product: Anthem Healthcare Modernization (Jun 2014 - Dec 2014)
- Focused on the ingestion and transformation of healthcare data from Teradata to Hadoop (Hive) in Parquet format using Spark and Scala. Responsibilities included designing and building an ingestion and processing framework.

Systems Engineer

Infosys Limited
Full-time

Jul 2011 ~ Dec 2013
2 yrs 6 mos
Hyderabad, Telangana, India
Project/Product: Emerson Process Management (EPM)
Developed framework models and reports containing crucial measures for analyzing business status, empowering management to make strategic decisions for organizational growth.
Roles/Responsibilities:
- Designed and developed DataMarts using Framework Manager to meet user requirements.
- Implemented security at different levels, including data, object, and package.
- Designed intricate reports and dashboards based on end-user specifications, incorporating features such as drill-through and master-detail functionality.

Systems Engineer Trainee

Infosys Limited
Full-time

Feb 2011 ~ Jun 2011
5 mos
Mysore, Karnataka, India
Completed Microsoft .NET Framework 3.5 training with a CGPA of 4.78/5. Created a Windows Phone 7 application, 'EnRAllocation Management System,' using Silverlight and C#.

Education

Jaypee Institute of Information Technology

Bachelor’s Degree
Bachelor of Technology in Computer Science & Engineering

2007 - 2011
7.2/10 GPA

Licenses & Certifications


Microsoft Certified: Azure Data Engineer Associate

Microsoft

Issued Dec 2022
Expires Dec 2023

AWS Certified Cloud Practitioner

Amazon Web Services (AWS)

Issued Sep 2021
Expires Sep 2024