Accountabilities
- Lead the design, development, and maintenance of reliable, scalable data pipelines and ETL processes using tools such as SnapLogic, Snowflake, dbt, Fivetran, Informatica, and Python.
- Work closely with data scientists to understand model requirements and build the data pipelines needed to train and deploy machine learning models.
- Collaborate with data scientists, analysts, and business teams to understand and optimize data requirements and workflows.
- Use tools such as Power BI, Spotfire, Domo, and Qlik Sense to create actionable data visualizations and reports that drive business decisions.
- Implement best practices for version control and automation using GitHub Actions, Liquibase, Flyway, and CI/CD tools.
- Optimize data storage, processing, and integration using AWS data engineering tools (e.g., AWS Glue, Amazon Redshift, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon EMR).
- Troubleshoot, debug, and resolve issues related to existing data pipelines and architectures.
- Ensure data security, privacy, and compliance with industry regulations and organizational policies.
- Provide mentorship to junior engineers, offering guidance on best practices and supporting technical growth within the team.
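To give a flavor of the pipeline work described above, here is a minimal, hypothetical Python ETL transform step (the record shape, field names, and `transform` function are illustrative only, not from any real codebase mentioned in this posting):

```python
from datetime import date

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": "A-1", "amount": "19.99", "placed_on": "2024-03-01"},
    {"order_id": "A-2", "amount": "5.00", "placed_on": "2024-03-02"},
    {"order_id": "A-2", "amount": "5.00", "placed_on": "2024-03-02"},  # duplicate
]

def transform(rows):
    """Deduplicate on order_id and cast string fields to proper types."""
    seen, out = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue  # skip duplicate records
        seen.add(row["order_id"])
        out.append({
            "order_id": row["order_id"],
            "amount": float(row["amount"]),
            "placed_on": date.fromisoformat(row["placed_on"]),
        })
    return out

clean = transform(RAW_ORDERS)
print(len(clean))  # 2 unique orders remain
```

In a production pipeline this kind of logic would typically live inside a SnapLogic pipeline, a dbt model, or an AWS Glue job rather than a standalone script.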
Essential Skills/Experience
- SnapLogic:
  - Expertise in SnapLogic for building, managing, and optimizing both batch and real-time data pipelines.
  - Proficiency in using SnapLogic Designer for designing, testing, and deploying data workflows.
  - In-depth experience with SnapLogic Snaps (e.g., REST, SOAP, SQL, AWS S3) and Ultra Pipelines for real-time data streaming and API management.
- AWS:
  - Strong experience with AWS data engineering tools, including AWS Glue, Amazon Redshift, Amazon S3, AWS Lambda, Amazon Kinesis, AWS DMS, and Amazon EMR.
  - Expertise in cloud data architectures, data migration strategies, and real-time data processing on AWS.
- Snowflake:
  - Extensive experience with Snowflake cloud data warehousing, including data modeling, query optimization, and managing ETL pipelines using dbt and Snowflake-native tools.
- Fivetran:
  - Proficiency in Fivetran for automating data integration from various sources into cloud data warehouses, including optimizing connectors for data replication and transformation.
- Real-Time Messaging and Stream Processing:
  - Experience with real-time data processing frameworks (e.g., Apache Kafka, Amazon Kinesis, RabbitMQ, Apache Pulsar).
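As an illustration of the stream-processing concepts listed above, the following is a minimal sketch of tumbling-window aggregation over an in-memory event stream. It is pure Python with no broker involved; the event tuples and function name are hypothetical stand-ins for what a Kafka or Kinesis consumer would yield:

```python
from collections import defaultdict

# Hypothetical (timestamp_seconds, value) events, as a streaming
# consumer might deliver them.
EVENTS = [(1, 10), (2, 5), (61, 7), (62, 3), (121, 1)]

def tumbling_window_sums(events, window_seconds=60):
    """Sum event values per fixed, non-overlapping time window."""
    sums = defaultdict(int)
    for ts, value in events:
        # Align each event to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        sums[window_start] += value
    return dict(sums)

print(tumbling_window_sums(EVENTS))  # {0: 15, 60: 10, 120: 1}
```

Frameworks such as Kafka Streams or Kinesis Data Analytics provide this windowing as a built-in primitive, along with the fault tolerance and state management a real deployment needs.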
Desirable Skills/Experience
- Exposure to other cloud platforms such as Azure or Google Cloud Platform (GCP).
- Familiarity with data governance, data warehousing, and data lake architectures.