In this role, you will:
- Design, develop, and optimize data pipelines using Azure Databricks, PySpark, and Prophecy.
- Implement and maintain ETL/ELT pipelines using Azure Data Factory (ADF) and Apache Airflow for orchestration.
- Develop and optimize complex SQL queries and Python-based data transformation logic.
- Work with version control systems (GitHub, Azure DevOps) to manage code and deployment processes.
- Automate deployment of data pipelines using CI/CD practices in Azure DevOps.
- Ensure data quality, security, and compliance in line with best practices.
- Monitor and troubleshoot performance issues in data pipelines.
- Collaborate with cross-functional teams to define data requirements and strategies.
Requirements
To be successful in this role, you should meet the following requirements:
- 5+ years of experience in data engineering, working with Azure Databricks, PySpark, and SQL.
- Hands-on experience with Prophecy for data pipeline development.
- Proficiency in Python for data processing and transformation.
- Experience with Apache Airflow for workflow orchestration.
- Strong expertise in Azure Data Factory (ADF) for building and managing ETL processes.
- Familiarity with GitHub and Azure DevOps for version control and CI/CD automation.
- Solid understanding of data modelling, warehousing, and performance optimization.
- Ability to work in an agile environment and manage multiple priorities effectively.
- Excellent problem-solving skills and attention to detail.
- Experience with Delta Lake and Lakehouse architecture.
- Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform.
- Understanding of machine learning workflows in a data engineering context.