Key responsibilities:
- Understanding business use cases and converting them into technical designs
- Working as part of a cross-disciplinary team, collaborating closely with other data engineers, software engineers, data scientists, data managers, and business partners
- Designing scalable, testable, and maintainable data pipelines
- Identifying areas for data governance improvement and helping to resolve data quality problems through the appropriate choice of error detection and correction, process control and improvement, or process design changes
- Developing metrics to measure the effectiveness and drive the adoption of data governance policies and standards applied to mitigate identified risks across the data lifecycle (e.g., capture/production, aggregation/processing, reporting/consumption)
- Continuously monitoring, troubleshooting, and improving data pipelines and workflows to ensure optimal performance and cost-effectiveness
- Reviewing architecture and designs against aspects such as scalability, security, design patterns, user experience, and non-functional requirements, and ensuring that all relevant best practices are followed
Key skills required:
- 2-4 years of experience in data engineering roles
- Advanced SQL skills with a focus on optimisation techniques
- Big data and Hadoop experience, with a focus on Spark, Hive (or other query engines), and big data storage formats such as Parquet, ORC, and Avro
- Cloud experience (GCP preferred) with solutions designed and implemented at production scale
- Strong understanding of key GCP services, especially those related to batch and real-time data processing: BigQuery, Cloud Scheduler, Airflow, Cloud Logging, and Cloud Monitoring
- Hands-on experience with Git, advanced automation, and shell scripting
- Experience designing, developing, and implementing data pipelines for data warehousing applications
- Hands-on experience in performance tuning and debugging ETL jobs