Big Data Engineer (Lead)
Job Description:
· Build pipelines to bring in a wide variety of data from multiple sources within the organization as well as from social media and public data sources.
· Collaborate with cross-functional teams to source data and make it available for downstream consumption.
· Work with the team to provide an effective solution design to meet business needs.
· Communicate regularly with key stakeholders to understand any concerns about how the initiative is being delivered, and any risks or issues that have either not yet been identified or are not being progressed.
· Ensure dependencies and challenges (risks) are escalated and managed. Escalate critical issues to the Sponsor and/or Head of Data Engineering.
· Ensure timelines (milestones, decisions and delivery) are managed and the value of the initiative is achieved, within budget and without compromising quality.
· Ensure an appropriate and coordinated communications plan is in place for initiative execution and delivery, both internal and external.
· Ensure final handover of the initiative to business-as-usual processes, and carry out a post-implementation review (as necessary) to confirm that initiative objectives have been delivered and that any lessons learned are fed into future initiative management processes.
Who we are looking for:
Competencies & Personal Traits
· Work as a team player
· Excellent problem analysis skills
· Proficiency in the Azure Databricks platform
· Experience with at least one cloud infrastructure provider (Azure/AWS)
· Experience in building batch data pipelines with Apache Spark (Spark SQL, DataFrame API) or Hive Query Language (HQL)
· Experience in building streaming data pipelines using Apache Spark Structured Streaming or Apache Flink on Kafka & Delta Lake
· Knowledge of NoSQL databases. Experience with Cosmos DB, RESTful APIs and GraphQL is good to have
· Knowledge of big data ETL processing tools, data modelling and data mapping
· Experience with Hive and Hadoop file formats (Avro / Parquet / ORC)
· Basic knowledge of scripting (shell / bash)
· Experience working with multiple data sources, including relational databases (SQL Server / Oracle / DB2 / Netezza), NoSQL / document databases and flat files
· Basic understanding of CI/CD tools such as Jenkins, Jira, Bitbucket, Artifactory, Bamboo and Azure DevOps
· Basic understanding of DevOps practices using Git version control
· Ability to debug, fine-tune and optimize large-scale data processing jobs
Working Experience
· 12-15 years of broad experience working with enterprise IT applications in cloud platform and big data environments.
Professional Qualifications
· Certifications related to Data and Analytics would be an added advantage
Education
· Master’s or bachelor’s degree in a STEM field (Science, Technology, Engineering, Mathematics)
Language
· Fluency in written and spoken English