Responsibilities
- Analyze, organize, and process raw data using Big Data technologies and Scala
- Perform data validation, cleansing, and transformation using Big Data technologies, Spark, and Scala
- Ingest and manage data in HDFS/Hive, analyze it using Spark, and produce meaningful insights for the business team
- Combine raw information from different sources and explore ways to enhance data quality and reliability
- Develop code following software craftsmanship best practices such as continuous delivery and clean-code principles
- Help the business operations team with data samples to create proofs of concept
- Ensure that new data sets feed the Business Intelligence setup, i.e., Power BI dashboards
- Liaise with BAs to ensure subjects are prioritized, and communicate progress and potential issues to the PO
Profile
- 2 to 3 years of hands-on experience with Spark, Scala, and Big Data technologies
- Working proficiency in building data pipelines using Spark, PySpark, and Spark SQL, and with development tools: Git, GitLab
- Good working experience with Scala and object-oriented concepts
- Good working experience with HDFS, Spark, Hive, and Oozie
- Strong knowledge of design patterns (event-driven architectures, Kappa architecture, etc.)
- Palantir Foundry data engineer certification is a plus
- Knowledge of data visualization techniques is a plus
- Technical expertise with data models, data mining, and partitioning techniques
- Hands-on experience with SQL databases
- Good understanding of CI/CD tools (Maven, Git, Jenkins) and SonarQube
- Knowledge of Kafka and the ELK stack is good to have; knowledge of a data visualization tool such as Power BI is an added advantage
- Strong communication and coordination skills with multiple stakeholders; able to assess the existing situation, propose improvements, and follow up on actions
- Good team spirit and the ability to work in international/intercultural environments (frequent interaction with onsite stakeholders)
- Professional attitude: self-motivated, fast learner, team player, able to work independently