Responsibilities:
- Experience in developing, maintaining, and supporting big data ETL pipelines (Hadoop, Hive, Spark)
- Responsible for the implementation and testing of scalable distributed data pipelines, ensuring the security, timeliness, and quality of data.
- Experience supporting production systems
- Work closely with other engineers and peers to design and develop ETL pipelines
- Work with product managers to understand and clarify requirements
- Demonstrate an inclination towards continuous learning and professional development.
- Proven track record of quickly acquiring new skills and knowledge in a fast-paced environment.
This is a hybrid position. Expectation of days in office will be confirmed by your Hiring Manager.
Qualifications
Basic Qualifications
• 2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience
Preferred Qualifications
• 3 or more years of work experience with a Bachelor's degree or more than 2 years of work experience with an Advanced Degree (e.g., Master's, MBA, JD, MD)
• Bachelor's degree in Computer Science, Information Systems, or a related field from a top institute
• 5+ years of experience in data technologies and application development
• 4+ years of experience with Hadoop using core Java, Scala, Spark, Kafka, and Hive in a Linux/Unix environment
• Working knowledge of the Hadoop ecosystem and associated technologies (e.g., Apache Spark)
• Experience writing Spark and Hive code to process large data sets in Hadoop environments (a brief illustrative sketch appears after this list)
• Strong experience with SQL for extracting, aggregating, and processing big data using Hadoop
• Development experience in one or more of the following: Scala, Python, or Java
• Basic experience with Unix/Shell or Python scripting
• Exposure to scheduling tools such as Airflow and Control-M is a plus
• Experience using version control systems such as Git
• Basic understanding of RDBMSs (e.g., MS SQL, DB2, Oracle) for data retrieval
• Understanding of Kafka and Apache Hudi is a plus
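To illustrate the kind of Spark-on-Hive work referenced above, here is a minimal sketch in Scala (one of the languages listed in this posting). It is not part of the role description; the table and column names (txn_raw, txn_daily_summary, merchant_id, amount, txn_date) are hypothetical placeholders chosen only for the example.

    // Minimal illustrative Spark job: read a Hive table, aggregate, write a summary table back to Hive.
    // All table/column names below are hypothetical placeholders.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DailyTransactionSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-transaction-summary")
          .enableHiveSupport()          // read from and write to the Hive metastore
          .getOrCreate()

        // Read a (hypothetical) raw transactions table registered in Hive
        val txns = spark.table("txn_raw")

        // Aggregate with Spark SQL functions: daily totals per merchant
        val summary = txns
          .groupBy(col("merchant_id"), col("txn_date"))
          .agg(count("*").as("txn_count"), sum(col("amount")).as("total_amount"))

        // Persist the result back to Hive, partitioned by date
        summary.write
          .mode("overwrite")
          .partitionBy("txn_date")
          .saveAsTable("txn_daily_summary")

        spark.stop()
      }
    }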