What you’ll be doing…
- Lead the design and implementation of complex, scalable data architectures.
- Mentor junior engineers and lead code reviews; champion best practices and documentation.
- Ensure high data quality, accuracy, and integrity across all systems.
- Work with structured and unstructured data from multiple sources.
- Optimize data workflows for performance, reliability, and cost efficiency.
- Collaborate with analysts and data scientists to meet their data needs.
- Monitor, troubleshoot, and improve existing data systems and jobs.
- Apply best practices in data governance, security, and compliance.
- Use tools like Spark, Kafka, Airflow, SQL, Python, and cloud platforms.
- Stay ahead with evolving technologies and guide strategic data initiatives.
What we’re looking for…
You will need to have:
- Bachelor's degree or four or more years of work experience.
- Expertise in AWS Data Stack – Strong hands-on experience with S3, Glue, EMR, Lambda, Kinesis, Redshift, Athena, and IAM security best practices.
- Big Data & Distributed Computing – Deep understanding of Apache Spark (batch and streaming) for large-scale data processing and analytics; see the sketch after this list for a representative job.
- Real-Time & Batch Data Processing – Proven experience designing, implementing, and optimizing event-driven and streaming data pipelines using Kafka and Kinesis.
- ETL/ELT & Data Modeling – Strong experience in architecting and optimizing scalable ETL/ELT pipelines for structured and unstructured data.
- Programming Skills – Proficiency in Scala and Java for data processing and automation.
- Database & SQL Optimization – Strong understanding of SQL and experience with relational databases (PostgreSQL, MySQL). Expertise in SQL query tuning, data warehousing, and columnar formats such as Parquet, Avro, and ORC.
- Infrastructure as Code (IaC) & DevOps – Experience with CloudFormation, CDK, and CI/CD pipelines for automated deployments in AWS.
- Monitoring, Logging & Observability – Familiarity with AWS CloudWatch, Prometheus, or similar monitoring tools.
- API Integration – Ability to fetch and process data from external APIs and databases.
- Architecture & Scalability Mindset – Ability to design and optimize data architectures for high-volume, high-velocity, and high-variety datasets.
- Performance Optimization – Experience in optimizing data pipelines for cost and performance.
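
To give a concrete flavor of the streaming work described above, here is a minimal sketch in Scala using Spark Structured Streaming: it reads JSON events from a Kafka topic and lands them on S3 as partitioned Parquet. The broker address, topic name, schema, and bucket paths are hypothetical placeholders, not a reference to any actual system here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object EventIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-ingest")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical event schema; a real pipeline would derive this
    // from a schema registry rather than hard-coding it.
    val schema = new StructType()
      .add("eventId", StringType)
      .add("userId", StringType)
      .add("eventTime", TimestampType)
      .add("payload", StringType)

    // Read a stream of JSON events from Kafka (broker and topic are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .select(from_json($"value".cast("string"), schema).as("e"))
      .select("e.*")

    // Partition by event date and write to S3 as Parquet,
    // checkpointing so the job can recover its progress after a failure.
    events
      .withColumn("dt", to_date($"eventTime"))
      .writeStream
      .format("parquet")
      .option("path", "s3://example-bucket/events/")
      .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
      .partitionBy("dt")
      .start()
      .awaitTermination()
  }
}
```

The checkpoint location is what lets the file sink recover its progress after a failure, which speaks directly to the reliability and recovery expectations listed above; an equivalent Kinesis-sourced variant and monitoring hooks would round out the picture.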
Even better if you have one or more of the following:
- Cross-Team Collaboration – Experience working closely with Data Scientists, Analysts, DevOps, and Business Teams to deliver end-to-end data solutions.
- Agile & CI/CD Practices – Comfortable working in Agile/Scrum environments, driving continuous integration and continuous deployment.