Responsibilities:
● Design, develop, test, and maintain highly scalable and performant data infrastructure (Analytics Architecture).
● Build and optimize complex, high-performance integrated ETL/ELT pipelines for a unified customer view (Integrated Customer View).
● Establish and enforce standardized metadata management, access protocols, and data discovery mechanisms (Information Layer).
● Implement robust data quality checks and monitoring systems across the data lifecycle (Instrumentation Layer).
● Evaluate, prototype, and integrate cutting-edge big data technologies and software engineering tools into existing and new infrastructures.
● Proactively identify and research opportunities for novel data acquisition and innovative uses of existing data.
● Lead collaboration efforts with Data and IT teams to enhance the data platform's capabilities and reliability.
● Develop and implement comprehensive data collection, storage, and retrieval monitoring processes.
● Drive collaboration with Engineering teams to establish and maintain data standards and best practices.
● Develop a deep understanding of the business and its various domains, translating business requirements into technical data solutions.
● Actively contribute to the open-source community and share knowledge with the team.
● Effectively lead and mentor junior team members, fostering a collaborative and high-performing environment.
Requirements:
● Extensive hands-on experience with Python, Java, and Scala.
● Deep expertise in big data stacks, including:
  ○ Distributed systems: Spark (PySpark), Hadoop, Presto, Hive.
  ○ Message queueing systems: Kafka, RabbitMQ, NSQ.
  ○ Databases (relational & NoSQL): PostgreSQL, MySQL, MongoDB.
● Proven experience gathering and analyzing complex system requirements.
● In-depth understanding of advanced database structure principles, data warehousing methodologies, data mining concepts, and segmentation techniques.
● Significant experience with cloud computing platforms (AWS, GCP) and UNIX-based environments.
● Extensive experience with AWS services such as EMR, Lambda, Step Functions, S3, and Redshift.
● Experience architecting, implementing, and monitoring large-scale big data analytics solutions.
● Strong analytical and problem-solving skills with a natural curiosity about big data and emerging technologies.
● Strong DevOps/DataOps skills and experience with CI/CD pipelines for data infrastructure.
● Background: Bachelor's or Master's degree in Computer Science (preferred) or a related field.