Primary Responsibilities
- Design: Lead the design of data pipelines in line with the defined architecture of the metric repository, using and building reusable functions and frameworks that ensure scalability, reliability and performance.
- Data Analysis: Understand the metric definitions, perform the analysis, and lead discussions with requestors across the regions to define the calculation logic and the mapping from the source data assets.
- Data Pipeline Development: Lead a team of data engineers in developing end-to-end data pipelines that create new metrics for the metric repository, and re-engineer scripts created by the data science teams so that they can be easily plugged into existing MLOps frameworks.
- AI/ML: Design and develop the integration of AI/ML model outputs into the metric repository, and build the MLOps components that support and manage the lifecycle of AI-based models built by the regional and global data science teams.
- Data Quality and Governance: Enforce the defined data quality standards for accuracy, integrity and consistency across all deliverables, including client deliveries.
- Code Review and Best Practices: Conduct code reviews and ensure adherence to best practices in development and deployment, including the defined CI/CD processes. Define and create technical documentation.
- Operations: Support the generation and on-time delivery of Insight feeds to clients, with the required data quality checks applied.
- Team Management & Collaboration: Collaborate with data engineers, data scientists and other groups across the organization to identify areas for improvement and bottlenecks, and to reuse existing frameworks and processes.
Technical Skills
- Experience building large, globally applicable data platforms using a range of data modelling, data storage and data flow techniques to support technical and business use cases, and a solid understanding of data engineering best practices.
- Experience with machine learning model inference, validation and deployment, and with managing business-as-usual (BAU) operations.
- Strong programming skills for building data pipelines using PySpark, Hive and Airflow.
- Experience with scheduling tools (e.g., Airflow, Oozie) or with building orchestration workflows for data processing.
- Hands-on experience with large-scale data ingestion, processing and storage in the Hadoop ecosystem.
- Experience writing and optimizing SQL queries in big data environments.
- Experience creating data dictionaries, setting up and monitoring data validation alerts, and running periodic jobs to maintain data pipelines for completed projects.
- Experience working in Linux/Unix environments and with command-line utilities.
- Experience creating and supporting production software and systems, with a proven track record of identifying and resolving performance bottlenecks in production.
- Experience building and integrating code within the defined CI/CD framework using Git.
- Experience drafting solution architecture frameworks that rely on APIs and microservices.
Strategic and Functional Excellence
- Good business acumen to align data analysis with the business needs of clients, including experience in the payments space.
- Ability to translate data and technical concepts into requirements documents, business cases and user stories.
- Good understanding of agile working practices, with related program management skills.
- Strong problem-solving capabilities, with the ability to quickly propose feasible solutions and to communicate strategy and risk mitigation approaches effectively to leadership.
- Excellent communication and presentation skills, with the ability to interact with cross-functional team members at different levels.
- Ability to learn new tools and paradigms as data science continues to evolve at Visa and elsewhere.