Key Responsibilities:
Technical Leadership & Expertise:
- Define and lead all aspects of the overall data engineering architecture and roadmap using Databricks and modern data engineering practices.
- Guide team members in solving technical challenges related to data pipelines and analytics through hands-on mentorship and support.
Databricks Implementation:
- Design and implement data pipelines using Databricks for ETL processes, leveraging Spark for large-scale data processing and transformation.
- Develop Delta Lake tables and optimize data storage and query performance within the Databricks environment.
- Create and maintain Databricks notebooks for data exploration, analytics, and machine learning model development, ensuring standard methodologies for documentation and collaboration.
Performance Optimization:
- Continuously monitor and optimize the performance of Databricks jobs, ensuring efficient execution of data pipelines.
- Analyze and fine-tune data processing workflows to reduce runtime and enhance performance.
- Develop strategies for optimizing Spark configurations and cluster resources for cost-effective and high-performance data processing.
Platform Operations:
- Ensure the stability and reliability of the Databricks platform, managing cluster configuration, scaling, and resource allocation to meet workload demands.
- Troubleshoot and resolve operational issues within the Databricks environment, collaborating with DevOps teams for seamless integration and deployment.
Collaboration & Communication:
- Work closely with clinical programmers, medical reviewers, data managers, central monitors, data scientists, and business customers to understand clinical data needs and collaborate on analytics projects.
- Facilitate communication between technical and non-technical teams to ensure alignment on data solutions and project goals.
Development & Problem-Solving:
- Actively contribute to the development of data pipelines and workflows within Databricks, supporting code reviews and technical discussions.
- Drive testing and deployment of data solutions, ensuring automated testing measures are in place for quality assurance.
- Scale proof-of-concept projects into production environments, ensuring performance, reliability, and maintainability.
Clinical Data Review & Monitoring:
- Leverage experience with clinical data management & clinical data review, including understanding of clinical trial patient data (such as CRF data, lab data, IVRS data), understanding of SDTM (Study Data Tabulation Model) and clinical systems, to create efficient data workflows.
- Supervise and maintain the clinical data review & monitoring system to ensure stability, performance, and availability of data solutions, proactively addressing system issues and bottlenecks.
- Assist in data migration strategies from legacy clinical systems, ensuring compliance with business rules and data governance standards.
- Handle technical debt and continuously seek opportunities for improvement in clinical data processes and architecture.
Qualifications:
Education:
- Bachelor’s degree or higher in Computer Science, Engineering, Mathematics, or a related field.
Experience and Skills:
Required:
- At least five (5) years of relevant IT experience, with a strong focus on data engineering and clinical technology.
- Demonstrable experience with Databricks and Spark in building data pipelines and analytics solutions.
- In-depth knowledge of clinical data management, including familiarity with clinical data regulations and systems.
- Proficiency in database technologies (e.g., SQL, NoSQL) and ETL tools.
- Solid understanding of data warehousing concepts and data governance practices related to clinical data.
- Familiarity with cloud services such as AWS, Azure, or GCP in relation to data solutions.
- Experience in implementing AI/ML techniques within data engineering workflows is advantageous.
- Strong foundation in the SDLC and Agile methodologies, and experience in collaborative development environments.
- Experience with programming languages such as Python, SQL, or Scala, including participation in code reviews.
- Excellent analytical and problem-solving skills, with a history of delivering data solutions in enterprise settings.
- Good interpersonal skills; ability to convey technical information clearly to team members.
Preferred:
- Knowledge of data visualization tools like Tableau or Power BI is a plus.
- Knowledge of AI implementations, especially in the areas of Generative AI and Agentic AI.