Job title: Real-World Evidence Data Scientist
About the job
Our Team:
Sanofi Business Operations is an internal Sanofi resource organization based in India, set up to centralize processes and activities that support the Specialty Care, Vaccines, General Medicines, CHC, CMO, R&D, and Data & Digital functions. Sanofi Business Operations strives to be a strategic and functional partner for tactical deliveries to Medical, HEVA, and Commercial organizations in Sanofi globally.
Main responsibilities:
The overall purpose and main responsibilities are listed below:
Provide a high level of expertise in employing cutting-edge analytical and computational approaches to drive evidence-based pharmaceutical product development; provide scientific and technical leadership in machine learning and AI; work closely with other disciplines across Sanofi, including Business Units, Digital, R&D, Biostatistics, Information Technology Systems, and other Data Science partners, to deliver cutting-edge analyses of key business questions. Examples of Advanced Analytics activities: (1) Applying machine/deep learning to elucidate disease trajectories and patient subtypes and to define underdiagnosed conditions and unmet health needs; (2) Creating a framework for generating reusable models and insights across big data (e.g. EHRs, claims) and rich small data sets (e.g. clinical trials, imaging); (3) Generating insights by merging diverse data streams (e.g. health, surveillance, trend, sensor, and imaging data); (4) Adopting emerging technologies, such as distributed analytics and graph databases, into an analytical framework.
- People: (1) Act as a subject matter expert in machine learning, statistics, and/or modelling on team projects; (2) Work with internal and external study leads to execute Advanced Analytics projects and studies.
- Performance: (1) Implement and execute computational and statistical methodologies in Advanced Analytics for RWE; (2) Provide expertise and execute advanced analytics for solving problems across R&D, Medical Affairs, HEVA and Market Access Strategies and Plans
- Process: (1) Apply a broad array of capabilities spanning machine learning, statistics, mathematics, modelling, simulation, text mining/NLP, and data mining to extract insights, and communicate and champion these efforts across the company; (2) Plan and deploy methodological standards, standardized processes, demos, and POCs for the company’s highest-priority business needs; (3) Contribute to the design, development, and implementation of Sanofi’s data science architecture and ecosystem to guide decision-making and build foundational capabilities
About you
- Experience: Around 10 years’ experience; High-level proficiency in at least two technical or analytical languages (R, Python, etc.); Experience with advanced ML techniques (neural networks/deep learning including RNN, CNN, LSTM, and GRU, reinforcement learning, genetic algorithms, SVM, PCA, etc.); Ability to interact with a variety of large-scale data structures, e.g. HDFS, SQL, NoSQL; Experience working across multiple environments (e.g. AWS, GCP, Linux) to optimize compute and big data handling requirements; Experience with any of the following: biomedical data types, population health data, real-world data, or novel data streams relevant to the pharmaceutical industry; Experience with big data analytics platforms or high-level ML libraries such as H2O, SageMaker, Databricks, Keras, PyTorch, TensorFlow, Theano, DSSTNE, or similar; Ability to prototype analyses and algorithms in high-level languages, embracing reproducible and collaborative technology platforms (e.g. GitHub, containers, Jupyter notebooks); Exposure to NLP technologies and analyses; Knowledge of some data visualization technologies (ggplot2, Shiny, Plotly, D3, Tableau, or Spotfire); Experience with probabilistic and/or functional programming languages such as Stan, Edward, Scala
- Real-World Data (RWD): Demonstrated proficiency in working with diverse real-world data sources, including but not limited to MarketScan, CPRD, TriNetX, and STATinMED.
- Education: PhD in a quantitative field such as Statistics, Biostatistics, Applied Mathematics, or a related field, with 6 years of industry or academic experience; or a relevant Master’s degree with 10 years of related industry or academic experience.
- Soft skills: Strong oral and written communication skills; ability to work and collaborate in a team environment
- Languages: Excellent knowledge of the English language (spoken and written)