Essential Functions:
- Proficient in exploratory data analysis (EDA) using Python's scientific libraries, including NumPy, pandas, Matplotlib, Seaborn, and scikit-learn
- Exposure to model development frameworks such as MLflow
- Experience using Papermill for parameterizing and executing Jupyter Notebooks
- Strong development experience in at least one of Python or R (preferably Python)
- Implementation of MLOps practices, including continuous integration and deployment (CI/CD) for ML models
- Hands-on experience building and maintaining data pipelines and feature-engineering pipelines, with a solid grasp of core ML concepts
- Hands-on experience engineering, testing, validating, and productionizing ML models for high-performance use cases
- Hands-on experience with AWS SageMaker for building, training, and deploying ML models
- Developing and implementing practices for deploying machine learning models in large data science projects
- Proven experience in building and training complex ML models
- Experience using and maintaining DevOps tools and implementing automation for production
- Additional knowledge of AWS services and ecosystems
- Experience working with containerized and virtualized environments (Docker, Kubernetes)
This is a hybrid position. Expectation of days in office will be confirmed by your Hiring Manager.
Qualifications:
6+ years of work experience with a Bachelor's degree, or 5+ years of work experience with a Master's or advanced degree in an analytical field such as computer science, statistics, finance, economics, or a related area.
Technical skills:
- Experience with complex, high-volume, multi-dimensional data, as well as machine learning models based on structured, unstructured, and streaming datasets
- Experience with Unix shell or Python scripting, and exposure to scheduling tools such as Oozie and Airflow
- Experience using SQL to extract, aggregate, and process data in big data pipelines built on Hadoop, EMR, and NoSQL databases
Additional Skills (Plus):
- Exposure to model-serving engines such as TensorFlow Serving, NVIDIA Triton Inference Server, etc.
- Spark pipelines: building and maintaining efficient, robust Spark pipelines to create and access datasets and feature stores for ML models
- ETL processes: developing and executing large-scale ETL processes to support data quality, reporting, data marts, and predictive modeling
- Knowledge of the standard big data and real-time stack, such as Hadoop, Spark, Kafka, Redis, Flink, and similar technologies