What you will do:
Python Development: Write clean, efficient, and maintainable Python code to support data engineering tasks, including data collection, transformation, and integration with machine learning models.
Data Pipeline Development: Design, develop, and maintain robust data pipelines that efficiently gather, process, and transform data from various sources into a format suitable for machine learning and data science tasks using ELK stack, Python and other leading technologies.
Spark Knowledge: Apply basic Spark concepts for distributed data processing when necessary, optimizing data workflows for performance and scalability.
ELK Integration: Utilize ElasticSearch, Logstash, and Kibana (ELK) for data management, data indexing, and real-time data visualization. Knowledge of OpenSearch and related stack would be beneficial.
Grafana and Kibana: Create and manage dashboards and visualizations using Grafana and Kibana to provide real-time insights into data and system performance.
Kubernetes Deployment: Deploy data engineering solutions and machine learning models to a Kubernetes-based environment, ensuring security, scalability, reliability, and high availability.
What you will Bring:
Machine Learning Model Development: Collaborate with data scientists to develop and implement machine learning models, ensuring they meet performance and accuracy requirements.
Model Deployment and Monitoring: Deploy machine learning models and implement monitoring solutions to track model performance, drift, and health.
Data Quality and Governance: Implement data quality checks and data governance practices to ensure data accuracy, consistency, and compliance with data privacy regulations.
MLOps (Added Advantage): Contribute to the implementation of MLOps practices, including model deployment, monitoring, and automation of machine learning workflows.
Documentation: Maintain clear and comprehensive documentation for data engineering processes, ELK configurations, machine learning models, visualizations, and deployments.