Roles and Responsibilities
- Design, develop, and maintain scalable data pipelines and systems using Apache Spark, Databricks, and SQL.
- Collaborate with data scientists, analysts, and other stakeholders to understand requirements and deliver high-quality data solutions.
- Optimize and tune data processing workflows for performance and scalability.
- Implement data quality checks and ensure data integrity across various data sources.
- Develop and maintain ETL processes to ingest and transform data from multiple sources.
- Monitor and troubleshoot data pipeline issues, ensuring timely resolution.
- Stay up-to-date with the latest industry trends and technologies in data engineering.
Skills and Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in data engineering or a related role.
- Strong proficiency in Apache Spark and Databricks.
- Advanced SQL skills, including experience with complex queries and performance optimization.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Proficiency in programming languages such as Python or Scala.
- Familiarity with data warehousing concepts and technologies (e.g., Snowflake, Redshift).
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills.
Preferred Qualifications
- Experience with real-time data processing and streaming technologies (e.g., Kafka, Flink).
- Knowledge of data governance and security best practices.
- Experience with CI/CD pipelines and DevOps practices.
- Ability to work in a collaborative environment and manage multiple projects concurrently.