What You’ll Do
- Design and implement highly available data pipelines using Spark and other big data technologies.
- Work with the data science team to develop new features that improve model accuracy and performance.
- Create standardized data models to promote consistency across client deployments.
- Troubleshoot and resolve issues in existing ETL pipelines.
- Complete proofs of concept to demonstrate capabilities and connect to new data sources.
- Instill best practices for software development, ensure designs meet requirements, and deliver high-quality work on schedule.
- Document application changes and development updates.
What You’ll Bring
- A master’s or bachelor’s degree in computer science or a related field from a top university.
- 4+ years of overall experience; 2+ years of experience in data engineering using Apache Spark and SQL.
- 2+ years of experience in building and leading a strong data engineering team.
- Experience with full software lifecycle methodology, including coding standards, code reviews, source control management, build processes, testing, and operations.
- In-depth knowledge of Python, SQL, PySpark, distributed computing, analytical databases, and other big data technologies.
- Strong knowledge of one or more cloud environments, such as AWS, GCP, or Azure.
- Familiarity with the data science and machine learning development process.
- Familiarity with orchestration tools such as Apache Airflow.
- Strong analytical skills and the ability to develop processes and methodologies.
- Experience working with cross-functional teams, including UX, business (e.g., Marketing, Sales), product management, and/or technology/IT/engineering, is a plus.
- A forward thinker and self-starter who thrives on new challenges and picks up new skills quickly.