Job Responsibilities:
- Design, develop, and maintain scalable and robust data pipelines and ETL/ELT processes using Databricks, Spark (PySpark, Scala), Delta Lake, and related technologies.
- Architect and implement data lakehouse solutions on Databricks, ensuring data quality, integrity, and performance.
- Develop and optimize data models for analytical and reporting purposes within the Databricks environment.
- Implement and manage data governance and security best practices within the Databricks platform, including Unity Catalog and role-based access control (RBAC).
- Use Databricks Delta Live Tables (DLT) to build and manage reliable, declarative data pipelines (see the first sketch after this list).
- Implement and leverage Delta Lake's Change Data Feed (CDF) for efficient incremental data synchronization and updates (see the second sketch after this list).
- Monitor and troubleshoot data pipelines and system performance on the Databricks platform.
- Collaborate with data scientists and analysts to understand their data requirements and provide efficient data access and processing solutions.
- Participate in code reviews, ensuring adherence to coding standards and best practices.
- Contribute to the development of technical documentation and knowledge sharing within the team.
- Stay up-to-date with the latest advancements in Databricks and related data technologies.
- Mentor and guide junior engineers on the team.
- Participate in the planning and execution of data-related projects and initiatives.
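
For a flavor of the DLT work referenced above, here is a minimal sketch of a two-table pipeline; the landing path, table names, and data quality rule are illustrative assumptions, not specifics of this role:

```python
# Minimal Delta Live Tables sketch. `spark` is the ambient session
# available in Databricks pipelines; the path and names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader.")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")  # hypothetical landing path
    )

@dlt.table(comment="Cleaned orders for downstream analytics.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # rows failing the expectation are dropped
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .withColumn("ingested_at", F.current_timestamp())
    )
```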
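And for the CDF responsibility, a sketch of reading row-level changes from a source table and merging them into a downstream copy; the table names, key column, and starting version are assumptions for illustration:

```python
# Minimal Change Data Feed sketch. Assumes `spark` is the ambient
# Databricks session and that the source table exists.
from delta.tables import DeltaTable

# Enable CDF on an existing table (one-time; hypothetical table name).
spark.sql(
    "ALTER TABLE main.sales.orders "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Read row-level changes since a given version, keeping only the
# final state of inserted/updated rows, and drop CDF metadata columns.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", "5")  # hypothetical checkpoint version
    .table("main.sales.orders")
    .filter("_change_type IN ('insert', 'update_postimage')")
    .drop("_change_type", "_commit_version", "_commit_timestamp")
)

# Upsert the changes into a downstream table keyed on order_id.
(
    DeltaTable.forName(spark, "main.sales.orders_mirror")
    .alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```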
Skills Required:
- Databricks, SQL, PySpark, Python
- Data modeling, core data engineering concepts