
Data Engineer - Data Platforms

IBM

10 months ago

5 - 7 years

Hybrid

Bengaluru, Karnataka, India

    Skills: Apache Spark, PySpark, Kafka, SQL, Cloudera, Python/Scala, Data pipelines, Data Security, Cloud Platforms, CI/CD pipeline, Big data

    Job description & requirements

    Your role and responsibilities


    As a Data Engineer specializing in enterprise data platforms, you will build, manage, and optimize data pipelines for large-scale environments, drawing on expertise in big data technologies, distributed computing, data ingestion, and transformation frameworks.

    You are proficient in Apache Spark, PySpark, Kafka, and Iceberg tables, and you understand how to design and implement scalable, high-performance data processing solutions.

    What you’ll do: As a Data Engineer – Data Platform Services, your responsibilities include:

    Data Ingestion & Processing

    • Designing and developing data pipelines to migrate workloads from IIAS to Cloudera Data Lake.
    • Implementing streaming and batch data ingestion frameworks using Kafka and Apache Spark (PySpark); an illustrative sketch follows this list.
    • Working with IBM CDC and Universal Data Mover to manage data replication and movement.

    Big Data & Data Lakehouse Management
    • Implementing Apache Iceberg tables for efficient data storage and retrieval.
    • Managing distributed data processing with Cloudera Data Platform (CDP).
    • Ensuring data lineage, cataloging, and governance for compliance with bank and regulatory policies.

    Optimization & Performance Tuning
    • Optimizing Spark and PySpark jobs for performance and scalability.
    • Implementing data partitioning, indexing, and caching to enhance query performance.
    • Monitoring and troubleshooting pipeline failures and performance bottlenecks.

    Security & Compliance
    • Ensuring secure data access, encryption, and masking using Thales CipherTrust.
    • Implementing role-based access controls (RBAC) and data governance policies.
    • Supporting metadata management and data quality initiatives.

    Collaboration & Automation
    • Working closely with Data Scientists, Analysts, and DevOps teams to integrate data solutions.
    • Automating data workflows using Airflow and implementing CI/CD pipelines with GitLab and Sonatype Nexus; an illustrative DAG sketch also follows this list.
    • Supporting Denodo-based data virtualization for seamless data access.
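
    The following is a minimal, illustrative sketch of the streaming ingestion pattern named above (Kafka plus PySpark writing into an Apache Iceberg table). It is not part of the posting: the broker address, topic, event schema, catalog name, checkpoint path, and table name are all placeholder assumptions, and the Iceberg catalog configuration will differ per cluster.

# Hypothetical sketch: stream JSON events from a Kafka topic into an Iceberg table with PySpark.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder.appName("kafka-to-iceberg-ingest")
    # Assumes the Iceberg runtime is on the classpath and a Hive-metastore-backed
    # catalog named "lake" is exposed by the cluster (placeholder names).
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hive")
    .getOrCreate()
)

# Placeholder schema for the JSON payload carried in the Kafka message value.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("account_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("amount", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "transactions")                 # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Parse the message value into typed columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Append each micro-batch to the Iceberg table (placeholder checkpoint and table names).
(
    events.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/transactions")
    .toTable("lake.raw_zone.transactions")
    .awaitTermination()
)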
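
    For the Airflow automation item above, here is a minimal, hypothetical DAG sketch (assuming Airflow 2.4+ and spark-submit available on the workers); the DAG id, schedule, and script paths are illustrative placeholders, not details from the role.

# Hypothetical sketch: a daily Airflow DAG that runs a Spark ingestion job, then a validation step.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="iias_to_cdp_ingest",       # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    # Submit the PySpark ingestion job (placeholder script path).
    ingest = BashOperator(
        task_id="ingest_kafka_to_iceberg",
        bash_command="spark-submit --master yarn /opt/jobs/kafka_to_iceberg.py",
    )

    # Simple follow-up data-quality check (placeholder script path).
    validate = BashOperator(
        task_id="validate_row_counts",
        bash_command="spark-submit --master yarn /opt/jobs/validate_row_counts.py",
    )

    ingest >> validate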


    Required education

    Bachelor's Degree


    Preferred education

    Master's Degree


    Required technical and professional expertise

    • 4-7 years of experience in big data engineering, data integration, and distributed computing.
    • Strong skills in Apache Spark, PySpark, Kafka, SQL, and Cloudera Data Platform (CDP).
    • Proficiency in Python or Scala for data processing.
    • Experience with data pipeline orchestration tools (Apache Airflow, Stonebranch UDM).
    • Understanding of data security, encryption, and compliance frameworks.


    Preferred technical and professional experience

    • Experience in banking or financial services data platforms.
    • Exposure to Denodo for data virtualization and DGraph for graph-based insights.
    • Familiarity with cloud data platforms (AWS, Azure, GCP).
    • Certifications in Cloudera Data Engineering, IBM Data Engineering, or AWS Data Analytics.


    Experience: 5 - 7 years

    Job Domain/Function: Data Engineering

    Job Type: Hybrid

    Employment Type: Full Time

    Number of Position(s): 1

    Educational Qualifications: Bachelor's Degree

    Location: Bengaluru, Karnataka, India
