In this role, you will get to:
- Collaborate with cross-functional teams to understand business requirements and design data solutions to meet those needs
- Design and build scalable data pipelines to ingest, process, and store large volumes of data
- Manipulate complex data from a variety of sources (e.g. APIs, SFTP, databases, SAP, Google Analytics)
- Ensure data quality, accuracy, and completeness through data validation and error handling processes
- Maintain and monitor existing ETL pipelines and advise on necessary infrastructure changes
- Design and implement security and access control measures to protect sensitive data and comply with regulations such as the PDPA
- Share your expertise across the team and mentor junior and mid-level engineers through code reviews, 1:1s, workshops, and knowledge-sharing sessions to enhance their technical skills and understanding of best practices
- Participate in recruitment by evaluating and interviewing candidates, and help improve our recruitment processes
- Gather customer requirements during pre-sales interactions, determine project scope, create technical diagrams, and estimate man-days and cloud costs
- Develop and maintain technical documentation
You'll be successful if you have:
- 5+ years of experience in data engineering, designing, building, and maintaining data infrastructure in cloud environments such as AWS, GCP, or Azure
- Strong programming skills in languages such as Python, Java, or Scala (Python preferred)
- Strong experience with cloud-based data lake solutions such as S3, GCS, or Azure Data Lake Storage, and with designing and implementing data lake architectures
- Strong experience with data warehousing tools such as Redshift, Synapse, or BigQuery, and with optimizing data warehouse performance for large-scale datasets
- In-depth knowledge and hands-on experience with big data technologies such as Hadoop, Spark, and Hive (Spark preferred)
- Expertise in designing and implementing efficient ETL pipelines that integrate a variety of data sources, and the ability to ensure data quality, with a strong understanding of data integration and transformation techniques
- Experience working with orchestration tools such as Airflow, AWS Step Functions, or Azure Data Factory
- Ability to write efficient SQL queries for data extraction and transformation, demonstrating proficiency in optimizing query performance, understanding distributed query execution models, and utilizing advanced SQL concepts (Spark SQL preferred)
- Knowledge of CI/CD and Infrastructure as Code concepts, including experience with tools such as Terraform, AWS CDK, GitHub pipelines, and GitLab CI
- Excellent problem-solving skills and the ability to work well in a fast-paced, collaborative environment
- Ability to scope projects, define architectures, and choose technologies based on project requirements
- Leadership skills and the ability to mentor junior and mid-level data engineers
It's a plus if you have:
- Data engineering certifications from cloud providers, such as the GCP Professional Data Engineer
- Experience working on multiple projects simultaneously
- Experience with data streaming technologies and integrating streaming data sources
- Experience with Docker and with deploying and managing containerized data solutions
- Experience implementing PDPA compliance processes
- Experience optimizing cloud costs through various strategies
- Experience with DevOps and continuous integration/continuous delivery (CI/CD) practices
- Experience gathering customer requirements and estimating project scope during pre-sales interactions
- Experience with Agile methodologies (e.g. Scrum, Kanban)