Data Engineer / ML Engineer

Job description

We are looking for a Machine Learning Engineer / Data Engineer who works alongside the Data Scientists and Data Analysts in the team to develop scalable, production-ready Advanced Analytics and AI software and products. They also develop technical tools and services that enable large-scale machine learning solutions.

A Data Engineer / ML Engineer believes in a non-hierarchical culture of collaboration, transparency, safety, and trust. They work with a focus on value creation, growth, and serving customers, take full ownership and accountability, and deliver exceptional customer and business results.

Responsibilities

- Design, develop and build real-time and batch data pipelines from a variety of sources (streaming data, APIs, data warehouses, messages, etc.)

- Apply an understanding of software architecture and software design patterns to write scalable, maintainable, well-designed and future-proof software

- Manage existing pipelines and create new pipelines from a variety of sources (relational, XML, etc.)

- Actively apply best practices within CI/CD

- Propose and implement solutions for data pipeline stabilization and data quality checks

- Coordinate with other teams to design optimal patterns for data ingest and egress, and lead and coordinate data quality initiatives and troubleshooting

- Design and build solutions that track data quality and stabilize data pipelines to ensure reliable operations

- Ensure best practices are followed across architecture, codebase and configuration

- Eliminate waste

- Deliver on time

Competences

- Ability to establish clear goals and responsibilities to achieve a high level of performance.

- Ability to evaluate different options proactively and to solve problems in an innovative way, developing new solutions or combining existing methods to create new approaches.

- Comfortable working with external product teams to establish the optimal data integration patterns/solutions

Functional Knowledge

Azure-based requirements:

- Familiar with Azure Storage accounts, Databricks, AD groups, and Key Vault

- Familiar with Azure DevOps pipelines and YAML configuration

- Familiar with Spark; able to configure and customize Spark and write PySpark code

- Understand MLflow and DBFS in Databricks

GCP requirements:

- Familiar with BigQuery; able to write SQL

- Familiar with Cloud Composer / Airflow

- Familiar with IAM and service accounts

- Familiar with Data Catalog

- Understand Infrastructure as Code

- Good to have: knowledge of Dataflow, K8s, Vertex AI Pipelines, Kubeflow Pipelines

Cloud agnostic skills:

- Python:

- Deep knowledge of Python programming: practices OOP, follows coding best practices, and knows how to use flake8, mypy, black, SonarQube and pre-commit

- Deep knowledge of unit testing and end-to-end testing; familiar with pytest, fixtures, unittest, etc.

- Unix:

- Familiar with popular Unix systems; knows how to install software in a Docker container.

- Familiar with the shell

- Git

- Knows how to create PRs and resolve merge conflicts.

- Can create CI/CD pipelines in either GitHub Actions or Azure DevOps following best practices

- Docker

- Deep understanding of Docker

- DBT

- Deep knowledge of DBT, preferably with GCP

- SQL

- Deep knowledge of SQL

- Deep understanding of data modeling and system design

Required cloud certification: GCP ML Engineer or GCP Data Engineer (to be obtained no later than 1 month before the start date)

Summary

  • Workplace: weITglobal
  • 1 position
  • 6 months or longer
  • Full-time
  • Fixed monthly, weekly, or hourly pay
  • Published: 2 May 2023
  • Apply by: 3 May 2023
