Pavago

Data

Data Engineer

Remote

Posted 25 days ago

Our client is seeking a Data Engineer to design, build, and maintain reliable data pipelines and infrastructure that deliver clean, accessible, and actionable data. This role requires strong software engineering fundamentals, experience with modern data stacks, and an eye for quality and scalability. The Data Engineer ensures data flows seamlessly from source systems to warehouses and BI tools, powering decision-making across the business.

Location: Remote

Responsibilities

Build and maintain ETL/ELT pipelines using Python, SQL, or Scala.
Orchestrate workflows with Airflow, Prefect, Dagster, or Luigi.
Ingest structured and unstructured data from APIs, SaaS platforms, relational databases, and streaming sources.
Manage data warehouses (Snowflake, BigQuery, Redshift).
Design schemas (star/snowflake) optimized for analytics.
Implement partitioning, clustering, and query performance tuning.
Implement validation checks, anomaly detection, and logging for data integrity.
Enforce naming conventions, lineage tracking, and documentation (dbt, Great Expectations).
Maintain compliance with GDPR, HIPAA, or industry-specific regulations.
Develop and monitor streaming pipelines with Kafka, Kinesis, or Pub/Sub.
Ensure low-latency ingestion for time-sensitive use cases.
Partner with analysts and data scientists to provide curated, reliable datasets.
Support BI teams in building dashboards (Tableau, Looker, Power BI).
Document data models and pipelines for knowledge transfer.
Containerize data services with Docker and orchestrate them with Kubernetes.
Automate deployments via CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI).
Manage cloud infrastructure using Terraform or CloudFormation.

Requirements

3+ years in data engineering or back-end development.
Strong Python and SQL skills.
Experience with at least one major data warehouse (Snowflake, Redshift, BigQuery).
Familiarity with pipeline orchestration tools (Airflow, Prefect).

Additional Information

A typical day involves maintaining pipelines, ingesting data sources, optimizing queries, collaborating with teams, and ensuring data quality and reliability. Key metrics include pipeline uptime ≥ 99%, data freshness, zero critical errors, cost optimization, and positive feedback from data consumers.
