Hyred | Data

Data Engineer (Data Pipelines & RAG)

Remote | Posted 2 days ago

A versatile Data & AI Engineer role at a fast-growing Property Tech AI company, focusing on building and maintaining data pipelines for Gen AI applications, with responsibilities spanning data modeling, AI integration, observability, and automation.

Location: Remote

Responsibilities

  • Automate data ingestion from diverse sources including unstructured documents, tables, charts, and drawings.
  • Own the chunking strategy, embedding, and indexing of data for retrieval by RAG/agent systems.
  • Build, test, and maintain robust ETL/ELT workflows using Spark (batch & streaming).
  • Define and implement logical/physical data models and schemas; develop schema mappings and data dictionaries.
  • Instrument data pipelines to surface real-time context into LLM prompts.
  • Implement prompt engineering and RAG for workflows within the real estate (RE) and construction industry vertical.
  • Implement monitoring, alerting, and logging for data quality, latency, and errors.
  • Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM).
  • Develop automated testing, versioning, and deployment using Azure DevOps, GitHub Actions, and Prefect/Airflow.
  • Maintain reproducible environments with infrastructure as code (Terraform, ARM templates).

Requirements

  • 5 years in Data Engineering or a similar role, including 12-24 months of experience building pipelines for unstructured data extraction, OCR-based document processing, cloud-native solutions, and chunking and indexing for RAG/Gen AI applications.
  • Proficiency in Python, dlt for ETL/ELT pipelines, DuckDB or equivalent tools, and DVC for large file management.
  • Solid SQL skills and experience with relational databases; familiarity with non-relational column-based databases.
  • Familiarity with Prefect or similar tools (e.g., Azure Data Factory).
  • Proficiency with the Azure ecosystem and its services in production.
  • Familiarity with RAG indexing, chunking, and storage across file types.
  • Strong DevOps and CI/CD experience (CircleCI / Azure DevOps).
  • Experience deploying ML artifacts using MLflow, Docker, or Kubernetes.

Benefits

  • Fast-growing, revenue-generating proptech startup.
  • Flat, no-BS environment with high autonomy.
  • Steep learning opportunities in enterprise production use-cases.
  • Remote work with quarterly meet-ups.
  • Exposure to multi-market, multi-cultural clients.

Additional Information

  • Early-stage startup environment that requires wearing many hats and working outside your comfort zone, with direct impact in production.

Location: Remote
Category: Data
Company: Hyred
Source: himalayas
Posted: 2 days ago
