Job Description
AI Infrastructure Data Engineer. Remote role, long-term contract. References required.
Build the data backbone that powers AI: pipelines, knowledge bases, ingestion, and retrieval infrastructure.
Minneapolis (Hybrid)
Intermediate / Senior
4-8 YOE
Data pipelines required
AI systems are only as good as the data feeding them.
This role owns the infrastructure that gets data from internal systems, document stores, APIs, and enterprise databases into vector indexes, knowledge bases, and structured stores that AI agents can reliably query. You'll build ingestion pipelines with freshness management, design chunking and embedding strategies, and ensure retrieval quality: the hidden layer that determines whether agents give accurate answers or hallucinate.
This is not a traditional data warehousing role; it is data engineering specifically in service of AI systems.
WHAT YOU'LL BUILD
▸ Ingestion pipelines pulling from internal systems, APIs, document repositories, and enterprise databases into AI knowledge stores
▸ Vector indexing infrastructure: embedding model selection, chunking strategies, metadata enrichment, hybrid index design
▸ Freshness and change detection: incremental re-indexing, stale data detection, TTL management
▸ ETL / ELT pipelines for structured data feeding AI decision and retrieval layers
▸ High-throughput event-driven ingestion for real-time and batch processing at enterprise scale
▸ Data quality validation: schema checks, completeness scoring, anomaly detection before indexing
REQUIRED EXPERIENCE
▸ 4+ years building production data pipelines: orchestrated workflows, not one-off scripts
▸ Strong SQL: query optimization, indexing, execution plans, large result sets
▸ Experience with vector databases or search infrastructure (OpenSearch, Pinecone, pgvector, Azure AI Search)
▸ Python data processing at scale: Pandas, Polars, or equivalent
▸ Understands embedding models: how to evaluate retrieval quality, why chunking strategy matters
▸ Cloud data stack: AWS (Glue, S3, RDS) or Azure equivalent
▸ Can diagnose why a RAG system's retrieval is failing