AI Infrastructure Data Engineer || Remote role

Job

Verito Solutions

Remote

Full-Time

Posted 3 days ago (Updated 10 hours ago) • Actively hiring

Expires 7/4/2026

Job Description

AI Infrastructure Data Engineer | Remote role | Long-Term Contract | References required
Job Description:
Build the data backbone that powers AI: pipelines, knowledge bases, ingestion, and retrieval infrastructure.
Minneapolis (Hybrid) • Intermediate / Senior • 4-8 YOE • Data pipelines required
AI systems are only as good as the data feeding them. This role owns the infrastructure that gets data from internal systems, document stores, APIs, and enterprise databases into vector indexes, knowledge bases, and structured stores that AI agents can reliably query. You'll build ingestion pipelines with freshness management, design chunking and embedding strategies, and ensure retrieval quality: the hidden layer that determines whether agents give accurate answers or hallucinate.
This is not a traditional data warehousing role; it is data engineering specifically in service of AI systems.
WHAT YOU'LL BUILD
▸ Ingestion pipelines pulling from internal systems, APIs, document repositories, and enterprise databases into AI knowledge stores
▸ Vector indexing infrastructure: embedding model selection, chunking strategies, metadata enrichment, hybrid index design
▸ Freshness and change detection: incremental re-indexing, stale data detection, TTL management
▸ ETL / ELT pipelines for structured data feeding AI decision and retrieval layers
▸ High-throughput event-driven ingestion for real-time and batch processing at enterprise scale
▸ Data quality validation: schema checks, completeness scoring, anomaly detection before indexing
REQUIRED EXPERIENCE
▸ 4+ years building production data pipelines: orchestrated workflows, not one-off scripts
▸ Strong SQL: query optimization, indexing, execution plans, large result sets
▸ Experience with vector databases or search infrastructure (OpenSearch, Pinecone, pgvector, Azure AI Search)
▸ Python data processing at scale: Pandas, Polars, or equivalent
▸ Understands embedding models: how to evaluate retrieval quality, why chunking strategy matters
▸ Cloud data stack: AWS (Glue, S3, RDS) or Azure equivalent
▸ Can diagnose why a RAG system's retrieval is failing at the data layer
NICE TO HAVE
▸ Event streaming platforms: event-driven pipeline design, high-throughput ingestion patterns
▸ Legacy enterprise RDBMS experience (DB2, Oracle, or equivalent)
▸ Document intelligence: OCR pipelines, PDF/scanned document ingestion
▸ dbt, Airflow, or similar pipeline orchestration tooling
▸ Knowledge graph experience: Neo4j, Amazon Neptune, RDF/SPARQL, ontology design
▸ Experience building knowledge bases specifically for LLM consumption, not just generic warehousing
▸ Financial services data: understanding of regulated data handling, PII, audit trails
TECH STACK
Python • Pandas / Polars / PySpark
ETL / ELT Pipelines
Event Streaming Pipelines
Vector Databases (pgvector • Pinecone • Weaviate)
OpenSearch • Hybrid Search
Knowledge Graphs • Graph Databases
Neo4j • Amazon Neptune
RDF • SPARQL • Ontology Design
Embedding Models • Chunking Strategies
Document Intelligence • OCR Pipelines
dbt • Airflow • Pipeline Orchestration
Cloud Data Services (AWS / Azure)
Relational Databases • SQL Optimization
Data Quality • Schema Validation
Docker • Container Orchestration
Enterprise API Integration
"Believe you can and you're halfway there." Theodore Roosevelt
Sayantan Das | Senior Tech Recruiter