AWS Cloud Data Engineer

Job

Xoriant Corporation

Boston, MA (In Person)

Full-Time

Posted 3 days ago (Updated 15 hours ago) • Actively hiring

Expires 7/4/2026

Job Description

Role:

AWS Cloud Data Engineer

Location:

Boston, MA (Hybrid Onsite)

Industry:

Financial Services

About the Role:

We are seeking a highly skilled Cloud Data Engineer to design, build, and optimize a modern, scalable Legal Data Lakehouse platform. Operating within State Street's Global Technology Services, you will leverage a deep knowledge of the full suite of AWS cloud services combined with high-performance Databricks capabilities to ingest, model, and secure complex enterprise data structures (including contracts, litigation matters, eDiscovery datasets, and global regulatory feeds). This role is critical to establishing a single, highly governed, audit-ready source of truth that powers critical legal operations, compliance analytics, and emerging generative AI/ML use cases across our global footprint.

Key Responsibilities:

Data Lakehouse Engineering & Architecture:

Design, build, and maintain enterprise-grade, custom data pipelines utilizing Databricks (PySpark, Spark SQL, and Scala) on AWS infrastructure.

Implement and manage a multi-layered Lakehouse architecture (Bronze, Silver, and Gold zones) to curate unstructured contract text, semi-structured logs, and highly structured transactional tables.

Architect robust end-to-end data ingestion frameworks supporting high-throughput batch and near real-time data flows from on-premises systems and third-party legal platforms.

Cloud Infrastructure & Platform Optimization:

Utilize the broad suite of AWS services (including but not limited to S3, Lambda, Glue, EMR, Athena, EC2, and CloudWatch) to support and optimize distributed storage and compute infrastructure.

Conduct advanced performance tuning on large-scale Apache Spark workloads, optimizing partitioning, indexing, caching strategies, and Databricks cluster utilization to manage cloud run costs efficiently.

Automate deployment configurations, orchestrate multi-dependency workflows (via Databricks Jobs/Workflows, Airflow, or Autosys), and build containerized solutions using Docker.

Data Governance, Security & Compliance:

Enforce strict, fine-grained access controls, row/column-level security, and data classification strategies using Databricks Unity Catalog integrated with AWS IAM and enterprise identity providers.

Ensure all data pipelines and lakehouse layers remain strictly compliant with global data privacy regulations (e.g., GDPR) and rigid internal financial audit standards.

Implement end-to-end data lineage tracking, validation frameworks, and automated reconciliation routines to preserve absolute data integrity for legal and regulatory reporting.

Downstream Integration & Innovation:

Collaborate with business analysts and legal operations to expose curated datasets via secure APIs and optimized connectors.

Enable seamless consumption of financial and legal analytics through integration with visualization tools like Power BI or automation platforms (Power Apps / Power Automate).

Support data readiness for advanced AI/ML models, contract intelligence tools, and eDiscovery search workflows.

Required Skills & Qualifications

Core Technical Skills:

Databricks & Spark:

3+ years of deep, hands-on experience building, scheduling, and debugging data pipelines on Databricks utilizing PySpark, Scala, or Spark SQL.

AWS Cloud Suite:

Extensive knowledge of AWS core services, with deep familiarity across object storage (S3), serverless compute (Lambda), data cataloging/ETL (Glue), access management (IAM), and encryption (KMS).

Data Modeling:

Strong proficiency in relational database design, data warehousing structures, schema evolution, and performance tuning techniques (e.g., Delta Lake formats, Apache Iceberg).

Programming & Scripting:

Strong coding skills in Python and advanced SQL are mandatory.

CI/CD & DevOps:

Proven familiarity with version control (Git) and standard automated deployment workflows.

Domain & Professional Value-Adds:

Regulated Industries:

Experience in Financial Services, Asset Management, or handling highly sensitive, audit-driven data environments is highly preferred.

Legal Data Concepts:

Familiarity with legal data constructs such as contract clauses, corporate matter management, or metadata extraction is a significant advantage.

Ownership Mindset:

Excellent communication skills, with a track record of collaborating across global, distributed engineering and business architecture teams.