Job Description
Senior Data Engineer at Tata Consultancy Services in Irving, Texas. Posted 1 day ago.
Type:
full-time
Job Description:
About TATA Consultancy Services (TCS)
Tata Consultancy Services (TCS) (BSE: 532540, NSE: TCS) is a digital transformation and technology partner of choice for industry-leading organizations worldwide. Since its inception in 1968, TCS has upheld the highest standards of innovation, engineering excellence and customer service.
Rooted in the heritage of the Tata Group, TCS is focused on creating long-term value for its clients, its investors, its employees, and the community at large. With a highly skilled workforce of over 607,979 consultants in 55 countries and 180 service delivery centres across the world, the company has been recognized as a top employer in six continents. With the ability to rapidly apply and scale new technologies, the company has built long-term partnerships with its clients - helping them emerge as perpetually adaptive enterprises. Many of these relationships have endured for decades and navigated every technology cycle, from mainframes in the 1970s to Artificial Intelligence today.
TCS sponsors 14 of the world's most prestigious marathons and endurance events, including the TCS New York City Marathon, TCS London Marathon and TCS Sydney Marathon, with a focus on promoting health, sustainability, and community empowerment. TCS generated consolidated revenues of US $30 billion in the fiscal year ended March 31, 2025. Visit http://www.tcs.com for more details.
Position:
Senior PySpark Data Engineer
Work Location:
Irving, TX (Onsite)
Yrs of Exp:
8+ Yrs.
Role:
Full Time / Permanent Role
Job Description:
Must Have Technical/Functional Skills
We are seeking a highly skilled and motivated Data Engineer to play a pivotal role in designing, building, and optimizing our next-generation scalable data pipelines. This position requires expertise in processing massive datasets using cutting-edge technologies like Apache Spark, PySpark, and Hive within a dynamic cloud environment. Your primary objective will be to ensure the utmost data reliability, speed, and efficiency, providing a robust foundation for downstream business intelligence and advanced analytics initiatives.
Roles & Responsibilities:
•
Data Pipeline Development & Maintenance:
Design, build, and maintain highly scalable and efficient ETL/ELT data pipelines utilizing PySpark and Spark SQL for complex data transformations.
•
Cloud Data Infrastructure Management:
Deploy, manage, and scale critical data infrastructure components on leading cloud platforms such as Amazon Web Services (AWS) (e.g., EMR, Glue), Microsoft Azure (e.g., Databricks, Synapse), or Google Cloud Platform (GCP).
•
Data Warehousing & Storage Optimization:
Strategically manage data layout, partitioning, and indexing within Apache Hive and various cloud data lake solutions to optimize performance and accessibility.
•
Performance Tuning & Optimization:
Proactively identify and resolve performance bottlenecks in Spark jobs, leveraging Spark UI for in-depth analysis, effectively managing data skewness, and optimizing memory utilization.
•
Diverse Data Integration:
Develop robust solutions for ingesting high-volume and diverse datasets from both structured relational databases and unstructured flat files into our data ecosystem.
•
Automated Workflow Orchestration:
Implement and manage automated data workflows using industry-standard scheduling tools like Apache Airflow or platform-native schedulers, ensuring timely and reliable data delivery.
•
Strategic Collaboration:
Partner closely with data scientists, business analysts, and cross-functional enterprise teams to translate complex business requirements into technically sound and efficient data solutions.
Qualifications:
•
Big Data Frameworks Expertise:
Demonstrated high proficiency in Apache Spark architecture, including a deep understanding of drivers, executors, and Directed Acyclic Graphs (DAGs).
•
Advanced Programming:
Exceptional coding skills in Python and extensive experience with the PySpark API for developing intricate data transformations and processing logic.
•
Querying & Schema Management:
Strong command of HiveQL and ANSI SQL, coupled with expertise in data partitioning techniques and effective schema definition.
•
Optimized Storage Formats:
In-depth understanding and practical experience with optimized big data storage file formats such as Parquet, ORC, and Avro.
•
Cloud Ecosystem Development:
Hands-on development experience utilizing cloud-native big data utilities (e.g., AWS EMR, Azure Databricks) within major cloud platforms.
•
Data Warehousing Fundamentals:
Solid foundation in Dimensional Data Modeling, including Star and Snowflake schemas, and practical experience with Data Lake concepts and implementation.
Preferred Qualifications:
•
CI/CD & DevOps Automation:
Experience with Continuous Integration/Continuous Deployment (CI/CD) practices and automation tools like Git, Jenkins, or Ansible.
•
NoSQL Database Integration:
Exposure to and experience with NoSQL databases such as HBase, Cassandra, or MongoDB.
•