Data Engineer in Hartford, CT or Remote
Data Engineer
Location: Hartford, CT or Remote
Duration: Contract
Rate: DOE
US citizens, GC, EAD (H4, L2), E3, and TN visa holders preferred. No third-party corp-to-corp accepted for this job.
Required skills:
• Solid experience designing and developing data pipelines for data ingestion and transformation using Spark.
• Distributed computing experience using PySpark or Python.
• Good understanding of the Spark framework and Spark architecture.
• Experience working in cloud-based big data infrastructure.
• Excellent at troubleshooting performance and data skew issues.
• Must have a good understanding of Spark runtime metrics and be able to tune applications based on those metrics.
• Deep knowledge of partitioning and bucketing concepts for data ingestion.
• Good understanding of AWS services such as Glue, Athena, S3, Lambda, and CloudFormation.
• Preferred: working knowledge of implementing data lake ETL using AWS Glue, Databricks, etc.
• Experience with data modelling techniques for cloud data stores and on-prem databases such as Teradata, Teradata Vantage (TDV), etc.
• Preferred: working experience with ETL development in Teradata Vantage and data migration from on-prem to Teradata Vantage.
• Proficiency in SQL, relational and non-relational databases, query optimization, and data modelling.
• Experience with source code control systems such as GitLab.
• Experience with large-scale distributed relational and NoSQL database systems.
Technologies:
• PySpark, Python, AWS services, Teradata Vantage, CI/CD technologies, Terraform, SQL