[Remote] Senior Data Engineer
Note: This is a remote position open to candidates in the USA.

Effectual is seeking a Senior Data Engineer with specialized expertise in data streaming technologies to join its data team. This role focuses on building and maintaining high-performance data streaming architectures that enable real-time data processing and analytics.

Responsibilities
- Design, build, and maintain scalable streaming data architectures using Kafka, MSK, and Kinesis
- Develop real-time data pipelines that handle high-volume, high-velocity data streams
- Implement event-driven architectures and microservices patterns for streaming data processing
- Create and optimize data streaming topologies for complex event processing scenarios
- Design fault-tolerant streaming systems with proper error handling and data recovery mechanisms
- Configure, deploy, and manage Apache Kafka clusters and AWS MSK environments
- Implement Kafka Connect pipelines for streaming data integration
- Design optimal Kafka topic partitioning strategies and replication configurations
- Monitor and optimize Kafka cluster performance, throughput, and latency
- Implement Kafka security configurations including SSL/TLS, SASL, and ACLs
- Manage Kafka Schema Registry for data serialization and evolution
- Design and implement Amazon Kinesis Data Streams and Kinesis Data Firehose solutions
- Configure Kinesis Analytics applications for real-time stream processing
- Optimize Kinesis shard management and auto-scaling configurations
- Implement Kinesis data retention and archival strategies
- Integrate Kinesis with other AWS services for comprehensive streaming solutions
- Develop real-time stream processing applications using Apache Spark Streaming, Kafka Streams, or AWS Lambda
- Implement complex event processing (CEP) patterns for real-time analytics
- Build streaming ETL pipelines that transform data in motion
- Create real-time aggregations, windowing operations, and stateful stream processing
- Optimize streaming query performance and resource utilization
- Ensure seamless integration between streaming systems and data lakes, data warehouses, and operational databases
- Implement data lineage and monitoring for streaming data pipelines
- Create automated data quality checks and validation for streaming data
- Manage data serialization formats (Avro, JSON, Protobuf) and schema evolution
- Coordinate with data scientists and analysts to ensure streaming data meets analytical requirements
- Implement Infrastructure as Code (IaC) for streaming data platforms using Terraform or CloudFormation
- Automate deployment and management of streaming infrastructure through CI/CD pipelines
- Monitor streaming system health, performance metrics, and alerting
- Implement disaster recovery and high availability strategies for streaming systems
- Stay current with emerging trends in streaming technologies and cloud-native solutions
- Collaborate with data architects, data scientists, and application teams on streaming data requirements
- Support rigorous project governance through daily progress reviews and time tracking
- Provide technical leadership and mentorship to junior data engineers
- Communicate complex streaming concepts to technical and non-technical stakeholders
- Operate with transparency and responsiveness to support high-performing teams

Skills
- 7+ years of experience in the data engineering field with significant streaming data specialization
- Bachelor's degree in Computer Science, Engineering, or related STEM field
- Extensive hands-on experience with Apache Kafka including cluster management, performance tuning, and ecosystem tools
- Proven experience with AWS MSK and Amazon Kinesis services in production environments
- Strong background in real-time data processing and stream analytics
- Streaming Technologies: Apache Kafka, Kafka Connect, Kafka Streams, Amazon MSK, Amazon Kinesis (Data Streams, Data Firehose, Analytics)
- Programming Languages: Proficient in Python, Java, and Scala for streaming applications
- Stream Processing Frameworks: Apache Spark Streaming, Apache Flink, AWS Lambda for stream processing
- Data Serialization: Experience with Avro, Protocol Buffers, JSON, and schema registry management
- Big Data Technologies: Hadoop ecosystem, Apache Spark, distributed computing concepts
- Database Technologies: SQL and NoSQL databases, data warehousing solutions, time-series databases
- AWS Services: Deep knowledge of AWS streaming and analytics services (MSK, Kinesis, Lambda, EMR, Glue)
- Containerization: Docker and Kubernetes for streaming application deployment
- Infrastructure as Code: Terraform, CloudFormation for streaming infrastructure automation
- Monitoring: CloudWatch, Prometheus, Grafana for streaming system observability
- Security: Implementation of streaming data security, encryption, and access controls
- Expert use of code versioning tools such as GitHub
- Expert knowledge of Agile methodologies and delivery practices
- Experience with CI/CD pipelines for streaming data applications
- Understanding of data APIs, REST services, and microservices architectures
- Leadership & Team Management
- Risk Management and mitigation strategies for streaming systems
- Conflict Resolution
- Strategic Planning & Leadership for data streaming initiatives
- Resource Management and capacity planning
- Change Management for streaming technology adoption
- Core AWS Certifications: AWS Data Engineer Associate (required)
- AWS Solutions Architect Professional (preferred)
- AWS Developer Professional (recommended)
- Confluent Certified Administrator for Apache Kafka (highly recommended)
- Confluent Certified Developer for Apache Kafka (preferred)
- AWS Big Data Specialty (if available in current form)
- AWS Security Specialist
- Certified Associate Data Analyst with Python
- Certified Professional Python Programmer Level 1
- Databricks Data Engineer Professional
- Certified Associate Python Programmer
- Java or Scala certification (Oracle Certified Professional)
- Experience with Apache Flink for advanced stream processing
- Knowledge of Apache Pulsar as an alternative messaging system
- Experience with event sourcing and CQRS patterns
- Understanding of Apache Airflow for batch and streaming workflow orchestration
- Experience with ksqlDB for stream processing using SQL
- Background in financial services, IoT, or other real-time, data-intensive industries
- Experience with multi-cloud streaming architectures
- Knowledge of Apache NiFi for data flow automation

Benefits
- Medical, dental, and vision health insurance
- Short-term disability, long-term disability, and life insurance
- 401k with Company match
- Paid time off (PTO): 120 hours, accrued over one year
- Paid time off for major holidays (14 days per year)
These and any other employee benefit offerings are subject to management’s discretion and may change at any time.

Company Overview
Cloud Service Provider, AWS Premier Tier Services Partner, Generative and Agentic AI, Migration, Modernization. Effectual was founded in 2019 and is headquartered in Jersey City, New Jersey, USA, with a workforce of 201-500 employees. Its website is https://www.effectual.ai.

Company H1B Sponsorship
Effectual has a track record of offering H1B sponsorships, with 3 in 2023, 3 in 2022, and 2 in 2021. Please note that this does not guarantee sponsorship for this specific role.