[Remote] Software Engineer - AI Coding Evaluation
Note: This is a remote position open to candidates in the USA. MillionLogics is a global leader in IT solutions specializing in Data & AI, Cloud Solutions, and IT Consulting. They are seeking experienced Software Engineers to evaluate and improve the coding capabilities of frontier AI models by assessing AI-generated code and developing high-quality evaluation datasets and benchmarks.

Responsibilities
- Review and evaluate AI-generated code for correctness, efficiency, maintainability, and adherence to requirements
- Analyze software engineering tasks and validate whether proposed solutions meet expected outcomes
- Debug code, reproduce issues, and verify fixes across different programming environments
- Assess model-generated explanations, reasoning, and implementation approaches for technical accuracy
- Create, refine, and maintain evaluation datasets, benchmarks, and grading rubrics for coding tasks
- Identify edge cases, failure modes, and areas where AI systems struggle with software engineering problems
- Document findings clearly and provide structured feedback to improve evaluation quality and consistency
- Collaborate with project teams to establish quality standards and evaluation methodologies

Skills
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field
- 3+ years of professional software engineering experience
- Strong proficiency in one or more of the following languages: Python, Java, C/C++, Go, Swift, Objective-C, PHP, or SQL
- Strong understanding of data structures, algorithms, software design principles, and debugging methodologies
- Experience performing code reviews and evaluating code quality in production or large-scale codebases
- Ability to analyze complex technical problems and assess solution correctness with minimal supervision
- Familiarity with version control systems (e.g., Git) and modern software development workflows
- Strong written communication skills and attention to detail
- Experience with AI/ML data annotation, NLP, prompt engineering, model evaluation, or LLM-related projects
- Experience evaluating AI-generated code, benchmark creation, or software quality assessment

Benefits
- Mode of Work: Remote
- Contract: 12 months
- Commitment Required: At least 4 hours per day and a minimum of 20 hours per week, with 4 hours of overlap with PST
- Engagement Type: Contractor assignment (no medical/paid leave)

Company Overview
As a trusted Oracle Partner, MillionLogics is more than just an IT solutions provider - it's a global powerhouse blending innovation, expertise, and strategic vision. It was founded in 2020 and is headquartered in London, United Kingdom, with a workforce of 51-200 employees. Its website is https://www.millionlogics.com.