Software Engineer, Data Processing & Privacy - 26-00382
Additional notes: experience in data privacy and legal environments, working in Python and with Claude, and handling/processing PII. Soft skills: attention to detail, reliability, and comfort with reviews/audits.
About the role
• The client is seeking a detail-oriented Software Engineer on a contract basis to build and run data processing pipelines for datasets used in our research. You'll take raw, heterogeneous inputs (text, code, documents, structured exports) and turn them into clean, well-structured, privacy-safe outputs ready for downstream use.
• The work spans ingestion, format normalization, data quality, privacy handling (including PII de-identification), and the supporting tooling that makes the pipeline reliable and self-serve. You'll iterate closely with internal teams on QA findings and harden the pipeline so each new dataset is cheaper than the last.
Responsibilities
• Build and extend per-source processing for new data types as they arrive
• Ingest and normalize raw exports across many formats into consistent, well-structured outputs
• Handle privacy requirements (for example, PII detection and de-identification) to meet our internal compliance bar
• Run data quality QA: automated checks plus LLM-assisted review to flag gaps, malformed inputs, and incompleteness
• Iterate on internal feedback: root-cause issues, fix, re-run, re-deliver
• Build supporting tools: auditing, data exploration, monitoring, simple search over processed data
• Land cleaned data with the right storage layout and access controls
• Document and harden the pipeline so each new dataset is cheaper than the last
You may be a good fit if you
• Have 4+ years of software engineering experience, with substantial time on data pipelines
• Are a proficient user of Claude / Claude Code for day-to-day engineering and know when to verify its output
• Are genuinely detail-oriented
• Have high integrity and take handling real people's personal data seriously
• Are comfortable with sustained, careful data work and find satisfaction in getting it right
• Can work independently, ship reliably, and communicate clearly about progress and edge cases
• Are proficient in Python and comfortable working across many heterogeneous, semi-structured formats (JSON, NDJSON, code, HTML/XML dumps, archives)
Strong candidates may also have experience with
• PII detection and anonymization techniques
• Working with large, messy, semi-structured text and code corpora
• Data quality monitoring and validation
• Cloud storage and access-control patterns (S3/GCS, IAM)
• Building internal tools or self-serve data platforms for researchers
• Information retrieval, search, or RAG systems