[Remote] Lead Data Scientist
Note: This is a remote position open to candidates in the USA.

Smarsh empowers its customers to manage risk and unleash intelligence in their digital communications. As a Lead Data Scientist (NLP & Financial Compliance), you will develop NLP and large language model solutions for compliance and surveillance systems, working with data to uncover misconduct and risk while mentoring junior team members.

Responsibilities
- Collect, analyze, and interpret small and large datasets to uncover meaningful insights that support the development of statistical methods and machine learning algorithms
- Lead the design, training, and deployment of NLP and transformer-based models for financial surveillance and supervisory use cases (e.g., misconduct detection, market abuse, trade manipulation, insider communication)
- Develop machine learning models and other analytics following established workflows, while also looking for optimization and improvement opportunities
- Perform data annotation and quality review
- Conduct exploratory data analysis (EDA) and model failure-state analysis
- Contribute to model governance, documentation, and explainability frameworks aligned with internal and regulatory AI standards
- Guide clients and prospects through machine learning model and analytic fine-tuning and development processes
- Provide guidance to junior team members on model development and EDA
- Work with Product Manager(s) to intake project/product requirements and translate them into technical tasks within the team’s tooling, techniques, and procedures
- Pursue continued self-led personal development

Skills
- Strong understanding of financial markets, compliance, surveillance, supervision, or regulatory technology
- Experience with one or more data science and machine/deep learning frameworks and tooling, including scikit-learn, H2O, Keras, PyTorch, TensorFlow, pandas, NumPy, caret, tidyverse
- Command of data science and statistics principles (regression, Bayes, time series, clustering, precision/recall, AUROC, exploratory data analysis, etc.)
- Strong knowledge of key programming concepts (e.g., split-apply-combine, data structures, object-oriented programming)
- Solid statistics knowledge (hypothesis testing, ANOVA, chi-square tests, etc.)
- Knowledge of NLP transfer learning, including word embedding models (GloVe, fastText, word2vec) and transformer models (BERT, SBERT, Hugging Face, GPT-x, etc.)
- Experience with natural language processing toolkits such as NLTK, spaCy, and NVIDIA NeMo
- Knowledge of microservices architecture and continuous delivery concepts in machine learning, and of related technologies such as Helm, Docker, and Kubernetes
- Familiarity with deep learning techniques for NLP
- Familiarity with LLMs, using Ollama and LangChain
- Excellent verbal and written skills
- Proven collaborator, thriving on teamwork
- Master's or Doctor of Philosophy degree in Computer Science, Applied Math, Statistics, or a scientific field
- Familiarity with cloud computing platforms (AWS, GCS, Azure)
- Experience with automated supervision/surveillance/compliance tools

Company Overview
Smarsh helps its customers manage the risk and see the value in their communications data. It was founded in 2001 and is headquartered in Portland, Oregon, USA, with a workforce of 1001-5000 employees. Its website is http://www.smarsh.com.

Company H1B Sponsorship
Smarsh has a track record of offering H1B sponsorships, with 16 in 2025, 5 in 2024, 12 in 2023, 22 in 2022, 2 in 2021, and 1 in 2020. Please note that this does not guarantee sponsorship for this specific role.