[Remote] Data Engineer
Note: This is a remote position open to candidates in the USA.

IPinfo is a company focused on location and context products, seeking a Data Engineer to manage and optimize its data pipelines. The role involves working with large datasets, extending BigQuery pipelines, and addressing geospatial challenges while maintaining clean code and effective communication.

Responsibilities
- Make sense of large, unfamiliar datasets sourced from publicly contributed (and therefore inconsistent) datasets like OpenStreetMap and Overture, as well as error-prone device datasets with sometimes dozens of poorly documented columns. Your job is to wade through these datasets, figure out what is going on, and extract a meaningful signal.
- Maintain and extend BigQuery data pipelines, writing efficient, transparent code that achieves complex data tasks while avoiding bloat and spaghetti.
- Work with particular expertise on geospatial data, knowing the suite of BigQuery geospatial tools like the back of your hand, while dealing with the particular headaches and challenges that geospatial data poses. You will occasionally work in Python as well.
- Use AI tooling to move quickly while fully owning every line in your PRs.
- Communicate problems and solutions clearly using our internal issue-tracking platform, writing concise, reproducible records of the problem, the proposed solutions, and why you made the calls you did, so others can follow and build on them.
- Work occasionally on web-based dashboards that give data engineers and others at the company visibility into our data pipelines.

Skills
- Advanced SQL: window functions, CTEs, query restructuring for performance, and an understanding of why a query is slow and how to fix it. BigQuery is a strong plus.
- Strong communication skills: you know how to talk and write about complex problems and data pipelines productively.
- A track record of turning messy, ambiguous data into reliable, interpretable signals, with the judgment to explain your calls.
- A public record of significant experience as a data scientist or engineer (on GitHub, Stack Overflow, in the academic literature, or on a personal blog), or strong references to back up a track record on proprietary codebases.
- Clean-code discipline: you don't ship code without tests, code review, and readable abstractions. You prefer subtractive solutions to additive solutions.
- Fast learning: comfort becoming productive in unfamiliar domains (internet measurement, geospatial reasoning, internal tooling) with little hand-holding.
- AI-assisted development paired with full ownership: you can read, debug, and defend everything the tools produce.
- Geospatial fundamentals: coordinate systems, spatial joins, containment, polygon operations.
- Cloud tooling and workflow orchestration (CI/CD, Docker, Airflow, etc.).
- JavaScript and web dashboards (e.g. Retool, Mapbox, internal validation and visualization tooling).
- Exposure to the science of internet measurement: BGP/ASN, rDNS, RTT-based geolocation, CGNAT, mobile vs. fixed-line IP behavior, geofeeds.
- Strong Python for geospatial data work: comfortable with the data and geospatial stack (pandas, geopandas, shapely) and writing code that holds up in production, not just in a notebook.

Company Overview
IPinfo is a leading provider of IP address context. It was founded in 2014 and is headquartered in Seattle, Washington, USA, with a workforce of 51-200 employees. Its website is https://ipinfo.io.

Company H1B Sponsorship
IPinfo has a track record of offering H1B sponsorships, with 1 in 2023. Please note that this does not guarantee sponsorship for this specific role.