Scanpoint Geomatics Ltd.
Satellite ML Computer Vision Research POC GIS

Satellite ML

CNN-based opium poppy detection on Sentinel-2 imagery — a research POC built at Scanpoint Geomatics.

Client Scanpoint Geomatics Ltd. (SGL), ISRO technology partner. Client logo used with permission.
Role ML Engineer (internal R&D)
Year 2024
Status Research POC · F1 0.87 on held-out test data
01 — Overview

Project Overview

A research proof-of-concept built during my time at Scanpoint Geomatics, an Indian geospatial company and ISRO technology development partner. The goal was to evaluate whether convolutional neural networks could detect opium poppy cultivation in Sentinel-2 satellite imagery — a workflow currently dependent on manual analyst review across very large geographies.

The project used labelled training data from Afghanistan, provided by SGL, and produced a working end-to-end pipeline: from raw Sentinel-2 scenes through to coordinate-aligned shapefile outputs compatible with QGIS and SGL's IGiS platform.

This was R&D, not a deployed system. Productization decisions sat with SGL.

02 — Problem Statement

The Problem

Manual analyst review does not scale to the geographies poppy cultivation actually covers. Sentinel-2 provides freely available imagery worldwide, at up to 10-meter resolution and with a revisit time of roughly five days, but raw imagery is just pixels. Turning it into actionable detections requires either an army of trained analysts or a model that can do the first pass automatically.

The question SGL wanted answered was: can a CNN trained on labelled Afghan imagery produce detections accurate enough that a human analyst's role shifts from search to verification?

03 — What I Built

What I Built

01
Imagery ingestion

Sentinel-2 scenes parsed with embedded geospatial metadata preserved, so every prediction can be traced back to real-world coordinates.
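To illustrate how a prediction can be traced back to real-world coordinates, the sketch below applies a GDAL-style six-element affine geotransform to a pixel position. It is a minimal illustration rather than the project's code: the function names and the values in `example_gt` are hypothetical.

```python
from typing import Tuple

# GDAL-style affine geotransform:
# (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pixel_to_world(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel position (col, row) to map coordinates in the raster's CRS."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def pixel_center_to_world(gt: GeoTransform, col: int, row: int) -> Tuple[float, float]:
    """Same mapping, but for the centre of the pixel instead of its top-left corner."""
    return pixel_to_world(gt, col + 0.5, row + 0.5)


# Hypothetical north-up scene with 10 m pixels; the origin values are illustrative only.
example_gt: GeoTransform = (500000.0, 10.0, 0.0, 3600000.0, 0.0, -10.0)

# Top-left corner of the pixel at column 100, row 200.
corner = pixel_to_world(example_gt, 100, 200)  # (501000.0, 3598000.0)
```

The pixel height is negative because row indices grow downward while northing grows upward in a north-up raster.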

02
Tiling

Large scenes split into overlapping patches with the geographic index preserved for downstream reassembly. Overlap matters because field boundaries near tile edges otherwise get cut in half.
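The overlapping tiling described above can be sketched as a window generator. This is one possible scheme, not necessarily the one used in the project: the tile size, the overlap, and the choice to shift the last tile back so it ends exactly at the scene edge are all assumptions.

```python
from typing import Iterator, List, Tuple

# (row_offset, col_offset, height, width) in scene pixel coordinates
Window = Tuple[int, int, int, int]


def _offsets(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last tile is shifted back to end at the edge."""
    if length <= tile:
        return [0]
    offsets = list(range(0, length - tile + 1, stride))
    if offsets[-1] != length - tile:
        offsets.append(length - tile)
    return offsets


def tile_windows(height: int, width: int, tile: int = 256, overlap: int = 32) -> Iterator[Window]:
    """Yield overlapping windows that together cover the whole scene."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap
    for r in _offsets(height, tile, stride):
        for c in _offsets(width, tile, stride):
            # min() only matters when the scene is smaller than one tile.
            yield (r, c, min(tile, height - r), min(tile, width - c))


# Hypothetical 500 x 500 scene: 3 x 3 = 9 windows, the last starting at (244, 244).
windows = list(tile_windows(500, 500, tile=256, overlap=32))
```

Keeping the row and column offset with each window is what allows tile-level predictions to be placed back into scene coordinates later.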

03
CNN classification

The model was trained on the SGL-provided labelled dataset from Afghanistan. Class imbalance was the dominant training challenge: poppy fields are a tiny fraction of total agricultural area, so naive sampling gives a model that learns to predict "not poppy" almost all the time.

Core training challenge

Class imbalance, not model architecture, was the hardest problem. Without balanced sampling the model degenerates to predicting the majority class, so solving this first is what made the F1 of 0.87 possible.
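One common way to implement balanced sampling is to draw training examples with probability inversely proportional to their class frequency. The sketch below shows the idea in plain Python. The exact scheme used in the project is not described here, so treat the function names and the 95/5 split as illustrative assumptions.

```python
import random
from collections import Counter
from typing import List, Sequence


def balanced_weights(labels: Sequence[int]) -> List[float]:
    """Per-sample weights inversely proportional to class frequency.

    Each class ends up with the same total weight, so every class is
    equally likely to be drawn regardless of how many samples it has.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def sample_balanced_indices(labels: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """Draw n sample indices with replacement using the balanced weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=balanced_weights(labels), k=n)


# Hypothetical dataset: 950 "not poppy" tiles (0) and 50 "poppy" tiles (1).
labels = [0] * 950 + [1] * 50
drawn = sample_balanced_indices(labels, n=10_000)
positive_fraction = sum(labels[i] for i in drawn) / len(drawn)  # roughly half
```

In PyTorch, the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler`.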

04
Post-processing and GIS export

Tile-level predictions merged, noise artefacts suppressed, and final outputs written as shapefile polygons that load directly into QGIS and SGL's IGiS platform without further processing.
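As an illustration of the noise-suppression step, the sketch below removes small isolated groups of positive pixels from a binary mask. The method actually used in the project is not specified, so the 4-connectivity and the `min_pixels` threshold are assumptions.

```python
from collections import deque
from typing import List

Mask = List[List[int]]  # binary mask, 1 = predicted poppy


def remove_small_components(mask: Mask, min_pixels: int) -> Mask:
    """Return a copy of the mask with 4-connected components below min_pixels zeroed."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill to collect one connected component.
            component = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Keep only components large enough to be a plausible field.
            if len(component) >= min_pixels:
                for r, c in component:
                    out[r][c] = 1
    return out


# Toy mask: one 4-pixel block (kept) and two isolated pixels (removed).
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_components(toy_mask, min_pixels=3)
```

A sensible threshold depends on the smallest field size worth reporting at 10 m per pixel.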

04 — Architecture

System Architecture

Imagery ingestion

Sentinel-2 scenes loaded with embedded geospatial metadata preserved for coordinate-accurate downstream outputs.

Tiling

Scenes split into overlapping geographic patches with the index preserved, so tile-level predictions can be reassembled into full-scene polygons.

CNN classification

Model trained on the labelled Afghanistan dataset. Class-balanced sampling addresses the dominant training challenge: positives are a tiny fraction of total agricultural area.

Post-processing & GIS export

Tile predictions merged, noise suppressed, and final outputs written as shapefile polygons compatible with QGIS and SGL's IGiS platform.

05 — Tech Stack

Technologies Used

Imagery & Geospatial
Sentinel-2 (ESA) GDAL GeoPandas
Modelling
Python PyTorch OpenCV
Output & Integration
QGIS SGL IGiS Docker
06 — Outcomes

Result

F1 score of 0.87 on the held-out test set — strong enough to demonstrate the approach works for the use case, while still leaving meaningful room for productization improvements: false-positive characterisation, generalisation across non-Afghan geographies, and integration with revisit-time workflows.
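For reference, F1 is the harmonic mean of precision and recall, which is why it is a more honest metric than accuracy on imbalanced data. The sketch below computes it from confusion counts. The counts are made up for illustration and are not the project's results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: precision 0.8 and recall 0.8 give an F1 of about 0.8.
toy_f1 = f1_score(tp=8, fp=2, fn=2)

# A model that always predicts "not poppy" finds no positives at all.
# With 50 positives among 1000 tiles its accuracy would be 0.95, but its F1 is 0.
majority_f1 = f1_score(tp=0, fp=0, fn=50)
```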

The pipeline ran end-to-end in Docker, was reproducible across runs, and produced GIS outputs that loaded cleanly into IGiS without manual intervention.

07 — What I'd Approach Differently Today

What I'd Approach Differently Today

Start with a pre-trained backbone.

In 2024 I built a custom CNN. Today I'd start with a pre-trained segmentation backbone, something like SegFormer or a Sentinel-2-specific foundation model, and fine-tune from there. Geospatial foundation models have made real progress, and while the labelling effort would be the same, the architecture work would be smaller.

Data quality over architecture.

This is how I think about most ML projects now: time is better spent on data quality and the surrounding pipeline than on architecture, unless the architecture is genuinely the research question.