Let’s try to locate a UAV with change detection datasets and FaceNet

Another weekend experiment

Aliaksandr
5 min read · Jul 4, 2023

Suppose you have an aerial image and do not know where it was captured, but you do know its spatial resolution and orientation. How could computer vision help solve this problem? It looks like a common problem in fact-checking, georeferencing of corrupted data, and navigation.

Patch matching

The first problem is data. There are only a few datasets for exactly this task.

I found recent research https://github.com/arplaboratory/satellite-thermal-geo-localization/tree/main where researchers solved a similar task, but for thermal imagery.

Change detection datasets

When I looked at TorchGeo, I found that change detection datasets are exactly what I need: they provide temporal image pairs with diverse landscapes and seasons.

Two temporal images
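As a rough sketch of how such pairs can be loaded with TorchGeo (I use LEVIR-CD+ as an example here; the sample keys differ between TorchGeo versions, so it is worth inspecting them first):

```python
# A minimal sketch: loading temporal image pairs from a TorchGeo change
# detection dataset. The exact sample keys vary between TorchGeo versions,
# so print them before building a training pipeline on top.
from torchgeo.datasets import LEVIRCDPlus

dataset = LEVIRCDPlus(root="data/levircdplus", split="train", download=True)
sample = dataset[0]
print(sample.keys())  # expect pre/post-change images and a change mask
```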

It is also possible to use segmentation or classification datasets with aggressive augmentation.

Two augmented images

I also found the S2Looking, LEVIR-CD+, and xView2 change detection datasets, which have already been uploaded to Kaggle and are ready to use with Kaggle notebooks.

So Kaggle is my choice: I have no GPUs, and it is more convenient than Google Colab because training keeps running while I am offline.

Model and concept

As a model I used MobileNet with triplet margin loss. This concept appeared in 2015 and has been successfully used for face verification (FaceNet, https://arxiv.org/abs/1503.03832) by many companies.

There are many newer options now, but an old and well-known solution seems good for a first experiment.
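A minimal sketch of this setup in PyTorch follows; the backbone variant, embedding size, and margin are my assumptions, not the exact configuration from the experiment:

```python
# Sketch: a MobileNet backbone producing L2-normalized embeddings, trained
# with triplet margin loss as in FaceNet. Sizes and margin are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class PatchEmbedder(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.backbone = models.mobilenet_v3_small(weights="DEFAULT")
        in_features = self.backbone.classifier[0].in_features
        self.backbone.classifier = nn.Linear(in_features, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-length embeddings, so distances live on the hypersphere
        return nn.functional.normalize(self.backbone(x), dim=1)

model = PatchEmbedder()
criterion = nn.TripletMarginLoss(margin=0.5)
anchor, positive, negative = (torch.randn(4, 3, 224, 224) for _ in range(3))
loss = criterion(model(anchor), model(positive), model(negative))
loss.backward()
```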

SIFT matching

Classic SIFT and ORB matching unfortunately lack robustness, especially when matching images captured in different seasons and with different sensors.

I use the model in a sliding-window fashion with a stride to find the most similar point in the source image.

Sliding window concept and probability map
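A minimal sketch of the sliding-window search (the window size, stride, and Euclidean distance as the metric are my assumptions):

```python
# Embed the query patch once, embed every window of the large source image,
# and fill a grid of distances; the lowest values mark likely locations.
import torch

@torch.no_grad()
def distance_map(model, source, query, win=224, stride=56):
    # source: (3, H, W) tensor, query: (3, win, win) tensor, both preprocessed
    q = model(query.unsqueeze(0))  # (1, dim) query embedding
    _, H, W = source.shape
    tops = range(0, H - win + 1, stride)
    lefts = range(0, W - win + 1, stride)
    dmap = torch.zeros(len(tops), len(lefts))
    for i, top in enumerate(tops):
        for j, left in enumerate(lefts):
            window = source[:, top:top + win, left:left + win].unsqueeze(0)
            dmap[i, j] = torch.cdist(model(window), q).item()
    return dmap  # lower distance = more similar patch
```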

Recently, many interesting options have appeared, such as LightGlue https://github.com/cvg/LightGlue, and there was a pretty exciting challenge on Kaggle https://www.kaggle.com/competitions/image-matching-challenge-2023. It looks like there could be many great ideas for further research.

Data augmentation

I use a two-stage data augmentation to make the model more robust (a sketch follows the list):

  1. The anchor and positive samples are augmented with the same geometric transforms.
  2. The positive and negative samples are augmented with the same color transforms, which differ from the color transforms applied to the anchor. This mimics different sensors or seasons between the anchor and the positive/negative samples.
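Here is a minimal sketch of that scheme with torchvision transforms; the specific transforms and parameters are my assumptions, and reusing the RNG seed is just one simple way to share random parameters between samples:

```python
import random
import torch
import torchvision.transforms as T

geometric = T.Compose([T.RandomResizedCrop(224, scale=(0.7, 1.0)),
                       T.RandomRotation(30)])
jitter = T.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.1)

def make_triplet(img_t0, img_t1, img_neg):
    # Stage 1: identical geometric transform for anchor and positive
    seed = random.randint(0, 2**31 - 1)
    torch.manual_seed(seed)
    anchor = geometric(img_t0)
    torch.manual_seed(seed)
    positive = geometric(img_t1)
    negative = geometric(img_neg)  # independent geometry for the negative
    # Stage 2: shared color transform for positive/negative, a different
    # draw for the anchor, mimicking another sensor or season
    anchor = jitter(anchor)
    torch.manual_seed(seed + 1)
    positive = jitter(positive)
    torch.manual_seed(seed + 1)
    negative = jitter(negative)
    return anchor, positive, negative
```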

The experiment

Several months ago I collected data for another experiment https://medium.com/p/d61b4b936d52, and it looks like I can reuse it here.

Raw footage from UAV camera

I grabbed three frames from the footage.

I only applied shift, scale, and rotation, without orthorectification or structure from motion, assuming that only the altitude and horizon are known, since the solution should work fast.

Camera images and query patches

The images were captured from different altitudes and at different angles. The resolution was set to 0.6 m in EPSG:3857 to fit TMS zoom level 18.
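For reference, the ~0.6 m figure follows from the Web Mercator tile scheme, as in this quick check:

```python
# Ground resolution of a 256 px Web Mercator (EPSG:3857) tile at zoom 18
import math

equatorial_circumference = 2 * math.pi * 6378137  # meters (WGS84)
zoom = 18
resolution = equatorial_circumference / (256 * 2 ** zoom)
print(f"{resolution:.3f} m/px")  # ~0.597 m/px at the equator
```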

Sat and UAV image crops

As seen in the example image, there are many differences:

  • different seasons: winter vs. summer
  • different weather: sunny vs. cloudy
  • large distortions, especially close to the image borders, plus different blur and sharpening
  • changes in land use

Matching

The first raw matching map confused me, but when I kept only the lowest 2.5% of distances on the map, it started looking much better, and when I kept only the top 3 closest patches, things started looking good.

Output, top 2.5% distance range, top 3 patches
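A minimal sketch of this filtering step, assuming the distance map from the sliding-window search as a 2D array:

```python
# Keep only the lowest 2.5% of distances, then take the 3 closest patches.
import numpy as np

def filter_matches(dmap: np.ndarray, keep_frac: float = 0.025, top_k: int = 3):
    threshold = np.quantile(dmap, keep_frac)
    mask = dmap <= threshold  # cells within the 2.5% smallest distances
    best = np.argsort(dmap, axis=None)[:top_k]
    top_cells = [np.unravel_index(i, dmap.shape) for i in best]
    return mask, top_cells  # binary map plus (row, col) of the 3 best windows
```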

Maybe the picture would be better if the network were trained for more epochs.

Sample 1. Works well. Found.

Context, crop, query

Across the whole image, the best results are in the target area. I consider it a success.

Sample 2. Found in a nearby area.

It looks like the network struggles with this amount of color and geometric distortion. Maybe there are issues with the training dataset, and the network did not train well.

Sample 3. More false positives, but found in the nearby area.

This part was edited; I made mistakes during evaluation.

Conclusions

Most people use orthorectified images as queries, but such correction is hard to perform onboard a UAV due to high computational costs. Also, I have not found experiments across different seasons. Here is clear evidence that it can work.

There is also clear evidence that openly available change detection datasets can be used to develop similar solutions.

I had only about 4 days for this experiment and for writing the article, so, unfortunately, the training dataset was not explored well, the target data was not explored, the neural network was built within a couple of hours and trained on the second attempt, and the errors were not analyzed. There is big room for further research. The training pipeline is on Kaggle: https://www.kaggle.com/aliaksandr960/cd-to-loc-training-v1
