We can have an example of a Siamese network (similar to Sander Dieleman's suggestion ) and convolutional spatial transformer as discussed here. From my understanding, the convolutional spatial transformer applies the ST on the feature, patchwise with distinct affine transformation parameters. I have implemented this very recently and will open a PR if you are fine with having these examples in the Recipes @f0k .
PS : I have a few PRs and issues open in Lasagne which I'm yet to resolve. The past two weeks were a bit hectic for me and I will try to resolve them by this weekend. Apologies