On The Continuitiy Of Multi-Point Proximity Encodings
Earlier posts talked about geometric encoding for geospatial Machine Learning (ML) and Artificial Intelligence (AI) applicaitons, with a focus on Multi-Point Proximity (MPP) encoding. MPP yields conformable encodings (i.e. having consitent size and shape) for any geometric object in a bounded domain. In other words, you could have a map containing roads, rivers, bus stops, parks, industrial zones, building footprints, and so on; and each of them can be represented as an MPP-encoded vector of the same length. That makes it easy to feed arbitrary sets of objects to ML/AI models. It's a handy technique that enables all kinds of analysis on geospatial objects.
One post in particular argued that MPP encodings posess certain desireable mathematical properties. Among these is continuity – the notion that small differences between objects' shapes of positions corresponds to a small differences in their encodings. Or in other words, two objects that are "almost the same" will have encodings that are "almost the same", and there will me no unexpected "jumps" in the encodings due to infinitesimal shape differences.
I offered an empirical demonstration of how MPP encodings vary continuiously with small changes in an object's position, shape, or location; but stopped short of providing a rigorious proof of the property. Here I aim to back up this assertion with real math.
Distance measures for shapes and vectors
The casual definition above is a good starting point for a proof: continuity means that a small difference between two objects' size, shape, or position corresponds to a small change in their MPP encodings.
The first thing we need is a precise definition of "small difference".
MPP encodings are vectors, so "small difference" can be defined as "having a Euclidean distance less than some arbitrarily small number $\epsilon$".
Distances between geometric objects are not as easily defined. Shapes come in different types: Points, LineStrings, Polygons, MultiPolygons and so on. We need a notion of "distance" that applies to arbitrary pairs of objects, possibly of different types.
Given two geometric objects $A$ and $B$ the Hausdorff distance $h(A, B)$ is defined as the maximum distance from any point in $A$ to its closest point in $B$. Formally,
\[h(A,B) = \max_{a \in A} \min_{b \in B} d(a, b),\]
where $d$ represents Euclidean distance. The Hausdorff distance so defined is not symmetric: $h(A, B)$ is not necessarily the same as $h(B, A)$ We can define a symmetric Hausdorff distance $H(A, B)$ as the maximum value of the two:
\[H(A, B) = \max \{ h(A, B), h(B, A) \} \]
This is a good metric to define "closeness" between shape pairs. Specifically, $H(A, B) < \epsilon$ means "every point of $A$ is within a distance $\epsilon$ of some point in $B$". For two identical shapes, $H(A, B) = 0$.
MPP encoding and continuity
The general idea of MPP encoding is spelled out in an earlier blog post. But it's worth repeating.
Suppose we have a geometric object $A \in \mathbb{R}^2$. $A$ could be any geometric type, such as a Point, LineString, Polygon, or a multipart geometry (i.e. collection of any of the above). Define a set of reference Points $R=\{r_i: i \in 1..n\}$. For this discussion it doesn't matter how many there are or what strategy is used to define them. The MPP encoding of the object $A$ is defined as the ordered vector
\[\Phi(A) = [e^{-d(A, r_i)/s} : i \in 1..n]\]
where $d(A, r_i)$ is the minimum distance between a point in $A$ and the reference point $r_i$, and $s$ is a scaling factor. In other words, compute the distance between the object and all reference points, apply negative exponential scaling, and arrange the results as an ordered vector. That's the MPP encoding of $A$.
For proving that this encoding is continuous, I'm going to drop the exponential scaling, and work only with the distance between objects and reference points. Since exponential scaling is a continuous and monotonic transformation, conclusions are still valid.
First, define continuity by formalizing the notion above. MPP encoding will be said to be continuous if for two point sets $A$ and $B$,
\[H(A, B)\rightarrow 0 \implies\parallel\Phi(A) - \Phi(B)\parallel\rightarrow 0 \]
Assume that the symmetric Hausdorff distance between $A$ and $B$ is some small value $H(A, B) = \epsilon$. To proceed, consider a the closest-point distance between $A$ and a reference point $R$.
\[\phi_r(A)=\inf_{x\in A}d(r, x)\]
Let $a \in A$ represent the point that achieves this minimum distance. The fact that $H(A, B) = \epsilon$ guarantees that there is a point $b \in B$ such that $d(a, b) \leq \epsilon$. The triangle inequality implies that
\[d(r,b) \leq d(r,a) + d(a,b) \leq \phi_r(A)+\epsilon\].
Now because $\phi_r(B) = \inf_{x \in B} d(r,x)$, we have
\[\phi_r(B) \leq d(r, b) \leq \phi_r(A) + \epsilon \]
which implies
\[\phi_r(B) -\phi_r(A) \leq \epsilon \]
Since we are using the symmetric Hausdorff distance, all steps above still apply if we swap $A$ and $B$, so
\[\phi_r(A) - \phi_r(B) \leq \epsilon, \]
and therefore
\[|\phi_r(A)-\phi_r(B)| \leq \epsilon = H(A, B)\]
That result applies for any individual reference point used for the encoding. Since there are $n$ such points, for the full encoding
\[\parallel \Phi(A) - \Phi(B) \parallel \leq \sqrt{n} H(A,B)\].
This proves that as the symmetric Hausdorff distance between two shapes approaches zero, then the Eucliden distance between their encodings also approaches zero. QED.
Why this is important
Continuity is recognized as an important chartacteristic of geometric encoding methods [Mai24]. It means that if two shapes are similar, one can expect at most small differences in their encodings. This can help with generalization of inferences about shapes.
Importantly, various geometric encoding methods have been propsed which do not have this property. "Rasterization", mentioned above, may yield divergent encodings for shapes that lie arbitrarily close to one another, if the differences cross boundaries of the grid cells used for the rasterization.
Another class of geometric encodings involves processing the sequences of coordinates that define LineStrings or Polygon boundaries. Suppose one adds to a shape a vertex that does not actually change the shape because it lies on an existing edge. When one processes the list of vertices through a recurrent neural network , the result may not be the same as for the original shape. Thus a Hausdorff distance of zero may correspond to a finite encoding difference, which violates the definition of continuity above.
As a final note, the proof above did not require working in 2-dimensional space, so the result applies as well to 3-dimensional objects, or abstract point sets in higher dimensions.
Conclusion
The argument above backs up the previously unproven assertion that Multi-Point Proximity encoding is continuous. Hopefully this increases confidence in the use of the method in geospatial ML and AI models.
I hope you enjoyed reading this. Comments are more than welcome!
AI Disclosure
I made generous use of ChatGPT in the ideation and derivation of the proof given above. All narrative text was generated by the author.
References
Mai24: Gengchen Mai, Yiqun Xie, Xiaowei Jia, Ni Lao, Jinmeng Rao, Qing Zhu, Zeping Liu, Yao-Yi Chiang, Junfeng Jiao, Towards the next generation of Geospatial Artificial Intelligence, International Journal of Applied Earth Observation and Geoinformation, Volume 136, 2025, 104368, ISSN 1569-8432.
Comments ()