Harnessing the Universal Geometry of Embeddings
We present the first method to translate text embeddings across different spaces without any paired data or encoders.
Our method, vec2vec, reveals that all encoders, regardless of architecture or training data, learn nearly the same representations!
We demonstrate how to translate between these black-box embeddings with high fidelity, using no paired data.
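How can translation work with no paired data? Below is a minimal sketch in the spirit of unpaired (CycleGAN-style) translation: lightweight adapters map each embedding space into a shared latent space and back, trained with an adversarial loss plus reconstruction and cycle-consistency terms. Everything here (module names, dimensions, loss weights) is an illustrative assumption, not the authors' implementation:

```python
# Illustrative sketch of unpaired embedding translation (NOT the authors'
# code). Adapters map each space into a shared latent space and back;
# training uses adversarial + reconstruction + cycle-consistency losses.
import torch
import torch.nn as nn

DIM_A, DIM_B, LATENT = 768, 1024, 512   # assumed dims for two encoders


def mlp(d_in, d_out):
    """Tiny adapter network; real translators would be deeper."""
    return nn.Sequential(nn.Linear(d_in, LATENT), nn.SiLU(),
                         nn.Linear(LATENT, d_out))


enc_a, dec_a = mlp(DIM_A, LATENT), mlp(LATENT, DIM_A)  # A <-> latent
enc_b, dec_b = mlp(DIM_B, LATENT), mlp(LATENT, DIM_B)  # B <-> latent
disc_b = mlp(DIM_B, 1)  # discriminator: real B vs. translated A->B


def a_to_b(x):
    """Translate an A-space embedding into B-space via the shared latent."""
    return dec_b(enc_a(x))


opt_g = torch.optim.Adam(
    [*enc_a.parameters(), *dec_a.parameters(),
     *enc_b.parameters(), *dec_b.parameters()], lr=1e-4)
opt_d = torch.optim.Adam(disc_b.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1_000):
    # Unpaired batches: DIFFERENT texts embedded by encoder A and encoder B.
    # (Stand-in random tensors here; use real embeddings in practice.)
    xa, xb = torch.randn(64, DIM_A), torch.randn(64, DIM_B)

    # Discriminator step: tell real B embeddings from translations.
    fake_b = a_to_b(xa)
    d_loss = (bce(disc_b(xb), torch.ones(64, 1)) +
              bce(disc_b(fake_b.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Translator step: fool the discriminator while staying invertible.
    fake_b = a_to_b(xa)
    adv = bce(disc_b(fake_b), torch.ones(64, 1))
    rec = (dec_a(enc_a(xa)) - xa).pow(2).mean()      # A -> latent -> A
    cyc = (dec_a(enc_b(fake_b)) - xa).pow(2).mean()  # A -> B -> A
    g_loss = adv + rec + cyc                         # equal weights, illustrative
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

A real system would also train the symmetric B-to-A direction with its own discriminator; both are omitted here for brevity.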
Using vec2vec, we show that vector databases reveal (almost) as much as their inputs.
Given just vectors (e.g., from a compromised vector database), we show that an adversary can extract sensitive information (e.g., PII) about the underlying text.
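To see why leaked vectors are dangerous, here is a toy sketch of zero-shot attribute inference: after translating compromised embeddings into a space whose encoder the adversary controls, each vector is compared by cosine similarity against embeddings of candidate attribute descriptions. The tensors and dimensions below are placeholders, and this is just the simplest version of the idea, not the paper's full attack:

```python
# Toy attribute-inference sketch on translated embeddings (placeholder data).
import torch
import torch.nn.functional as F

# In practice `translated` would come from running the translator on leaked
# vectors, and `label_vecs` from embedding candidate attribute strings
# (e.g., "mentions a medical diagnosis") with the adversary's own encoder.
translated = torch.randn(5, 1024)   # stand-in for translated leaked vectors
label_vecs = torch.randn(3, 1024)   # stand-in for attribute-description vectors

# Cosine similarity of every leaked vector against every candidate attribute.
sims = F.cosine_similarity(translated.unsqueeze(1),
                           label_vecs.unsqueeze(0), dim=-1)   # shape (5, 3)
pred = sims.argmax(dim=1)  # best-matching attribute per leaked vector
print(pred)
```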
Strong Platonic Representation Hypothesis (S-PRH)
We thus strengthen Huh et al.'s PRH to say:
The universal latent structure of text representations can be learned and harnessed to translate representations from one space to another without any paired data or encoders.
https://arxiv.org/abs/2505.12540