CODEPROGRAMMER Telegram 3408
This media is not supported in your browser
VIEW IN TELEGRAM
What is a ๐—ฉ๐—ฒ๐—ฐ๐˜๐—ผ๐—ฟ ๐——๐—ฎ๐˜๐—ฎ๐—ฏ๐—ฎ๐˜€๐—ฒ?

With the rise of Foundational Models, Vector Databases skyrocketed in popularity. The truth is that a Vector Database is also useful outside of a Large Language Model context.

When it comes to Machine Learning, we often deal with Vector Embeddings. Vector Databases were created to perform specifically well when working with them:

โžก๏ธ Storing.
โžก๏ธ Updating.
โžก๏ธ Retrieving.

When we talk about retrieval, we refer to retrieving set of vectors that are most similar to a query in a form of a vector that is embedded in the same Latent space. This retrieval procedure is called Approximate Nearest Neighbour (ANN) search.

A query here could be in a form of an object like an image for which we would like to find similar images. Or it could be a question for which we want to retrieve relevant context that could later be transformed into an answer via a LLM.

Letโ€™s look into how one would interact with a Vector Database:

๐—ช๐—ฟ๐—ถ๐˜๐—ถ๐—ป๐—ด/๐—จ๐—ฝ๐—ฑ๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐——๐—ฎ๐˜๐—ฎ.

1. Choose a ML model to be used to generate Vector Embeddings.
2. Embed any type of information: text, images, audio, tabular. Choice of ML model used for embedding will depend on the type of data.
3. Get a Vector representation of your data by running it through the Embedding Model.
4. Store additional metadata together with the Vector Embedding. This data would later be used to pre-filter or post-filter ANN search results.
5. Vector DB indexes Vector Embedding and metadata separately. There are multiple methods that can be used for creating vector indexes, some of them: Random Projection, Product Quantization, Locality-sensitive Hashing.
6. Vector data is stored together with indexes for Vector Embeddings and metadata connected to the Embedded objects.

๐—ฅ๐—ฒ๐—ฎ๐—ฑ๐—ถ๐—ป๐—ด ๐——๐—ฎ๐˜๐—ฎ.

7. A query to be executed against a Vector Database will usually consist of two parts:

โžก๏ธ Data that will be used for ANN search. e.g. an image for which you want to find similar ones.
โžก๏ธ Metadata query to exclude Vectors that hold specific qualities known beforehand. E.g. given that you are looking for similar images of apartments - exclude apartments in a specific location.

8. You execute Metadata Query against the metadata index. It could be done before or after the ANN search procedure.
9. You embed the data into the Latent space with the same model that was used for writing the data to the Vector DB.
10. ANN search procedure is applied and a set of Vector embeddings are retrieved. Popular similarity measures for ANN search include: Cosine Similarity, Euclidean Distance, Dot Product.

How are you using Vector DBs? Let me know in the comment section!

#RAG #LLM #DataEngineering

https://www.tgoop.com/CodeProgrammer โœ…
Please open Telegram to view this post
VIEW IN TELEGRAM



tgoop.com/CodeProgrammer/3408
Create:
Last Update:

What is a ๐—ฉ๐—ฒ๐—ฐ๐˜๐—ผ๐—ฟ ๐——๐—ฎ๐˜๐—ฎ๐—ฏ๐—ฎ๐˜€๐—ฒ?

With the rise of Foundational Models, Vector Databases skyrocketed in popularity. The truth is that a Vector Database is also useful outside of a Large Language Model context.

When it comes to Machine Learning, we often deal with Vector Embeddings. Vector Databases were created to perform specifically well when working with them:

โžก๏ธ Storing.
โžก๏ธ Updating.
โžก๏ธ Retrieving.

When we talk about retrieval, we refer to retrieving set of vectors that are most similar to a query in a form of a vector that is embedded in the same Latent space. This retrieval procedure is called Approximate Nearest Neighbour (ANN) search.

A query here could be in a form of an object like an image for which we would like to find similar images. Or it could be a question for which we want to retrieve relevant context that could later be transformed into an answer via a LLM.

Letโ€™s look into how one would interact with a Vector Database:

๐—ช๐—ฟ๐—ถ๐˜๐—ถ๐—ป๐—ด/๐—จ๐—ฝ๐—ฑ๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐——๐—ฎ๐˜๐—ฎ.

1. Choose a ML model to be used to generate Vector Embeddings.
2. Embed any type of information: text, images, audio, tabular. Choice of ML model used for embedding will depend on the type of data.
3. Get a Vector representation of your data by running it through the Embedding Model.
4. Store additional metadata together with the Vector Embedding. This data would later be used to pre-filter or post-filter ANN search results.
5. Vector DB indexes Vector Embedding and metadata separately. There are multiple methods that can be used for creating vector indexes, some of them: Random Projection, Product Quantization, Locality-sensitive Hashing.
6. Vector data is stored together with indexes for Vector Embeddings and metadata connected to the Embedded objects.

๐—ฅ๐—ฒ๐—ฎ๐—ฑ๐—ถ๐—ป๐—ด ๐——๐—ฎ๐˜๐—ฎ.

7. A query to be executed against a Vector Database will usually consist of two parts:

โžก๏ธ Data that will be used for ANN search. e.g. an image for which you want to find similar ones.
โžก๏ธ Metadata query to exclude Vectors that hold specific qualities known beforehand. E.g. given that you are looking for similar images of apartments - exclude apartments in a specific location.

8. You execute Metadata Query against the metadata index. It could be done before or after the ANN search procedure.
9. You embed the data into the Latent space with the same model that was used for writing the data to the Vector DB.
10. ANN search procedure is applied and a set of Vector embeddings are retrieved. Popular similarity measures for ANN search include: Cosine Similarity, Euclidean Distance, Dot Product.

How are you using Vector DBs? Let me know in the comment section!

#RAG #LLM #DataEngineering

https://www.tgoop.com/CodeProgrammer โœ…

BY Python | Machine Learning | Coding | R


Share with your friend now:
tgoop.com/CodeProgrammer/3408

View MORE
Open in Telegram


Telegram News

Date: |

Co-founder of NFT renting protocol Rentable World emiliano.eth shared the group Tuesday morning on Twitter, calling out the "degenerate" community, or crypto obsessives that engage in high-risk trading. "Doxxing content is forbidden on Telegram and our moderators routinely remove such content from around the world," said a spokesman for the messaging app, Remi Vaughn. When choosing the right name for your Telegram channel, use the language of your target audience. The name must sum up the essence of your channel in 1-3 words. If youโ€™re planning to expand your Telegram audience, it makes sense to incorporate keywords into your name. Read now The groupโ€™s featured image is of a Pepe frog yelling, often referred to as the โ€œREEEEEEEโ€ meme. Pepe the Frog was created back in 2005 by Matt Furie and has since become an internet symbol for meme culture and โ€œdegenโ€ culture.
from us


Telegram Python | Machine Learning | Coding | R
FROM American