Semantic Search#

Semantic search denotes search with meaning, as distinguished from traditional search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query.

In this class, we will use the CLIP model to perform semantic search. That is, given a text query, we will return the images that are most relevant to the query. To do so, we need to:

  1. Calculate vector embeddings for all of the images in our dataset;

  2. Calculate a vector embedding for a user query (e.g. “cat” or “dog”); and

  3. Compare the text embedding to the image embeddings to find the closest embeddings.

The closer two embeddings are, the more similar the documents they represent are.
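
When both embeddings are normalized to unit length (as we will do below), this closeness can be measured with a simple dot product, which then coincides with the cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real CLIP embeddings have 512 dimensions):

import numpy as np

# Two toy embedding vectors; actual CLIP embeddings are 512-dimensional
a = np.array([0.2, 0.9, 0.1])
b = np.array([0.3, 0.8, 0.2])

# Normalize each vector to unit length
a = a / np.linalg.norm(a)
b = b / np.linalg.norm(b)

# For unit vectors the dot product equals the cosine similarity:
# 1.0 means identical direction, values near 0 mean unrelated
print(a @ b)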


Some applications of semantic search in business contexts include:

  1. Improved Customer Experience: With semantic search, businesses can enhance the customer experience by providing more accurate and relevant search results, reducing the time and effort customers spend finding the right products or services.

  2. Personalized Marketing: Semantic search can be applied in marketing to understand customer behavior and preferences, enabling businesses to develop personalized marketing strategies and improve customer engagement.

  3. Content Management: Semantic search can be applied in content management systems to organize and categorize content more efficiently based on its semantic connections.

  4. Product Recommendation: E-commerce businesses can use semantic search to improve their product recommendation systems, thereby increasing sales and customer satisfaction.

  5. Enhanced SEO: Businesses can optimize their websites for semantic search, which can help improve their rankings on search engine result pages and increase visibility.

There are also many start-ups devoted to semantic search; see a list at https://wellfound.com/startups/industry/semantic-search

Exercise Let’s implement a simple semantic search using the CLIP model.

Complete the following code to compute the image embedding vector for each of the images in the dog_examples folder.

Then, create a matrix of size N_images x 512, with each row being the image embedding vector.

import torch
import clip
import numpy as np
import pandas as pd

from PIL import Image

import glob

# Load the CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
image_paths = glob.glob("dog_examples/*")
image_paths
['dog_examples/German-Shepherd-dog-Alsatian.jpg.webp',
 'dog_examples/dog2.png',
 'dog_examples/dog1.png',
 'dog_examples/Chart_rosyjski_borzoj_rybnik-kamien_pl.jpg',
 'dog_examples/shiba-inu-hund.jpg',
 'dog_examples/GettyImages-1454565264-e1701120522406.jpg.webp',
 'dog_examples/GettyImages-157603001-e1701106766955.jpg.webp',
 'dog_examples/98.jpg.webp',
 'dog_examples/Pitbull_6,_2012.jpg',
 'dog_examples/Yorkshire Terrier.jpg.webp']
image_features_list = []

for image_path in image_paths:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)

    # Normalize so that similarities can be computed with dot products
    image_features /= image_features.norm(dim=-1, keepdim=True)

    image_features_list.append(image_features)

image_embedding_matrix = torch.cat(image_features_list, dim=0)
image_embedding_matrix.shape
torch.Size([10, 512])
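
As a quick sanity check, every row of the matrix should have unit norm after the normalization step above:

# Each entry should be (numerically close to) 1.0
print(image_embedding_matrix.norm(dim=-1))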

Now, create a function that, given a string representing a text query, computes the similarity with each of the images and returns the index of the closest image.

def search(text_query):
    text = clip.tokenize([text_query]).to(device)

    with torch.no_grad():
        text_features = model.encode_text(text)

    text_features /= text_features.norm(dim=-1, keepdim=True)

    image_text_similarity = image_embedding_matrix @ text_features.T

    # the closest index is the one with the highest similarity
    best_image_index = image_text_similarity.argmax().item()
    
    return best_image_index
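
If we wanted the k best matches instead of a single image, a small variant of the same function could use torch.topk. A minimal sketch reusing the image_embedding_matrix from above (the parameter k is only for illustration):

def search_top_k(text_query, k=3):
    # Encode and normalize the text query, exactly as in search()
    text = clip.tokenize([text_query]).to(device)
    with torch.no_grad():
        text_features = model.encode_text(text)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Similarity of the query against every image embedding
    image_text_similarity = (image_embedding_matrix @ text_features.T).squeeze(1)

    # Indices of the k most similar images, best match first
    return image_text_similarity.topk(k).indices.tolist()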

Some sample queries#

query = "A chihuahua"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/715fbbc7d27bd72cba0fed898a65542cd7ad969f7d4896e006da120754a1f37e.png
query = "A shiba inu"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/753dcf7fe0a6aeff90b1d902478617545af8e8e448601acdfa9e08648236bef3.png
query = "A husky"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/a7eecd3975a7ca93af4e8b0db58fd84ed26b0bd10df416b52d54eab3c4ffc619.png
query = "A wolf"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/a7eecd3975a7ca93af4e8b0db58fd84ed26b0bd10df416b52d54eab3c4ffc619.png
query = "A pitbull"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/2cf5947bf369575f3f7e18bfd879e34f610ec650089aabec3353464a2ff78939.png
query = "The happiest dog of all!"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/cdc62e040b665e109d7b836dc0f68714f03d78ec6e509fb68292a269ed130416.png
query = "Who's a good boy?"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/cdc62e040b665e109d7b836dc0f68714f03d78ec6e509fb68292a269ed130416.png
query = "A yorkie"

best_image_index = search(query)

best_image = Image.open(image_paths[best_image_index])

best_image
../_images/369cd7a657bf653a7653bc26d253b3d17281aed919931a75ad3912e9117a8f69.png

A full Semantic Search application#

The previous example was fine to learn how semantic search works, but our dataset consisted of only 10 images.

Now, we will use a dataset of ~2 million images from Unsplash, https://unsplash.com

Fortunately, the image embeddings have already been computed (it takes a few hours), so we can just load the precomputed embedding matrix for the images.
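
For reference, precomputing the matrix follows the same loop we used for the 10 dog images, just applied in batches over the whole collection. A minimal sketch, assuming the raw images live in a hypothetical local folder unsplash_images/ and that the CLIP model and preprocess from before are loaded:

import glob
import numpy as np
import torch
from PIL import Image

batch_size = 64
paths = glob.glob("unsplash_images/*")  # hypothetical folder with the raw images
all_features = []

for i in range(0, len(paths), batch_size):
    # Preprocess a batch of images and stack them into a single tensor
    batch = torch.stack([preprocess(Image.open(p)) for p in paths[i:i + batch_size]]).to(device)
    with torch.no_grad():
        features = model.encode_image(batch)
    # Normalize so similarities can be computed with dot products
    features /= features.norm(dim=-1, keepdim=True)
    all_features.append(features.cpu())

# N_images x 512 matrix, saved so it only has to be computed once
np.save("features.npy", torch.cat(all_features).numpy())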

The following command downloads a table mapping each image position (0, 1, 2, …) to its corresponding Unsplash photo ID (used to build the image URL)

!wget https://github.com/haltakov/natural-language-image-search/releases/download/1.0.0/photo_ids.csv -O photo_ids.csv

The following command downloads the image embedding matrix of all the images

!wget https://github.com/haltakov/natural-language-image-search/releases/download/1.0.0/features.npy -O features.npy
import torch
import clip
import numpy as np
import pandas as pd
# Load the CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

We load the precomputed embedding matrix and check its dimensions

embeddings_matrix = np.load('features.npy')
embeddings_matrix = torch.tensor(embeddings_matrix).float().to(device)
embeddings_matrix.shape
torch.Size([1981161, 512])
photo_ids = pd.read_csv("photo_ids.csv")
photo_ids = list(photo_ids['photo_id'])
len(photo_ids)
1981161
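
Each row of the embedding matrix corresponds to the photo ID at the same position, so the two lengths should agree:

# Row i of embeddings_matrix is the embedding of the photo with ID photo_ids[i]
assert embeddings_matrix.shape[0] == len(photo_ids)
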
def encode_text_query(text_query):
  with torch.no_grad():
    
    text_encoded = model.encode_text(clip.tokenize(text_query).to(device))
    text_encoded /= text_encoded.norm(dim=-1, keepdim=True)  # normalize the text features

  return text_encoded
def find_best_matches(text_features, photo_features, photo_ids, results_count=3):
  # Compute the similarity between the search query and each photo using the Cosine similarity
  similarities = (photo_features @ text_features.T).squeeze(1)

  # Sort the photos by their similarity score
  best_photo_idx = (-similarities).argsort()

  # Return the photo IDs of the best matches
  return [photo_ids[i] for i in best_photo_idx[:results_count]]
from IPython.display import Image
from IPython.core.display import HTML

def display_photo(photo_id):
  # Get the URL of the photo resized to have a width of 320px
  photo_image_url = f"https://unsplash.com/photos/{photo_id}/download?w=320"

  # Display the photo
  display(Image(url=photo_image_url))

  # Display the attribution text
  display(HTML(f'Photo on <a target="_blank" href="https://unsplash.com/photos/{photo_id}">Unsplash</a> '))
  print()
def search(search_query, photo_features, photo_ids, results_count=3):
  # Encode the search query
  text_features = encode_text_query(search_query)

  # Find the best matches
  best_photo_ids = find_best_matches(text_features, photo_features, photo_ids, results_count)

  # Display the best photos
  for photo_id in best_photo_ids:
    display_photo(photo_id)

Some example queries#

search_query = "A dog playing in the garden"

search(search_query, embeddings_matrix, photo_ids, 3)
Photo on Unsplash

Photo on Unsplash

Photo on Unsplash

search_query = "A dog playing in the snow"

search(search_query, embeddings_matrix, photo_ids, 3)
Photo on Unsplash

Photo on Unsplash

Photo on Unsplash

search_query = "A tiger playing in the snow"

search(search_query, embeddings_matrix, photo_ids, 3)
Photo on Unsplash

Photo on Unsplash

Photo on Unsplash

search_query = "Roman aqueduct from Segovia"

search(search_query, embeddings_matrix, photo_ids, 3)
Photo on Unsplash

Photo on Unsplash

Photo on Unsplash

search_query = "The feeling when the classes are over and you can finally relax"

search(search_query, embeddings_matrix, photo_ids, 1)
Photo on Unsplash

search_query = "IE University student"

search(search_query, embeddings_matrix, photo_ids, 3)
Photo on Unsplash

Photo on Unsplash

Photo on Unsplash

search_query = "A castle in the sky"

search(search_query, embeddings_matrix, photo_ids, 3)
Photo on Unsplash

Photo on Unsplash

Photo on Unsplash

Exercise Write a text query for which the first photo returned is not relevant (i.e. an error).

More applications of Semantic Search#

Semantic Search can be especially useful when the image collection belongs to a particular domain.

For example, here you can try another semantic search application using CLIP over a dataset of 80,000 art paintings:

https://art-explorer.komorebi.ai


By VĂ­ctor Gallego

© Copyright 2023.