GLiNER: Generalist and Lightweight Model for Named Entity Recognition#
GLiNER is a Named Entity Recognition (NER) model that can identify any entity type using a transformer encoder (BERT-like). It offers a practical alternative to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which are flexible but too large and costly for resource-constrained scenarios.
Demo: 🤗 Hugging Face
🌟 Available Models on Hugging Face#
🇬🇧 For English#
GLiNER Base: urchade/gliner_base (CC BY NC 4.0)
GLiNER Small: urchade/gliner_small (CC BY NC 4.0)
GLiNER Small v2: urchade/gliner_small-v2 (Apache 2.0)
GLiNER Small v2.1: urchade/gliner_small-v2.1 (Apache 2.0)
GLiNER Medium: urchade/gliner_medium (CC BY NC 4.0)
GLiNER Medium v2: urchade/gliner_medium-v2 (Apache 2.0)
GLiNER Medium v2.1: urchade/gliner_medium-v2.1 (Apache 2.0)
GLiNER Large: urchade/gliner_large (CC BY NC 4.0)
GLiNER Large v2: urchade/gliner_large-v2 (Apache 2.0)
🌍 For Other Languages#
Korean: 🇰🇷 taeminlee/gliner_ko
Italian: 🇮🇹 DeepMount00/universal_ner_ita
Multilingual: 🌐 urchade/gliner_multi (CC BY NC 4.0) and urchade/gliner_multi-v2.1 (Apache 2.0)
🔬 Domain Specific Models#
Biomedical: 🧬 urchade/gliner_large_bio-v0.1 (Apache 2.0)
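Because the checkpoints above ship under different licenses, it can help to keep the license next to each model ID when choosing one. The following is a minimal sketch: the dictionary only transcribes the lists above (the Korean and Italian models are left out because no license is stated for them here), and `models_with_license` is a hypothetical helper name, not part of the gliner package.

```python
# Model IDs and licenses transcribed from the lists above.
GLINER_MODELS = {
    "urchade/gliner_base": "CC BY NC 4.0",
    "urchade/gliner_small": "CC BY NC 4.0",
    "urchade/gliner_small-v2": "Apache 2.0",
    "urchade/gliner_small-v2.1": "Apache 2.0",
    "urchade/gliner_medium": "CC BY NC 4.0",
    "urchade/gliner_medium-v2": "Apache 2.0",
    "urchade/gliner_medium-v2.1": "Apache 2.0",
    "urchade/gliner_large": "CC BY NC 4.0",
    "urchade/gliner_large-v2": "Apache 2.0",
    "urchade/gliner_multi": "CC BY NC 4.0",
    "urchade/gliner_multi-v2.1": "Apache 2.0",
    "urchade/gliner_large_bio-v0.1": "Apache 2.0",
}


def models_with_license(license_name):
    """Return the model IDs whose recorded license equals license_name."""
    return [name for name, lic in GLINER_MODELS.items() if lic == license_name]


# Hypothetical usage: list the checkpoints recorded as Apache 2.0.
apache_models = models_with_license("Apache 2.0")
print(apache_models)
```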
!pip install gliner
Requirement already satisfied: gliner in /opt/homebrew/lib/python3.10/site-packages (0.1.7)
Basic Use Case#
from gliner import GLiNER
# Initialize GLiNER with the medium v2.1 model
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
# Sample text for entity prediction
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""
# Labels for entity prediction
labels = ["Person", "Award", "Date", "Competitions", "Teams"]  # capitalized labels tend to work better
# Perform entity prediction
entities = model.predict_entities(text, labels, threshold=0.5)
# Display predicted entities and their labels
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Cristiano Ronaldo dos Santos Aveiro => Person
5 February 1985 => Date
Portugal national team => Teams
Ballon d'Or => Award
UEFA Men's Player of the Year Awards => Award
European Golden Shoes => Award
UEFA Champions Leagues => Competitions
UEFA European Championship => Competitions
UEFA Nations League => Competitions
European Championship => Competitions
import pandas as pd
df = pd.DataFrame(entities)
df
index | start | end | text | label | score |
---|---|---|---|---|---|
0 | 1 | 36 | Cristiano Ronaldo dos Santos Aveiro | Person | 0.864556 |
1 | 92 | 107 | 5 February 1985 | Date | 0.985105 |
2 | 233 | 255 | Portugal national team | Teams | 0.540601 |
3 | 338 | 349 | Ballon d'Or | Award | 0.604587 |
4 | 381 | 417 | UEFA Men's Player of the Year Awards | Award | 0.817369 |
5 | 428 | 449 | European Golden Shoes | Award | 0.809395 |
6 | 556 | 578 | UEFA Champions Leagues | Competitions | 0.836124 |
7 | 584 | 610 | UEFA European Championship | Competitions | 0.869951 |
8 | 619 | 638 | UEFA Nations League | Competitions | 0.924063 |
9 | 761 | 782 | European Championship | Competitions | 0.731239 |
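The start and end columns appear to be character offsets into the input string, with end exclusive: the first row spans 1 to 36 because the sample text begins with a newline. A small sketch of how this could be checked is shown here. `check_offsets` is a hypothetical helper, and `sample` is only a truncated excerpt of the text above, not the full input.

```python
def check_offsets(text, entities):
    """Return the entities whose text[start:end] slice differs from their "text" field."""
    return [e for e in entities if text[e["start"]:e["end"]] != e["text"]]


# Truncated opening of the sample text (note the leading newline)
# and the first row of the table above.
sample = "\nCristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: ..."
first_row = {
    "start": 1,
    "end": 36,
    "text": "Cristiano Ronaldo dos Santos Aveiro",
    "label": "Person",
}

# An empty list means every span matches its slice of the text.
mismatches = check_offsets(sample, [first_row])
print(mismatches)
```

With the full notebook variables, the same check would be `check_offsets(text, entities)`.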
import numpy as np
from random import randint
df = pd.DataFrame(entities)
unique_labels = df['label'].unique()
# Assign each label a random, translucent background color
colors = {label: f'background-color: rgba({randint(0,255)},{randint(0,255)},{randint(0,255)},{np.round(np.random.uniform(0.1,0.4), 2)})'
          for label in unique_labels}
def color_rows(row):
    # Apply the color of the row's label to every cell in that row
    color = colors.get(row['label'], '')
    return [color for _ in row]
df.style.apply(color_rows, axis=1)
index | start | end | text | label | score |
---|---|---|---|---|---|
0 | 1 | 36 | Cristiano Ronaldo dos Santos Aveiro | Person | 0.864556 |
1 | 92 | 107 | 5 February 1985 | Date | 0.985105 |
2 | 233 | 255 | Portugal national team | Teams | 0.540601 |
3 | 338 | 349 | Ballon d'Or | Award | 0.604587 |
4 | 381 | 417 | UEFA Men's Player of the Year Awards | Award | 0.817369 |
5 | 428 | 449 | European Golden Shoes | Award | 0.809395 |
6 | 556 | 578 | UEFA Champions Leagues | Competitions | 0.836124 |
7 | 584 | 610 | UEFA European Championship | Competitions | 0.869951 |
8 | 619 | 638 | UEFA Nations League | Competitions | 0.924063 |
9 | 761 | 782 | European Championship | Competitions | 0.731239 |
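The styling cell draws colors at random, so a given label changes color on every run. If stable colors are preferred, one option is to derive them from the label text. This is only a sketch: `stable_label_color` is a hypothetical helper, and hashing the label is a swapped-in technique, not what the notebook itself does.

```python
import hashlib


def stable_label_color(label, alpha=0.3):
    """Build a CSS background-color rule whose RGB values are derived from the label text."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # The first three bytes of the digest serve as red, green and blue (0-255 each).
    r, g, b = digest[0], digest[1], digest[2]
    return f"background-color: rgba({r},{g},{b},{alpha})"


# The same label always maps to the same color across runs.
colors = {label: stable_label_color(label) for label in ["Person", "Award", "Date"]}
print(colors["Person"])
```

A dictionary built this way can stand in for the random `colors` mapping used by `color_rows`.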
What happens if we reduce the threshold?
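One way to explore this is to rerun the prediction at several thresholds and compare how many entities come back. The sketch below assumes only the `predict_entities(text, labels, threshold=...)` call used above; `count_by_threshold` is a hypothetical helper, and the default threshold values are arbitrary.

```python
def count_by_threshold(model, text, labels, thresholds=(0.7, 0.5, 0.3)):
    """Run predict_entities once per threshold and count the entities returned."""
    counts = {}
    for threshold in thresholds:
        entities = model.predict_entities(text, labels, threshold=threshold)
        counts[threshold] = len(entities)
    return counts
```

With the model, text, and labels defined above, this would be called as `count_by_threshold(model, text, labels)`. A lower threshold would be expected to return more spans, including lower-confidence ones, at the likely cost of more false positives.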
Example: Job Offer Analysis#
text = """
* Data Scientist, Data Analyst, or Data Engineer with 1+ years of experience.
* Experience with technologies such as Docker, Kubernetes, or Kubeflow
* Machine Learning experience preferred
* Experience with programming languages such as Python, C++, or SQL preferred
* Experience with technologies such as Databricks, Qlik, TensorFlow, PyTorch, Python, Dash, Pandas, or NumPy preferred
* BA or BS degree
* Active Secret OR Active Top Secret or Active TS/SCI clearance
"""
labels = ["programing language", "software tool", "degree", "job title"]
entities = model.predict_entities(text, labels, threshold=0.5)
df = pd.DataFrame(entities)
unique_labels = df['label'].unique()
# Assign each label a random, translucent background color
colors = {label: f'background-color: rgba({randint(0,255)},{randint(0,255)},{randint(0,255)},{np.round(np.random.uniform(0.1,0.4), 2)})'
          for label in unique_labels}
def color_rows(row):
    # Apply the color of the row's label to every cell in that row
    color = colors.get(row['label'], '')
    return [color for _ in row]
df.style.apply(color_rows, axis=1)
index | start | end | text | label | score |
---|---|---|---|---|---|
0 | 3 | 17 | Data Scientist | job title | 0.877562 |
1 | 19 | 31 | Data Analyst | job title | 0.829869 |
2 | 36 | 49 | Data Engineer | job title | 0.811564 |
3 | 141 | 149 | Kubeflow | software tool | 0.668544 |
4 | 238 | 244 | Python | programing language | 0.988815 |
5 | 246 | 249 | C++ | programing language | 0.920541 |
6 | 254 | 257 | SQL | programing language | 0.783584 |
7 | 307 | 317 | Databricks | software tool | 0.850532 |
8 | 319 | 323 | Qlik | software tool | 0.674751 |
9 | 325 | 335 | TensorFlow | software tool | 0.929136 |
10 | 337 | 344 | PyTorch | programing language | 0.620552 |
11 | 346 | 352 | Python | programing language | 0.985091 |
12 | 354 | 358 | Dash | software tool | 0.748522 |
13 | 389 | 391 | BA | degree | 0.914484 |
14 | 395 | 397 | BS | degree | 0.966724 |
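For a job offer, the per-mention rows are often less useful than a de-duplicated list of skills per label (Python, for instance, is detected twice). A minimal sketch of that post-processing follows. `group_by_label` is a hypothetical helper, and `rows` copies only a few entries from the table above.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.5):
    """Map each label to the sorted, de-duplicated entity texts scoring at least min_score."""
    grouped = defaultdict(set)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].add(entity["text"])
    return {label: sorted(texts) for label, texts in grouped.items()}


# A few rows copied from the table above (label spelling kept as in the notebook).
rows = [
    {"text": "Python", "label": "programing language", "score": 0.988815},
    {"text": "C++", "label": "programing language", "score": 0.920541},
    {"text": "Python", "label": "programing language", "score": 0.985091},
    {"text": "Kubeflow", "label": "software tool", "score": 0.668544},
    {"text": "BS", "label": "degree", "score": 0.966724},
]

skills = group_by_label(rows)
confident_skills = group_by_label(rows, min_score=0.9)
print(skills)
print(confident_skills)
```

Raising `min_score` drops the lower-confidence Kubeflow row while keeping the repeated Python mention as a single entry.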
Example: Literature Research#
text = """Libretto by Marius Petipa, based on the 1822 novella ``Trilby, ou Le Lutin d'Argail`` by Charles Nodier, first presented by the Ballet of the Moscow Imperial Bolshoi Theatre on January 25/February 6 (Julian/Gregorian calendar dates), 1870, in Moscow with Polina Karpakova as Trilby and Ludiia Geiten as Miranda and restaged by Petipa for the Imperial Ballet at the Imperial Bolshoi Kamenny Theatre on January 17–29, 1871 in St. Petersburg with Adèle Grantzow as Trilby and Lev Ivanov as Count Leopold."""
labels = ["person", "book", "location", "date", "actor", "character"]
entities = model.predict_entities(text, labels, threshold=0.5)
df = pd.DataFrame(entities)
unique_labels = df['label'].unique()
# Assign each label a random, translucent background color
colors = {label: f'background-color: rgba({randint(0,255)},{randint(0,255)},{randint(0,255)},{np.round(np.random.uniform(0.1,0.4), 2)})'
          for label in unique_labels}
def color_rows(row):
    # Apply the color of the row's label to every cell in that row
    color = colors.get(row['label'], '')
    return [color for _ in row]
df.style.apply(color_rows, axis=1)
index | start | end | text | label | score |
---|---|---|---|---|---|
0 | 55 | 61 | Trilby | character | 0.974547 |
1 | 142 | 148 | Moscow | location | 0.911368 |
2 | 177 | 198 | January 25/February 6 | date | 0.739128 |
3 | 234 | 238 | 1870 | date | 0.565082 |
4 | 243 | 249 | Moscow | location | 0.924236 |
5 | 255 | 271 | Polina Karpakova | actor | 0.926855 |
6 | 275 | 281 | Trilby | character | 0.986574 |
7 | 286 | 299 | Ludiia Geiten | actor | 0.897300 |
8 | 303 | 310 | Miranda | character | 0.798976 |
9 | 401 | 420 | January 17–29, 1871 | date | 0.906363 |
10 | 424 | 438 | St. Petersburg | location | 0.915059 |
11 | 444 | 458 | Adèle Grantzow | actor | 0.933230 |
12 | 462 | 468 | Trilby | character | 0.990543 |
13 | 473 | 483 | Lev Ivanov | actor | 0.934771 |
14 | 487 | 500 | Count Leopold | character | 0.854538 |
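In the table above, each actor span is followed four characters later by a character span, which matches the " as " in phrases such as "Polina Karpakova as Trilby". The offsets can therefore be used to pair performers with roles. The following is a sketch under that assumption: both helpers are hypothetical, and `excerpt` is a shortened piece of the text with offsets recomputed for it, not the model's actual output.

```python
def pair_actor_with_character(text, entities, connector=" as "):
    """Pair each actor span with the next character span when only `connector` lies between them."""
    ordered = sorted(entities, key=lambda e: e["start"])
    pairs = []
    for current, following in zip(ordered, ordered[1:]):
        if (current["label"] == "actor"
                and following["label"] == "character"
                and text[current["end"]:following["start"]] == connector):
            pairs.append((current["text"], following["text"]))
    return pairs


def span(text, phrase, label):
    """Hypothetical helper: build an entity dict by locating the first occurrence of phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "text": phrase, "label": label}


# Shortened excerpt of the text above; offsets are recomputed for this excerpt.
excerpt = "in Moscow with Polina Karpakova as Trilby and Ludiia Geiten as Miranda"
found = [
    span(excerpt, "Moscow", "location"),
    span(excerpt, "Polina Karpakova", "actor"),
    span(excerpt, "Trilby", "character"),
    span(excerpt, "Ludiia Geiten", "actor"),
    span(excerpt, "Miranda", "character"),
]

cast = pair_actor_with_character(excerpt, found)
print(cast)
```

With the notebook variables, the call would be `pair_actor_with_character(text, entities)`.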