Named Entity Recognition (NER):
This is a subtask of information extraction that involves locating and classifying named entities in unstructured text into predefined categories.
Examples of categories:
Person names, organizations, locations, dates, quantities, and monetary values.
How it works:
NER systems are typically trained on annotated text so that they can locate each entity mention in a sentence and assign it to one of the predefined categories.
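To make the "locate and classify" idea concrete, here is a minimal sketch of one common way NER output is represented: each token gets a BIO tag (B- begins an entity, I- continues it, O is outside any entity), and consecutive tags are collapsed into labeled entities. The function name `bio_to_spans` and the tags in the example are hand-written for illustration; they are not the output of any particular library.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, label) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far (if any) and reset the buffer.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens = []
        current_label = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the current entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the current entity.
            flush()
    flush()
    return spans


# Hand-labeled example, not model output.
tokens = ["Steve", "Jobs", "founded", "Apple", "in", "1976", "."]
tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "B-DATE", "O"]
print(bio_to_spans(tokens, tags))
# [('Steve Jobs', 'PERSON'), ('Apple', 'ORG'), ('1976', 'DATE')]
```

A trained model's job is to predict the tag sequence; the decoding step shown here is plain bookkeeping.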
Applications:
By tagging the people, organizations, places, and dates mentioned in a document, NER helps businesses and organizations sift through large volumes of text more efficiently.
Python code using NLTK
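import nltk
# download the tagger, tokenizer, and chunker resources (only needed once)
# note: recent NLTK releases may instead ask for "punkt_tab",
# "averaged_perceptron_tagger_eng", and "maxent_ne_chunker_tab"
nltk.download("averaged_perceptron_tagger")
nltk.download("punkt")
nltk.download("maxent_ne_chunker")
nltk.download("words")
text = "Apple Inc. was founded by Steve Jobs and Steve Wozniak in 1976."
# tokenize the text
tokens = nltk.word_tokenize(text)
print(tokens)
# apply POS tagging, then chunk the tagged tokens into named entities
pos_tags = nltk.pos_tag(tokens)
named_entities = nltk.ne_chunk(pos_tags)
# print each named entity; entities are the nltk.Tree subtrees of the result
for entity in named_entities:
    if isinstance(entity, nltk.Tree):
        entity_name = " ".join(word for word, pos in entity.leaves())
        print(entity_name)
Using spaCy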
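import spacy
# load the small English pipeline; install it first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
text = "Apple Inc. was founded by Steve Jobs and Steve Wozniak in 1976."
doc = nlp(text)
# print each entity together with its predicted label
for entity in doc.ents:
    print(f"{entity.text} ({entity.label_})")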
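Once entities have been extracted, a natural next step is to group them by category. The sketch below assumes the entities are already available as (text, label) pairs; the helper name `group_by_label` is hypothetical, and the sample pairs are written by hand to mirror the example sentence rather than copied from a model run.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_label(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group entity texts by label, keeping first-seen order and dropping repeats."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Hand-written sample pairs (an assumption, not actual model output).
sample = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Steve Wozniak", "PERSON"),
    ("1976", "DATE"),
]
print(group_by_label(sample))
# {'ORG': ['Apple Inc.'], 'PERSON': ['Steve Jobs', 'Steve Wozniak'], 'DATE': ['1976']}
```

With spaCy, the same helper could be fed directly from a processed document, for example `group_by_label((ent.text, ent.label_) for ent in doc.ents)`.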