...

Additional tools that may prove useful

Borrowed from http://rawpatentdata.blogspot.com/2013/01/datamining-and-entity-resolutions-some.html

...

Developed in Java; can be downloaded from: http://sourceforge.net/projects/oysterer/

...

OPENCALAIS

  • Open Calais
  • Agrotagger

...

The Calais Web Service, by Thomson Reuters, is an API that accepts unstructured text (such as news articles and blog postings), processes it using natural language processing and machine learning algorithms, and returns RDF-formatted entities, facts and events.
OpenCalais supports three types of entity disambiguation: Company disambiguation, Geographical disambiguation and Product (Electronics) disambiguation.
Company disambiguation determines, for example, whether the company name Olympus refers to Olympus Optical Co. Ltd. or Olympus Life and Material Science Europa. The resolution output for a given company mention includes:
  • A URI that is unique and uniform across documents
  • The formal English legal name of the company
  • The company's ticker symbol (for public companies)
For company names that cannot be disambiguated, the returned results will include no resolution information.
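The shape of that resolution output can be sketched as a small record with an optional result. This is only an illustration in Python, not the OpenCalais API: the names `CompanyResolution`, `Candidate`, `CANDIDATES` and `resolve_company`, the `example.org` URIs and the context cue words are all assumptions made up for this sketch, and the cue matching is a toy stand-in for the real service's disambiguation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CompanyResolution:
    uri: str                       # unique and uniform across documents
    legal_name: str                # formal English legal name
    ticker: Optional[str] = None   # only set for public companies


@dataclass(frozen=True)
class Candidate:
    resolution: CompanyResolution
    cues: Tuple[str, ...]          # hypothetical context words, invented for this sketch


# Toy registry; the URIs are placeholders, not real Calais identifiers.
CANDIDATES = (
    Candidate(
        CompanyResolution(
            uri="http://example.org/company/olympus-optical",
            legal_name="Olympus Optical Co. Ltd.",
        ),
        cues=("camera", "lens"),
    ),
    Candidate(
        CompanyResolution(
            uri="http://example.org/company/olympus-lms-europa",
            legal_name="Olympus Life and Material Science Europa",
        ),
        cues=("microscope", "laboratory"),
    ),
)


def resolve_company(mention: str, context: str) -> Optional[CompanyResolution]:
    """Return a resolution only when exactly one candidate fits the context.

    Returning None mirrors the "no resolution information" case for
    mentions that cannot be disambiguated.
    """
    mention_lc = mention.lower()
    context_lc = context.lower()
    matches = [
        c.resolution
        for c in CANDIDATES
        if mention_lc in c.resolution.legal_name.lower()
        and any(cue in context_lc for cue in c.cues)
    ]
    return matches[0] if len(matches) == 1 else None
```

Calling `resolve_company("Olympus", "Olympus announced a new camera.")` yields the Olympus Optical record, while a context with no cue, or with cues for both companies, yields `None`. Because the URI is stored with the record, the same company resolves to the same URI in every document.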

AgroTagger

Used for indexing information resources, Agrotagger is a keyword extractor that uses the AGROVOC thesaurus as its set of allowable keywords. It can extract from Microsoft Office documents, PDF files and web pages.
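The idea of restricting keywords to a controlled vocabulary can be sketched as follows. This is a deliberately simplified illustration, not Agrotagger's actual method: it counts whole-word occurrences of vocabulary terms and ranks them by frequency. The function name `extract_keywords` and the tiny vocabulary in the usage note are assumptions standing in for the AGROVOC thesaurus.

```python
import re
from collections import Counter
from typing import Iterable, List


def _normalize(text: str) -> str:
    """Lowercase the text and collapse it to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def extract_keywords(text: str, vocabulary: Iterable[str], top_n: int = 5) -> List[str]:
    """Return up to top_n vocabulary terms found in text, most frequent first.

    Only terms from the vocabulary can be returned, which is what makes the
    keyword set "controlled". Ties are broken alphabetically.
    """
    normalized = _normalize(text)
    counts: Counter = Counter()
    for term in vocabulary:
        term_norm = _normalize(term)
        if not term_norm:
            continue
        # Word boundaries keep "rice" from matching inside "prices".
        hits = re.findall(r"\b" + re.escape(term_norm) + r"\b", normalized)
        if hits:
            counts[term] = len(hits)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]
```

For example, with the hypothetical vocabulary `{"rice", "soil fertility", "irrigation", "maize"}` and the text "Rice yields depend on soil fertility and irrigation. Rice prices rose.", the function returns "rice" first (two occurrences), followed by "irrigation" and "soil fertility"; "maize" is never returned because it does not occur in the text.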
Agrotagger began as a collaboration with the Indian Institute of Technology Kanpur (IITK) in 2010. Building on top of the popular Keyphrase Extraction Algorithm (KEA), the team created several versions, some based on a reduced subset of AGROVOC known as AGROTAGS (produced by partner ICRISAT) and others using the full set of AGROVOC concepts.