This site shows the text analytics that take place behind the scenes of the Dynamic Semantic Publishing platform. It gives you a live experience of the Ontotext tagging service: you can enrich any content you like just by pasting a URL or a piece of text and clicking the annotate button. Using data from DBpedia and Wikidata together with machine learning algorithms, the service recognises mentions of entities such as Person, Organisation and Location, as well as keyphrases and the relationships between them, and reports the relevance and confidence of each with respect to the text. The site also provides a RESTful API for integrating the tagging service into your own system.

Datasets behind the tagging service

The tagging service uses a dataset of Person, Location and Organization entities extracted from DBpedia 2015 and Wikidata. All DBpedia article abstracts, together with links to thumbnails and images, are loaded into the GraphDB database, and for each entity the labels and specific properties are chosen heuristically from the candidates in the combined datasets.

The dataset contains about 3.75M entities with more than 10M individual properties, described by around 55M explicit statements in GraphDB. The entities break down as follows:

    • 1 245 237 - People
    • 905 198 - Locations
    • 262 030 - Organizations
    • 86 318 - Events
    • 549 938 - Pieces of Work
    • 51 568 - Plants
    • 212 349 - Animals
    • 446 846 - Things
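
As a quick sanity check on the figures above, the per-category counts can be summed and compared with the overall entity figure. The snippet below is only an illustrative calculation over the numbers listed here; the variable names are made up for the example and nothing in it queries GraphDB.

```python
# Per-category entity counts, copied from the list above.
entity_counts = {
    "People": 1_245_237,
    "Locations": 905_198,
    "Organizations": 262_030,
    "Events": 86_318,
    "Pieces of Work": 549_938,
    "Plants": 51_568,
    "Animals": 212_349,
    "Things": 446_846,
}

# Sum all categories to get the overall number of entities.
total = sum(entity_counts.values())
print(f"total entities: {total:,}")

# Show each category's share of the total, largest first.
for name, count in sorted(entity_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {count:>10,} {count / total:6.1%}")
```

The categories add up to 3 759 484 entities, which is in line with the rounded figure of about 3.75M quoted above.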

Algorithms behind the tagging service

The tagging service comprises the following components:

  • named entity disambiguation classifier - recognises the right "candidate" among overlapping annotations produced by the gazetteer and maps it to the right class and instance URI in the knowledge base
  • named entity tagger - detects novel entities that are not present in the knowledge base and assigns them a generated class and instance URI
  • relation extraction rules - a set of rules that detect relationships between named entities
  • document classifier - coming soon :)
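
To make the division of labour between the gazetteer and the disambiguation step more concrete, here is a toy sketch. It is not the service's implementation: the mini-gazetteer, the example.org URIs, the `Candidate` type and the function names are all hypothetical, and a simple context-word overlap score stands in for the trained disambiguation classifier.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int                 # character offset where the mention starts
    end: int                   # character offset where the mention ends
    uri: str                   # hypothetical instance URI
    entity_class: str          # e.g. "Person" or "Location"
    context_words: frozenset   # words that hint at this reading


# Hypothetical mini-gazetteer: surface form -> possible readings.
GAZETTEER = {
    "paris": [
        ("http://example.org/Paris_France", "Location", {"france", "city", "seine"}),
        ("http://example.org/Paris_Hilton", "Person", {"hilton", "celebrity", "hotel"}),
    ],
    "london": [
        ("http://example.org/London", "Location", {"england", "city", "thames"}),
    ],
}


def lookup(text):
    """Gazetteer step: emit every candidate for every known surface form."""
    lowered = text.lower()
    candidates = []
    for surface, entries in GAZETTEER.items():
        pos = lowered.find(surface)
        while pos != -1:
            for uri, entity_class, context in entries:
                candidates.append(
                    Candidate(pos, pos + len(surface), uri, entity_class, frozenset(context))
                )
            pos = lowered.find(surface, pos + 1)
    return candidates


def disambiguate(text, candidates):
    """Keep one candidate per span: the one whose context words overlap most
    with the words of the text (a stand-in for a trained classifier)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = {}
    for cand in candidates:
        score = len(cand.context_words & words)
        key = (cand.start, cand.end)
        if key not in best or score > best[key][0]:
            best[key] = (score, cand)
    return [cand for _, cand in sorted(best.values(), key=lambda sc: sc[1].start)]


text = "Paris is a city on the Seine in France."
for cand in disambiguate(text, lookup(text)):
    print(text[cand.start:cand.end], cand.entity_class, cand.uri)
```

In this sketch the gazetteer deliberately over-generates, returning both readings of "Paris", and the second step keeps a single class and instance URI per mention. A real classifier would use much richer features than word overlap.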
