Analysts are often interested not just in the underlying records but in the people, companies and other business concepts that they reveal. These concepts are called Entities.
We offer plug-and-play datasets that include both the raw filings and the clustered Entities derived from them, so clients can use the Entities directly without having to discover them manually.
Entities are difficult to extract from millions of semi-structured records, and identifying and grouping them requires complex methods. We use advanced natural language processing and machine learning methods to extract entities. To do this, we find and score related entities using complex string comparison, shared employees, shared agents, shared publications, geolocation and many other signals that are absent from other datasets.
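To make "finding and scoring related entities" more concrete, here is a minimal, illustrative Python sketch; it is not our production method. It assumes a hypothetical `Record` schema, uses `difflib.SequenceMatcher` as a simple stand-in for complex string comparison, measures shared employees and shared agents with Jaccard overlap, combines them with hypothetical weights and a hypothetical threshold, and groups matching records with union-find.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Record:
    """One raw company mention from a filing (hypothetical schema)."""
    rid: int
    name: str
    employees: set = field(default_factory=set)  # people named in the filing
    agents: set = field(default_factory=set)     # agents of record

# Hypothetical weights and threshold; a real system would tune or learn these.
WEIGHTS = {"name": 0.6, "employees": 0.25, "agents": 0.15}
THRESHOLD = 0.5
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd"}

def normalize(name):
    """Lowercase, strip punctuation and drop common legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def jaccard(a, b):
    """Set overlap |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score(r1, r2):
    """Weighted sum of name similarity and shared-attribute overlap, in [0, 1]."""
    name_sim = SequenceMatcher(None, normalize(r1.name), normalize(r2.name)).ratio()
    return (WEIGHTS["name"] * name_sim
            + WEIGHTS["employees"] * jaccard(r1.employees, r2.employees)
            + WEIGHTS["agents"] * jaccard(r1.agents, r2.agents))

def cluster(records, threshold=THRESHOLD):
    """Merge every pair scoring at or above the threshold using union-find."""
    parent = {r.rid: r.rid for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if score(a, b) >= threshold:
                parent[find(a.rid)] = find(b.rid)

    groups = {}
    for r in records:
        groups.setdefault(find(r.rid), []).append(r.rid)
    return sorted(sorted(g) for g in groups.values())

records = [
    Record(1, "Acme, Inc.", {"j. smith"}, {"baker llp"}),
    Record(2, "ACME Incorporated", {"j. smith", "a. lee"}, {"baker llp"}),
    Record(3, "Apex Widgets LLC", {"m. chan"}, {"doe & co"}),
]
print(cluster(records))  # [[1, 2], [3]]
```

The two Acme records merge because their normalized names are identical and they share an employee and an agent; the Apex record shares nothing and stays separate. Comparing every pair is quadratic, so a sketch like this only suits small inputs; at the scale of millions of records, some form of blocking (comparing only records that share a key) would be needed first.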
These entities expose connections, networks, and trends that are otherwise difficult to find. We currently focus on the following entities:
People
While social and business networking sites provide profiles of individuals, those profiles are inherently subjective, and they often reveal little about business relationships. By mining our database of over 20 million people, we not only provide objective profiles of prolific inventors and prized professionals, but also reveal valuable business relationships that others would have difficulty finding.
Companies
Public companies are required to disclose a great deal of information in securities filings, but building a detailed picture of their non-financial operations is still difficult. For private companies, it is even harder. To address these transparency gaps, we mine a wide range of business data to discover the subsidiary relationships, brands, technologies, employees, political affiliations and partners of over 28 million public and private companies.
Technologies
Keeping up with the thousands of new inventions registered every day requires monitoring hundreds of sources, and finding relevant information requires a detailed understanding of outdated ontologies. We use real-time updates, natural language processing and dynamic ontologies to perform targeted analysis and monitoring of technology development and trends. We link this information to our company and people databases to reveal the people and companies that drive a particular technology.
Brands
Brands are increasingly international, span numerous products and services, and are deployed across a broad array of platforms, all of which makes tracking a particular brand a daunting task. We cross-reference our trademark database with other data types to create an automated global brand-tracking system that can find corporate affiliations and potential brand conflicts that manual or less comprehensive monitoring tools might miss.
Locations
Government filings contain millions of addresses. Unfortunately, many of them are unusable in raw form because they are incomplete, unstructured or unresolvable. To make them more useful, we have organized millions of addresses by adding structure, resolving them to latitude and longitude, deduplicating them and then assigning them to companies and people.
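As a rough illustration of the structuring, deduplication and assignment steps described above, here is a minimal Python sketch; it is not our production pipeline. The abbreviation table, the `ADDRESS_RE` pattern and the function names are hypothetical and handle only a simplified US-style format. Resolving addresses to latitude and longitude normally relies on an external geocoding service and is deliberately left out.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; real pipelines use much larger, locale-aware rules.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd",
    "suite": "ste", "north": "n", "south": "s", "east": "e", "west": "w",
}
# Simplified US-style pattern (hypothetical): house number, street, optional suite.
ADDRESS_RE = re.compile(r"^(?P<number>\d+)\s+(?P<street>.+?)(?:\s+ste\s+(?P<unit>\w+))?$")

def normalize_address(raw):
    """Lowercase, strip punctuation, collapse whitespace and standardize common words."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def parse_address(normalized):
    """Add structure: split a normalized address into (number, street, unit)."""
    m = ADDRESS_RE.match(normalized)
    if m is None:
        return (None, normalized, None)  # unparseable: keep the whole string as the street
    return (m["number"], m["street"], m["unit"])

def dedupe(filings):
    """Group (owner, raw_address) pairs by structured address.

    Returns {(number, street, unit): sorted list of owners at that address}.
    """
    groups = defaultdict(set)
    for owner, raw in filings:
        groups[parse_address(normalize_address(raw))].add(owner)
    return {key: sorted(owners) for key, owners in groups.items()}

filings = [
    ("Acme Inc", "100 North Main Street, Suite 5"),
    ("Jane Doe", "100 N. Main St. Ste 5"),
    ("Apex LLC", "42 West Avenue"),
]
for address, owners in dedupe(filings).items():
    print(address, owners)
```

Here the two differently written Main Street entries collapse into one structured address linked to both a company and a person. Exact matching on normalized fields is the simplest possible deduplication rule; messier real-world data would usually also need fuzzy comparison and the geocoded coordinates to decide when two addresses are the same place.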