Bulk Data - Clusters

Analysts are often interested not just in the underlying records but in the people, companies and other business concepts that they reveal. These concepts are called Entities.

We offer plug-and-play datasets that include both the raw filings and the clustered Entities, so clients can use the data without having to discover the Entities manually.

Entities are difficult to extract from millions of semi-structured records, and identifying and grouping them requires complex methods. We use advanced natural language processing and machine learning methods to extract entities, then find and score related entities using complex string evaluation, shared employees, shared agents, shared publications, geolocation and many other signals that are absent from other datasets.
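As a rough illustration of how several signals can be combined into one relatedness score, here is a minimal Python sketch. It is not our production method: the record fields (name, employees, agents, city), the weights and the use of difflib for string comparison are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical weights for each signal; real weights would be tuned or learned.
WEIGHTS = {"name": 0.5, "employees": 0.2, "agents": 0.2, "location": 0.1}


def name_similarity(a, b):
    """Case-insensitive string similarity between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a, b):
    """Overlap between two sets, in [0, 1]; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(r1, r2):
    """Weighted sum of the signals for two records (assumed dict layout)."""
    score = WEIGHTS["name"] * name_similarity(r1["name"], r2["name"])
    score += WEIGHTS["employees"] * jaccard(set(r1["employees"]), set(r2["employees"]))
    score += WEIGHTS["agents"] * jaccard(set(r1["agents"]), set(r2["agents"]))
    score += WEIGHTS["location"] * (1.0 if r1["city"] == r2["city"] else 0.0)
    return score


# Two filings that probably describe the same company, and one that does not.
a = {"name": "Acme Widgets Inc", "employees": ["J. Doe", "A. Lee"],
     "agents": ["Smith LLP"], "city": "Dover"}
b = {"name": "ACME Widgets, Inc.", "employees": ["J. Doe"],
     "agents": ["Smith LLP"], "city": "Dover"}
c = {"name": "Zenith Foods Ltd", "employees": ["M. Chan"],
     "agents": ["Wong & Co"], "city": "Hong Kong"}

print(match_score(a, b) > match_score(a, c))
```

Pairs whose score exceeds a chosen threshold would then be grouped into the same Entity; the threshold, like the weights, is a tuning decision that this sketch leaves open.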

Entities expose connections, networks, and trends that are otherwise difficult to find. Currently the entities we focus on are:


While social and business network sites provide profiles of individuals, they are inherently subjective. Furthermore, they are often weak in revealing business relationships. By mining our database of over 20 million people, we not only provide objective, unbiased profiles of prolific inventors and prized professionals, but also reveal valuable business relationships that others would have difficulty finding.


Public companies are required to disclose a lot of information in securities filings, but getting a detailed picture of their non-financial workings is still tough. For private companies, it’s even harder. To solve these transparency issues, we mine a wide range of business data to discover things like the subsidiary relationships, brands, technologies, employees, political affiliations and partners of over 28 million public and private companies.


Keeping up with the thousands of new inventions registered every day requires monitoring hundreds of sources. Finding relevant information requires a detailed understanding of outdated ontologies. We use real-time updates, natural language processing and dynamic ontologies to perform targeted analysis and monitoring of technology development and trends. We link this information to our company and people database to reveal the people and companies that drive a particular technology.


Brands are increasingly international, permeate numerous products and services, and are deployed through a broad array of platforms, all of which makes tracking a particular brand a daunting task. We use our trademark database, cross-referenced with other data types, to create an automated global brand tracking system that is able to find corporate affiliations and potential brand conflicts that manual or less expansive monitoring tools might miss.


Government filings contain millions of addresses. Unfortunately, many of these addresses are unusable in their raw format because they are incomplete, unstructured or unresolvable. To make them more useful, we have organized millions of addresses by adding structure, resolving them to longitude and latitude, deduplicating them and then assigning them to companies and people.
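The structuring and deduplication steps can be pictured with a short Python sketch. It is illustrative only: the abbreviation table, the record fields (address, lat, lon, owner) and the rounding precision are assumptions, and the coordinates are taken as already resolved rather than looked up from a geocoding service.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite"}


def normalize_address(raw):
    """Lowercase, strip punctuation and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def dedupe_addresses(records):
    """Group records sharing a normalized address and rounded coordinates.

    Returns a dict mapping each distinct address key to the set of owners
    (companies or people) assigned to it.
    """
    groups = {}
    for rec in records:
        key = (normalize_address(rec["address"]),
               round(rec["lat"], 4), round(rec["lon"], 4))
        groups.setdefault(key, set()).add(rec["owner"])
    return groups


records = [
    {"address": "12 Main St.", "lat": 39.15817, "lon": -75.52441, "owner": "Acme Widgets Inc"},
    {"address": "12 main street", "lat": 39.15817, "lon": -75.52441, "owner": "J. Doe"},
    {"address": "40 Harbour Rd", "lat": 22.28014, "lon": 114.17712, "owner": "Zenith Foods Ltd"},
]

for key, owners in dedupe_addresses(records).items():
    print(key[0], sorted(owners))
```

Here the two spellings of the first address collapse into one entry linked to both a company and a person, which is the kind of connection a raw address list hides.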


We investigate and acquire datasets from around the world using automated data retrieval and "deep web" crawling methods.


Our analysts use our existing library of proprietary ETL tools to return clean and structured data in any format your business requires.


Raw data is rarely good enough by itself. We use advanced natural language and machine learning methods to extract entities, from people to addresses, so you can find information, not just data.


With over 200 million records, 1 billion entities and relationships, and coverage of 88 countries, our existing dataset of companies, people, intellectual property, and legal and financial filings makes a powerful supplement to your existing data.


