Seattle-based DefinedCrowd, which calls itself an “intelligent” data curation platform, announced today that it has raised $50.5 million in equity funding. CEO and founder Daniela Braga says the proceeds will be used to expand the company's existing solutions, launch subscription-based offerings and extend DefinedCrowd's international reach.
Training AI algorithms usually requires high-quality labeled data, which is why creating the corpora takes almost as long as, and often longer than, developing the models that use them. DefinedCrowd wants to solve this problem with a tailor-made model training service for customers in customer service, automotive, retail, healthcare and other sectors.
Braga, who holds a Ph.D., is familiar with the advantages and disadvantages of data set curation in language technology. Before founding DefinedCrowd, she was responsible for a $14 million budget to improve Microsoft's AI-powered Cortana voice assistant, an effort she described as a tough fight. About 18 months of each product development cycle was spent on data collection to update the underlying models.
DefinedCrowd's approach relies on Neevo, a community of more than 290,000 contributors (up from 45,000 two years ago) in 195 countries who are paid to tag, type, and speak words and phrases. They deliver well over 500,000 samples a day to the data sets available through DefinedCrowd's natural language processing, speech recognition, and computer vision tools.
DefinedCrowd customers can use APIs and a web interface to source demographically filtered data, specifying the age, location, gender and language skills of Neevo's members, for applications such as transcription, speech emotion inference, text sentiment and semantic annotation, question-and-answer collection, and spontaneous speech collection. The platform supports over 50 languages and 79 dialects, or around 90% of the world's most spoken languages, with a claimed labeling accuracy of up to 98%.
DefinedCrowd's real value proposition is probably its extensibility. Customers can use the platform not only to train models from scratch within budget constraints, but also to extend existing models with data sets tailored to specific technical requirements. Users with simpler requirements can use ready-made workflows, templates, and standard solutions, or upload their own proprietary data, while receiving live cost estimates and a dashboard that shows progress in real time.
For example, developers of a news-curating skill on Amazon's Alexa platform could use DefinedCrowd to generate multiple data sets and improve the algorithm's performance in different markets.
DefinedCrowd, which saw sales grow 656% year over year last year, counts Fortune 500 companies such as BMW, Mastercard, Nuance and Yahoo Japan among its customers. The company employs over 100 people and has offices in Portugal, Seattle and Japan. DefinedCrowd plans to grow its workforce to 500 and open additional research and development laboratories by 2021.
This latest round, which follows an $11.8 million raise in July 2018, brings DefinedCrowd's total funding to $63.4 million and included participation from new investors Semapa Next and Hermes GPE. Existing investors Evolution Equity Partners, Kibo Ventures, Portugal Ventures, Bynd Venture Capital, EDP Ventures and IronFire Ventures also participated. They joined long-term backers including the Amazon Alexa Fund, the Sony Innovation Fund and Mastercard.
It's worth noting that DefinedCrowd isn't the only startup vying for a piece of the data annotation tools market, which is worth over $5 billion. There is Scale AI, which recently raised $100 million for its extensive suite of data labeling services, and CloudFactory, which raised $65 million last November for its data processing and preparation tools. Not to mention Mighty AI, Hive, Appen and Alegion.