Project Details
Description
Data is generated every day at a scale that is estimated in zettabytes, and about 80% of it is unannotated, unstructured text. How to make this type of data useful for AI applications is an as-yet unsolved problem. Manual annotation can be very precise and can incorporate domain-specific knowledge, but it is costly, inefficient, and does not scale. The so-called "80/20 rule" refers to the fact that data scientists often spend up to 80% of their time sorting, cleaning, and otherwise preparing datasets. This project aims to develop a novel hybrid framework that helps domain experts annotate text using Natural Language Processing algorithms, reducing the annotation process to a fraction of the time it currently takes. The hybrid framework will enable data scientists to create customized, domain-specific datasets for their AI applications in a short time. This particularly supports small and medium-sized companies with only a few employees in developing their own AI applications.
Short title | CreateData4AI |
---|---|
Acronym | CD4AI |
Status | Active |
Effective start/end date | 1/01/23 → 31/12/25 |
Collaborative partners
- Fusionbase GmbH (lead)