
Speech

Audio files with corresponding timestamped transcription for applications such as automatic speech recognition, language identification, and voice assistants.

Key features:

Speech types: scripted (including TTS), conversational, broadcast

Recording types: microphone, telephony (mobile, landline), smartphone

Environments: quiet (home, office, studio), noisy (public place, in-car, roadside)

Audio quality: 8 kHz to 96 kHz

Text

Tailored, ethically-sourced text datasets that drive smarter insights for more accurate language processing and machine learning models.

Text datasets include:

Pronunciation Dictionaries (Lexicons): 5.4M words in 75 languages

Part-of-speech (POS) dictionaries: 3.2M words in 18 languages

Named Entity Recognition (NER): 344k+ entity labels in 9 languages

Inverse Text Normalization: 36k+ test cases in 7 languages

Image

115k+ images in 14+ languages to develop diverse applications such as optical character recognition (OCR) and facial recognition software.

Featured image datasets include:

15.8K images of documents in 14 languages with mixed premium and challenging conditions for OCR

13.5K human facial images of 99 participants in various lighting conditions, angles, and expressions.

Video

High-quality video data to enhance AI models, like multi-modal LLMs, for tasks such as object detection, gesture recognition, and video summarization.

Featured video dataset:

130 sessions documenting human body movement of 100 diverse participants in the United Kingdom and the Philippines

Multi-camera recordings in several locations with varied background, weather, and lighting conditions.

Location

Precise location data for insights into user movements and interactions with specific points of interest, enabling location-based analytics and targeted strategies.

Accurate GPS signals collected in-app from SDKs

Global: 200+ countries

Compliant: 100% user opt-in

Scale: 1.5+ billion devices and 500+ billion events

Off-the-Shelf AI Training Datasets

Types of AI Training Datasets