Chipola: A Chinese Podcast Lexical Database for capturing spoken language nuances and predicting behavioral data

Summary:

This study introduces Chipola, a Chinese Podcast Lexical Database derived from a large-scale collection of Chinese podcast transcripts. Because podcasts are spoken, a lexical database built from them can capture the nuances of spoken Chinese. Chipola was developed from a corpus of 31.2 million word tokens and 41.7 million character tokens, with a vocabulary of 88,085 unique words and 4,613 unique characters. Lexical variables such as frequency,...

Link:

https://pubmed.ncbi.nlm.nih.gov/40341999/?utm_source=Other&utm_medium=rss&utm_campaign=pubmed-2&utm_content=12QQbiNmM99eUQGIX1JjHIKcROC1Vzv4sOS-2S_LNI19uG_Yrk&fc=20220129225649&ff=20250511144426&v=2.18.0.post9+e462414

Authors:

Ning Zhao, Lei Lei

Date tagged:

05/11/2025, 14:45

Date published:

05/09/2025, 06:00