Abstract | ||
---|---|---|
In this paper, we ask what properties make a large corpus more or less
useful. We suggest that size, by itself, should not be the ultimate goal of
building a corpus. Large-scale corpora are considered desirable because they
offer statistical stability and rich variation. But this rich variation means
more factors to control and evaluate, which can limit the advantages of size.
We discuss the use of multi-channel data to complement large-scale speech
corpora. Even though multi-channel data may limit the scale of a corpus (due to
the complex and labor-intensive nature of data collection), they can offer
information that allows us to tease apart various factors related to speech
production. |
Year | Venue | Keywords |
---|---|---|
2010 | CoRR (Computing Research Repository, arXiv) | speech production, data collection |
Field | DocType | Volume |
Data collection, Ask price, Information retrieval, Computer science, Artificial intelligence, Natural language processing, Speech production | Journal | abs/1012.2 |
Citations | PageRank | References |
1 | 0.37 | 3 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Greg P. Kochanski | 1 | 215 | 19.97 |
Chilin Shih | 2 | 392 | 68.16 |
Ryan Shosted | 3 | 7 | 2.57 |