Title
DiCoMo: the digitization cost model
Abstract
Estimating digitization costs is difficult: many factors are unknown in advance, so accurate values are hard to obtain. Nevertheless, digitization projects need a precise idea of the economic costs and the time involved in developing their contents. Common practice when digitizing a new collection is to set a schedule, and make a firm commitment to fulfil it (in terms of both cost and deadlines), even before the actual digitization work starts. As with software development projects, incorrect estimates produce delays and cost overruns. Based on methods used in Software Engineering for software development cost prediction, such as COCOMO and Function Points, and using historical data gathered over 5 years at the MCDL project during the digitization of more than 12,000 books, we have developed a method for time and cost estimation, named DiCoMo (Digitization Cost Model), for digital content production in general. The method can be adapted to different production processes, such as the production of digital XML or HTML texts using scanning and OCR followed by human proofreading and error correction, or the production of digital facsimiles (scanning without OCR). The accuracy of the estimates improves with time, since the algorithms can be optimized by making adjustments based on historical data gathered from previous tasks. Finally, we consider the problem of parallelizing tasks, i.e. dividing the work among a number of encoders who work in parallel.
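The abstract does not give DiCoMo's equations, so the following is only a minimal sketch of the general idea it describes. It assumes the basic COCOMO form, effort = a * size^b, with size measured in pages, and calibrates a and b from past tasks by least squares in log space. The function names, the page-based size measure, and the history records are all hypothetical; the records are synthetic values generated from a known power law, not MCDL data. The parallel split is an idealized lower bound on elapsed time that ignores coordination overhead.

```python
import math


def fit_power_law(records):
    """Fit effort = a * size**b by least squares on log-transformed data.

    records: iterable of (size, effort) pairs from past tasks (hypothetical).
    """
    xs = [math.log(size) for size, _ in records]
    ys = [math.log(effort) for _, effort in records]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log space = log(a)
    return a, b


def estimate_effort(size, a, b):
    """Predicted effort (person-hours) for a task of the given size."""
    return a * size ** b


def ideal_elapsed(effort, workers):
    """Idealized elapsed time when effort is split evenly among workers.

    Assumption: no coordination overhead, so this is a lower bound.
    """
    return effort / workers


# Synthetic history generated from an assumed power law (illustration only).
TRUE_A, TRUE_B = 0.05, 1.1
history = [(pages, TRUE_A * pages ** TRUE_B) for pages in (100, 250, 400, 800)]

a, b = fit_power_law(history)
effort = estimate_effort(300, a, b)
print(f"a={a:.4f}, b={b:.4f}, effort for 300 pages={effort:.2f} h")
print(f"ideal elapsed time with 4 encoders={ideal_elapsed(effort, 4):.2f} h")
```

Refitting after each completed task is one simple way to mirror the abstract's point that estimates improve as historical data accumulates.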
Year
2010
DOI
10.1007/s00799-011-0073-9
Venue
Int. J. on Digital Libraries
Keywords
digital facsimile, costs overdraft, historical data, digital content production, digitization cost, different production process, actual digitization work, digital xml, software development cost prediction, digitization project, digitization cost model
Field
Data mining, Digitization, XML, Computer science, Function point, Error detection and correction, Economic cost, COCOMO, Digital content, Software development
DocType
Journal
Volume
11
Issue
2
ISSN
1432-1300
Citations
0
PageRank
0.34
References
4
Authors
3
Name             Order  Citations  PageRank
Alejandro Bia    1      33         10.59
Rafael Muñoz     2      1          0.70
Jaime Gómez      3      72         9.26