Document Imaging
-
I'm looking for some advice from someone who has experience with a document imaging system. The documents need to be scanned, given a number, and saved to the file system, and then retrieved for viewing over an API.
What is the best image format to use for a Document Imaging project? Over the lifetime of the project (15 years) I would estimate 10 million images and I'd like to fit that on 2 or 3 TB of hard drive. We would start with a 500 GB drive. The images will be a mixture of filled paper forms, some handwritten, and trip expense receipts.
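As a rough sanity check on those numbers, here is a minimal sketch of the per-image size budget they imply. It assumes decimal units (1 TB = 10**12 bytes) and ignores file system overhead; the helper names are made up for illustration.

```python
# Rough storage budget for the figures above.
# Assumption: decimal units (1 TB = 10**12 bytes), no file system overhead.
TB = 10**12
GB = 10**9
KB = 10**3

IMAGES = 10_000_000  # estimated image count over the 15-year lifetime


def bytes_per_image(total_bytes: int, image_count: int) -> float:
    """Average size each image may occupy to fit within the budget."""
    return total_bytes / image_count


def images_that_fit(drive_bytes: int, avg_image_bytes: int) -> int:
    """How many images of a given average size fit on a drive."""
    return drive_bytes // avg_image_bytes


for capacity_tb in (2, 3):
    budget = bytes_per_image(capacity_tb * TB, IMAGES)
    print(f"{capacity_tb} TB -> about {budget / KB:.0f} KB per image")

# The initial 500 GB drive, at the tighter 200 KB average:
print(images_that_fit(500 * GB, 200 * KB), "images fit on the first drive")
```

So whatever format is chosen has to average roughly 200 to 300 KB per image, and the first drive would cover about a quarter of the expected volume at the 200 KB figure.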
Some of the documents will be multi-page. Would it be recommended to store these in a multi-page TIFF, or to store them as single images?
Thanks for any possible guidance.
-
Either TIFF or PDF gives you a nice standard format that isn't going away.
You should consider a database rather than a file system if you need any sort of meta-data/security/backup plan etc.
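One possible middle ground, sketched below, keeps the image files on disk and puts only the metadata in a database. This is an assumption about how it could be laid out, not a description of any particular product: the table layout, the `doc_type` column, the 8-digit numbering, and all function names are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3
from pathlib import Path


def image_path(root: Path, doc_id: int, page: int, ext: str = "tif") -> Path:
    """Map a document number to a sharded path so no directory grows huge.

    Hypothetical layout: the 8-digit, zero-padded number is split into
    three 2-digit directory levels.
    """
    name = f"{doc_id:08d}"
    return root / name[0:2] / name[2:4] / name[4:6] / f"{name}_p{page:03d}.{ext}"


def open_index(db_file: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata index; the schema is illustrative only."""
    conn = sqlite3.connect(db_file)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               doc_id   INTEGER NOT NULL,
               page     INTEGER NOT NULL,
               doc_type TEXT    NOT NULL,
               path     TEXT    NOT NULL,
               PRIMARY KEY (doc_id, page)
           )"""
    )
    return conn


def register_page(conn: sqlite3.Connection, root: Path,
                  doc_id: int, page: int, doc_type: str) -> Path:
    """Record one scanned page and return where its file should be written."""
    path = image_path(root, doc_id, page)
    conn.execute(
        "INSERT INTO pages (doc_id, page, doc_type, path) VALUES (?, ?, ?, ?)",
        (doc_id, page, doc_type, path.as_posix()),
    )
    conn.commit()
    return path


def pages_for(conn: sqlite3.Connection, doc_id: int) -> list:
    """Return the stored paths for a document, in page order."""
    rows = conn.execute(
        "SELECT path FROM pages WHERE doc_id = ? ORDER BY page", (doc_id,)
    )
    return [row[0] for row in rows]


conn = open_index()
root = Path("/scans")  # hypothetical storage root
register_page(conn, root, 12345, 2, "receipt")
register_page(conn, root, 12345, 1, "receipt")
print(pages_for(conn, 12345))
```

The retrieval API would then look up paths by document number and stream the files, while security and backup policy for the metadata live in the database. Whether the images themselves belong in the database is a separate decision.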
-
Agree with LoztInSpace. I would go with PDF because you can put multiple pages in one document. Some might suggest putting the actual file in the database as a BLOB, but I'd have to know more about how the files are accessed to know whether a BLOB would benefit you at all.
-
@belgariontheking said:
Agree with LoztInSpace. I would go with PDF because you can put multiple pages in one document. Some might suggest putting the actual file in the database as a BLOB, but I'd have to know more about how the files are accessed to know whether a BLOB would benefit you at all.
/agree with both, with one minor specific: if you're not running it through OCR, are saving a single page, and are considering PDF, then also consider saving it as a JPEG. Scanned images in a PDF are often stored as JPEG data anyway, so you can skip the PDF encapsulation for single-page files and have more control over the final output (quality vs. size).