Document Imaging



  • I'm looking for advice from someone who has experience with document imaging systems. The documents need to be scanned, assigned a number, saved to the file system, and later retrieved for viewing through an API.

     

    What is the best image format to use for a document imaging project? Over the lifetime of the project (15 years) I estimate about 10 million images, and I'd like them to fit on 2 or 3 TB of disk. We would start with a 500 GB drive. The images will be a mixture of filled-in paper forms (some handwritten) and trip expense receipts.

     

    Some of the documents will be multi-page. Would you recommend storing these as multi-page TIFFs, or storing each page as a single image?

     

    Thanks for any possible guidance.



  • Either TIFF or PDF gives you a nice standard format that isn't going away.

    You should consider a database rather than a file system if you need any sort of metadata, security, or backup plan.



  • Agree with LoztInSpace. I would go with PDF because you can put multiple pages in one document. Someone might suggest putting the actual file in the database as a BLOB, but I'd need to know more about how the files are accessed to say whether a BLOB would benefit you.



  • @belgariontheking said:

    Agree with LoztInSpace. I would go with PDF because you can put multiple pages in one document. Someone might suggest putting the actual file in the database as a BLOB, but I'd need to know more about how the files are accessed to say whether a BLOB would benefit you.

    /agree with both, with one minor specific: if you're not running it through OCR, are saving a single page, and are considering PDF, then also consider saving it as a JPEG. Scanned images inside a PDF are often stored as JPEG anyway (PDF also supports other encodings, such as CCITT Group 4 for black-and-white scans), so for single-page files you can skip the PDF wrapper and get more control over the final output (quality vs. size).
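
The storage target in the question (about 10 million images on 2 to 3 TB, starting with a 500 GB drive) implies a per-image budget. The sketch below works through that arithmetic in Python. The helper names are hypothetical, it uses decimal units (1 TB = 10**12 bytes), and the raw-size comparison assumes US Letter pages (8.5 x 11 in) scanned at 300 dpi, which the thread does not actually specify.

```python
# Back-of-the-envelope storage budget for the figures in the question.
# Assumptions (not from the thread): decimal units, US Letter pages at 300 dpi.

GB = 10**9
TB = 10**12
TOTAL_IMAGES = 10_000_000


def per_image_budget(total_bytes: int, image_count: int) -> float:
    """Average bytes each image may use if image_count images must fit in total_bytes."""
    return total_bytes / image_count


def images_that_fit(total_bytes: int, avg_image_bytes: int) -> int:
    """Number of whole images of the given average size that fit in total_bytes."""
    return total_bytes // avg_image_bytes


def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    for tb in (2, 3):
        kb = per_image_budget(tb * TB, TOTAL_IMAGES) / 1000
        print(f"{tb} TB / {TOTAL_IMAGES:,} images -> {kb:.0f} KB per image")

    print(f"500 GB at 300 KB per image -> {images_that_fit(500 * GB, 300_000):,} images")

    # Compare raw scan sizes against a 300 KB-per-image budget.
    for label, bpp in (("bitonal", 1), ("8-bit grayscale", 8), ("24-bit color", 24)):
        raw = raw_page_bytes(8.5, 11, 300, bpp)
        print(f"raw {label} page: {raw:,} bytes, needs ~{raw / 300_000:.1f}:1 compression")
```

Under these assumptions the budget works out to roughly 200 to 300 KB per image, while a raw 24-bit color Letter page at 300 dpi is about 25 MB and even a bitonal one is about 1 MB. Whichever container is chosen (TIFF, PDF, or JPEG), the compression inside it is what decides whether the target is met.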

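The first post says each scanned document is given a number and saved to the file system. With around 10 million files over the project's life, one common approach is to shard files into nested directories derived from the number so that no single directory grows huge. Below is a hypothetical `document_path` helper in Python, assuming document numbers fit in nine digits; it also shows one way to name per-page files if multi-page documents are stored as single images rather than as one multi-page TIFF or PDF.

```python
from pathlib import Path
from typing import Optional

# Hypothetical layout: nine-digit, zero-padded document numbers split 3/3/3,
# e.g. 12345678 -> <root>/012/345/012345678.tif
MAX_DOC_NUMBER = 10**9 - 1


def document_path(root: Path, doc_number: int, page: Optional[int] = None,
                  ext: str = "tif") -> Path:
    """Return the storage path for a whole document, or for one page of it."""
    if not 0 <= doc_number <= MAX_DOC_NUMBER:
        raise ValueError(f"document number out of range: {doc_number}")
    digits = f"{doc_number:09d}"
    if page is None:
        name = digits
    else:
        # Pages are 1-based and zero-padded so they sort in order in a listing.
        if page < 1:
            raise ValueError(f"page numbers start at 1: {page}")
        name = f"{digits}_p{page:03d}"
    # Two directory levels of up to 1,000 entries each; every leaf directory
    # covers 1,000 consecutive document numbers.
    return root / digits[:3] / digits[3:6] / f"{name}.{ext}"


if __name__ == "__main__":
    print(document_path(Path("/scans"), 12345678))
    print(document_path(Path("/scans"), 12345678, page=2, ext="jpg"))
```

With this kind of scheme, retrieval over the API needs only the document number to locate the file, and a database (if one is used, as suggested in the thread) can still hold the metadata.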

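The replies suggest a database for metadata, security, and backups, and mention storing the file itself as a BLOB. A minimal sketch of both options using Python's built-in `sqlite3` module follows. The table and column names are hypothetical, SQLite stands in for whatever database the project would actually use, and the CHECK constraint lets each row hold either a file-system path or the image bytes, but not both.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per scanned document. The image lives either on
# the file system (file_path) or inside the database (content), never both.
SCHEMA = """
CREATE TABLE IF NOT EXISTS document (
    doc_number INTEGER PRIMARY KEY,
    doc_type   TEXT    NOT NULL,                  -- e.g. 'form' or 'receipt'
    scanned_at TEXT    NOT NULL,                  -- ISO-8601 timestamp
    page_count INTEGER NOT NULL CHECK (page_count > 0),
    file_path  TEXT,                              -- image stored on disk
    content    BLOB,                              -- image stored as a BLOB
    CHECK ((file_path IS NULL) <> (content IS NULL))
);
"""


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn: sqlite3.Connection, doc_number: int, doc_type: str,
                 scanned_at: str, page_count: int, *,
                 file_path: Optional[str] = None,
                 content: Optional[bytes] = None) -> None:
    """Record a document; exactly one of file_path or content must be given."""
    with conn:  # commits on success, rolls back and re-raises on error
        conn.execute(
            "INSERT INTO document (doc_number, doc_type, scanned_at, page_count,"
            " file_path, content) VALUES (?, ?, ?, ?, ?, ?)",
            (doc_number, doc_type, scanned_at, page_count, file_path, content),
        )


def fetch_image(conn: sqlite3.Connection, doc_number: int) -> bytes:
    """Return the image bytes, reading from disk when only a path is stored."""
    row = conn.execute(
        "SELECT file_path, content FROM document WHERE doc_number = ?",
        (doc_number,),
    ).fetchone()
    if row is None:
        raise KeyError(doc_number)
    file_path, content = row
    if content is not None:
        return content
    with open(file_path, "rb") as f:
        return f.read()
```

Whether the BLOB column is worth using depends, as the thread notes, on how the files are accessed. Storing only the path keeps the database small and leaves the images as ordinary files that can be backed up and served directly.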