Great War Forum

Pamela655

Got some old diaries to convert to digital

Pamela655

Hi,

I am a history major and I need some help. I was able to obtain some very old and rare photographs from the war era and a few soldiers' diaries. I am so excited to have found them. The relics are degrading and I don't want to lose these valuable pieces of our history.

I want to convert these to digital. The diaries are fragile and I don't want to cause any damage trying to digitize them. My uncle told me to see the document imaging and scanning service in Toronto if I want to scan in high quality. I have read that OCR scanning is better for scanning books, and that I would be able to get library-quality images of the pages and photographs. But I don't know how costly that would be. Do I need to seek a professional service? My friend has a personal Canon color scanner, so using that is the other option.

 

Please advise me what I should do.  

4thGordons

Hello- welcome to the forum.

 

OCR typically does not work very well at all with handwriting so that is probably not going to be of much use.

The trouble with flatbed scanners is that they require the documents to be flat -- which is sometimes difficult if they are bound or in notebooks.

A professional scanning service will no doubt be able to offer advice and a means of doing it -- but this will likely come at a fairly significant price.

 

I have found (and I have done a number of diaries and lots of letters) that the quickest and easiest way to record the documents in a form from which they can be transcribed is to PHOTOGRAPH them using a digital camera. This is also how many archives copy documents these days.

 

Ideally you will want good, diffuse light (natural light or artificial) and the camera mounted on a tripod or copy stand so you can open the document under it and hold it as flat as you are able. This will, if you spend some time setting it up, allow you to get high quality copies that - with modern digital cameras - are typically higher resolution than most printers. It also allows you to copy relatively rapidly and to prevent stress on the original documents. If you have assistance, one person can operate the camera (and check focus) whilst the other turns the pages.

Sometimes (if the writing is faint) digital cameras have difficulty focusing on a flat page, as there is little contrast. What I do is take a plain index card, mark a thick black cross on it in marker pen, and lay it on top of the page. The camera can then easily focus on the cross, and the card can be removed (while focus is maintained) before taking the image. Because the card is literally paper thin, removing it has no visible impact on the focus.

 

I have photographed numerous diaries and documents in this fashion and it produces high quality results which are easily manipulated for transcribing etc.

It is not necessary to have a high-end camera for this.
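
As a rough check on the resolution point, here is a minimal sketch of the arithmetic in Python. The camera pixel counts, the page size and the `fill` fraction are illustrative assumptions, not figures from this thread:

```python
def effective_ppi(image_px_long, image_px_short,
                  page_in_long, page_in_short, fill=0.9):
    """Approximate pixels per inch captured across a photographed page.

    fill is the assumed fraction of the frame that the page occupies
    along each axis (1.0 would mean the page exactly fills the frame).
    """
    ppi_long = image_px_long * fill / page_in_long
    ppi_short = image_px_short * fill / page_in_short
    # The axis with fewer pixels per inch limits the usable resolution.
    return min(ppi_long, ppi_short)


# Hypothetical 12-megapixel camera (4000 x 3000 pixels)
# over a hypothetical 8 x 5 inch notebook page.
ppi = effective_ppi(4000, 3000, 8.0, 5.0)
print(round(ppi))
```

Around 300 pixels per inch is often quoted as a target for legible copies, so under these assumptions even a modest camera leaves some headroom.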

Good luck.

Chris

Maureene

My daughter-in-law has an app on her iPhone which has greatly improved the quality of images of documents photographed. I saw a ‘before’ and ‘after’ example, and the image went from being unreadable to good.

Details: Scanner Pro - PDF document scanner app with OCR

 By Readdle

https://itunes.apple.com/us/app/apple-store/id333710667?mt=8

 

 

I don’t have it myself, so can’t give further details.

 

Cheers

Maureen

dink_and_pip

CamScanner is the one I use on Android.

Andy. 

Derek Black

Hi,

 

What regiments are represented in the diaries you have collected?

 

Cheers,
Derek.

David Filsell

I ask out of total ignorance, but is it possible to use the copies as a Word document?

David_Underdown

One of the problems is making sure you don't put too much pressure on the binding. As others have mentioned, flatbed scanners are not a good choice, as you will have to make the book as flat as possible, which is exactly what is likely to cause damage. I've seen some online video instructions for making a simple book rest from a cardboard box. You can then photograph pages individually, but you do need to make sure the camera is set up parallel to the page so it all stays nicely in focus. Some sort of lighting would be good, ideally an LED desk lamp or similar, as those don't give out much heat. "Snake weights" are useful for holding the pages down.

 

@David Filsell you can't easily generate text from handwritten material (well, it's coming along, but still needs a lot of training for an individual hand), so the copies are just image files and would need to be manually transcribed. For printed material, OCR (optical character recognition) is more advanced and produces reasonable text output, though it would almost certainly still need some tidying up.

MBrockway

Surely as a history major student you should be telling us how to conserve & digitise historical source material?

 

It might not be explicitly included in an undergraduate syllabus, but your faculty must have significant expertise in this key area.  Senior faculty members would not only be likely to be very interested in the material anyway, they would also be very aware of its conservation requirements.

 

Which school are you attending? What does your professor advise?

4thGordons
3 hours ago, David Filsell said:

I ask out of total ignorance, but is it possible to use the copies as a Word document?

David Underdown has answered this but I wanted to add - once you have the images they can of course be pasted into a Word document.

I have actually done this quite a bit. I paste the image of the original document at the top of the page and then type the transcription underneath it. This saves having two programs open (which is sometimes unwieldy) and means you have both the original and the transcription present in a single document which is sometimes useful for checking the latter or if there are ambiguities or difficult to read sections.
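
The same layout (image on top, transcription underneath) can also be generated automatically. Below is a minimal sketch in Python that produces an HTML page rather than a Word document; HTML is a swapped-in format chosen here only because it needs nothing beyond the standard library. The file names and sample text are hypothetical:

```python
from html import escape


def build_page(image_name: str, transcription: str) -> str:
    """Return one HTML section: the page image first, the transcription below it."""
    src = escape(image_name, quote=True)
    # One <p> per non-empty line of the transcription.
    paragraphs = "\n".join(
        f"<p>{escape(line)}</p>"
        for line in transcription.splitlines()
        if line.strip()
    )
    return f'<section>\n<img src="{src}" alt="{src}">\n{paragraphs}\n</section>'


def build_document(pages) -> str:
    """pages is a list of (image file name, transcription text) pairs."""
    body = "\n".join(build_page(name, text) for name, text in pages)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"


# Hypothetical example: two photographed pages with their transcriptions.
sample = build_document([
    ("page_001.jpg", "First line of the entry.\nSecond line of the entry."),
    ("page_002.jpg", "Next page, with one ambiguous word marked [?]."),
])
print(sample)
```

Saving the output as an `.html` file next to the photographs and opening it in a browser gives the original and the transcription on one page, which is the point of the approach described above.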

 

Vaguely related: as an alternative to transcription by typing out the documents I have recently been experimenting with the voice recognition / dictation software that is built in to Win 10 and it has proven remarkably reliable once trained -- at least as reliable as OCR in my experience so far. What it allows you to do is simply read the document and it turns it into text (as in dictation). The big advantage over OCR of course is if you can READ the handwriting then you can turn it into text.  I have played with this quite a lot recently and after about 20 minutes "training" the program to your voice, reading it out converts it into a word document with impressive (although not perfect) accuracy.  This might provide another means of recording the content of the diaries in the original post. But I would also photograph them so there is a record of the original for double checking etc.

 

Chris

Norrette
25 minutes ago, 4thGordons said:

What it allows you to do is simply read the document and it turns it into text (as in dictation).

 

Brilliant.  Never thought of it.

Dragon

As part of recording an inherited collection, we had to scan over 80 albums each consisting of dozens of pages with delicate contents. We invested in a Fujitsu ScanSnap SV600 which has page-flattening technology (the book does not have to be physically flattened to achieve an excellent image: the software can easily cope with uneven pages and distortion such as at the spine of the pages - I have hundreds of pages of proof of that). It also has intelligent correction among many other useful features such as detecting when you've turned a page and starting scanning immediately. (It removes any signs of your fingers.) The video in the link shows the technology in action. I can fully recommend this overhead scanner because it made what seemed like an overwhelmingly daunting task much more straightforward.

 

I realise this may seem quite an expensive option, but we decided that our time also has a price and we now have the equipment for the future.

 

Following from Mark's point about exploring your faculty's expertise, I would inquire whether the university has such equipment because it is very useful in digitising material. 

 

Gwyn

keithmroberts
On 2/9/2017 at 14:09, 4thGordons said:

Vaguely related: as an alternative to transcription by typing out the documents I have recently been experimenting with the voice recognition / dictation software that is built in to Win 10 and it has proven remarkably reliable once trained -- at least as reliable as OCR in my experience so far. What it allows you to do is simply read the document and it turns it into text (as in dictation). The big advantage over OCR of course is if you can READ the handwriting then you can turn it into text.  I have played with this quite a lot recently and after about 20 minutes "training" the program to your voice, reading it out converts it into a word document with impressive (although not perfect) accuracy.  This might provide another means of recording the content of the diaries in the original post. But I would also photograph them so there is a record of the original for double checking etc.

 

I'm interested in this option as I have a 600+ page set of Great War material in cursive script to transcribe. My first experiment with the Microsoft Dictate plug-in wasn't encouraging, so maybe I need to persist. Google have a free package as well and I wondered if anyone had tried it. At first glance it looked more complete, although initially everything has to be saved into Google Docs.

 

Keith

4thGordons

Keith,

I have not used the Google product (but I know some of my students did for a project, so I will ask them). I did persist with the Windows dictation beyond the point above, and it continued to be promising. I would certainly consider using it if I had a lot of material to transcribe. I tried to use it for some documents containing lots of proper nouns and it did not do very well at all with place names etc. (although it did produce some amusing results). I do think "training" the software is important, as is learning the correction shortcuts. What I will say is that it is massively better than my institution's speech-to-text "voice message" software, which seems to specialize in leaving utterly unintelligible gibberish!

Chris

keithmroberts

Thanks Chris - it will be a few weeks before I have all the documents photographed and indexed, so I have some time to try the training. 

 

The material is a set of five octavo notebooks, each of about 120 pages, compiled by the sister of a curate who served in a small Portsmouth mission church throughout the war. His sister accompanied him on some parish visits in, I think, early 1915, and reported on those. Often the visits were to recently bereaved naval widows. The bulk of the material is her transcription into the notebooks of letters to the curate from servicemen, both army and navy, from around the world, plus a few unpublished letters from a few of his fellow curates who were serving as chaplains.

I'm hoping to persuade the owner of the documents, an Oxford college,  to allow some form of publication, but working at the transcription will help me to get a much better feel for the people of the area as well as the men who served.
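
For the indexing step, one possible approach is a simple spreadsheet-style index with one row per photographed page. This is only a sketch under assumptions: the `nb1_p001.jpg` naming scheme and the `transcribed` column are hypothetical, and the notebook and page counts are the approximate figures given above:

```python
import csv
import io

FIELDS = ["notebook", "page", "image", "transcribed"]


def build_index(notebooks: int, pages_per_notebook: int):
    """Return one row per page, with a predictable image file name."""
    rows = []
    for nb in range(1, notebooks + 1):
        for page in range(1, pages_per_notebook + 1):
            rows.append({
                "notebook": nb,
                "page": page,
                "image": f"nb{nb}_p{page:03d}.jpg",  # hypothetical naming scheme
                "transcribed": "no",
            })
    return rows


def index_to_csv(rows) -> str:
    """Serialise the index as CSV text that any spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


index = build_index(notebooks=5, pages_per_notebook=120)
print(len(index))
print(index_to_csv(index[:2]))
```

Marking rows as transcribed as the work proceeds gives a running tally of how much of the set is done.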

 

Any feedback on the Google product will be interesting, although as the layout of these notebooks is fairly simple I am feeling more positive about the Microsoft plugin in the light of your comments. Thank you for taking the trouble to reply.

 

Keith

JWK

There's also  https://transkribus.eu/Transkribus/

A program still being developed by the University of Innsbruck along with many scholars and history buffs and volunteers.

 

Apparently this program learns from experience: you feed it a scan of a page of handwritten text, and your transcription of that page, and then (providing the following pages are in the same handwriting) it *should* transcribe the following pages on its own.
 

It's all still in baby/infant stage, but maybe worth a shot?

If it doesn't work then that in itself is valuable information for the developers.

David_Underdown

You need to have transcribed more than one page for the initial model to be much use; from memory, they say a minimum of 20 pages.

 

TNA have been trying it out http://blog.nationalarchives.gov.uk/blog/machines-reading-the-archive-handwritten-text-recognition-software/

keithmroberts

JWK and David

 

Thank you for your comments. I won't be starting my modest transcription for a few weeks yet. My inclination on balance is to spend a day or more trying to train some dictation software, but if I don't achieve a decent result with that, I might well try Transkribus. It certainly looks to be a major tool in the making, and maybe would handle my challenge fairly well, given that the 600 or so pages are all in a single fairly consistent script from the early part of the last century.

 

Keith

keithmroberts

This is looking both interesting and challenging. It seems that there are two dictation options provided free by Microsoft. One is built into the latest versions of Windows 10, while the other is described as a Microsoft Garage plugin for Word. So far I have not found anywhere on the web that compares the two. I completed photographing the manuscript documents some weeks ago, and will have to get started on the transcription soon. Some mature posts found via search engines suggest that I need to commit a reasonable amount of time to training the software to recognise my voice. I think it was the Windows version that Chris referred to in post #13 rather than the Word plugin.

I'll get there eventually.

 

keith

4thGordons
On 5/12/2018 at 07:44, keithmroberts said:

I think it was the Windows version that Chris referred to in post #13 rather than the Word plugin.

I'll get there eventually.

 

keith

 

Yes, I was using the Win10 version.

Chris

keithmroberts

Just starting out to see how effectively I can train it, and myself.

 

keith

Rachd

I look forward to hearing how it goes Keith.

 

I am about 800 pages (105,000 words) into my transcription of the papers of the Rev. Bells with still about 400 pages to go!

 

Neil

stiletto_33853
On 2/9/2017 at 16:24, Dragon said:

As part of recording an inherited collection, we had to scan over 80 albums each consisting of dozens of pages with delicate contents. We invested in a Fujitsu ScanSnap SV600 which has page-flattening technology (the book does not have to be physically flattened to achieve an excellent image: the software can easily cope with uneven pages and distortion such as at the spine of the pages - I have hundreds of pages of proof of that). It also has intelligent correction among many other useful features such as detecting when you've turned a page and starting scanning immediately. (It removes any signs of your fingers.) The video in the link shows the technology in action. I can fully recommend this overhead scanner because it made what seemed like an overwhelmingly daunting task much more straightforward.

 

Gwyn

Hi Gwyn, thank you for that precis on the Fujitsu. I had been looking at something along these lines due to the number of very old and sometimes delicate books and photo albums in my library. Sounds like a good investment.

 

Andy

David_Underdown

Not sure how widely available this is yet, but particularly if you already have an Android phone, ScanTent may be worth a look https://scantent.cvl.tuwien.ac.at/en/

keithmroberts

I have made a small start with the Windows 10 product. My experience is mixed, but probably more training of the voice recognition itself, and hopefully increasing confidence with the main commands, will lead to improvement. So far my biggest grouse is with the "Correct" command, which flashes up alternatives and vanishes before I can read the options, let alone react. It is loaded on a fast laptop and I can see no way to slow features like that down. Still, it is much faster than my typing. I have an older, less pacy machine declared redundant by my son, but am reluctant to lash out for an extra 365 subscription just to test the software on a slower machine. Perseverance is everything, I hope.
