An experiment in collaborative genealogy

While making my plans for the upcoming IAJGS International Conference on Jewish Genealogy in Warsaw, I came up with an experiment I’d like to try. This experiment needs dozens, if not hundreds, of volunteers to pull off successfully.

The short version is that I’m organizing volunteers to photograph and geocode all the gravestones in the Okopowa St. Cemetery in Warsaw, and then upload those images both to BillionGraves and to special groups on Flickr, where they will become available for everyone to use.

There are probably over 80,000 gravestones in the cemetery, and while I don’t expect we’ll be able to get to all of them by the time the conference ends, the simple effort to do so will be an incredible experiment in collaborative genealogy.

For full details on this experiment, and how to get involved, please go to the Okopowa St. Project page.

What DPI should I scan my photos, and in what format do I save them?

My lecture Preserving Photographs, Scanning, and Digital Backups at this week’s IAJGS International Conference on Jewish Genealogy was well attended, with somewhere around 150-200 people. While I can’t post the video of the presentation on my blog, I do want to share some of the information from the lecture here.

The two most common questions I get about scanning photographs are:

1) What DPI do I need to scan my photo?
2) What file format should I save the file in?

DPI stands for dots-per-inch, and refers to how many pixels are present in each inch of the photograph. For example, if you had an 8×10 inch photograph, and you scanned it at 100dpi, you would have a photo that was 800×1000 pixels, or 800,000 pixels altogether. That’s less than a million pixels, or, to put it another way, less than a megapixel. Doubling the DPI to 200dpi gives you 1600×2000 pixels, or 3,200,000 pixels, or 3.2 megapixels. Note that doubling the DPI quadruples the number of pixels, since the resolution doubles in both the vertical and horizontal directions.
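For anyone who wants to run the numbers for their own photos, here is a minimal Python sketch of the arithmetic above. The function name pixel_dimensions is just something I made up for illustration; it isn't part of any scanner software.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px, megapixels) for a scan.

    width_in and height_in are the size of the original in inches,
    and dpi is the scanning resolution. Hypothetical helper for
    illustration only.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    megapixels = width_px * height_px / 1_000_000
    return width_px, height_px, megapixels


# The 8x10 inch photograph from the example, at 100dpi and 200dpi.
for dpi in (100, 200):
    w, h, mp = pixel_dimensions(8, 10, dpi)
    print(f"{dpi}dpi: {w}x{h} pixels, {mp} megapixels")
```

Doubling the dpi argument doubles both pixel dimensions, which is why the megapixel count goes up by a factor of four.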

Here’s another way to look at it, in a slide from my presentation:

Basically, if you look at scanning photographs (or negatives/slides), you can see that scanning at 300dpi will give you very different image sizes depending on the size of the original. I have a rule of thumb that I use to determine the correct DPI to scan at: figure out the largest size you want to be able to print (printing is usually done at 300dpi), and then adjust your scanning dpi to ensure you’ll have enough pixels to print. Here’s the summary:

For people reading this on a small screen where the image is hard to read, the basic rule is:

Minimum resolution (DPI) should be the number of inches of the largest side you want to print, divided by the largest side in inches of what you’re scanning, multiplied by 300.

So if you are scanning a 4×5 print, and want to be able to print at 8×10, you need twice the DPI you’ll print at, so 600dpi. Of course, it doesn’t hurt to scan more than you need, although there are diminishing returns. Not all photographs are high enough quality to give you a better picture when scanned at very high resolution.
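If you'd rather let a computer do the division, the rule of thumb can be sketched in a few lines of Python. This is only an illustration of the rule as stated above: the function name min_scan_dpi is my own invention, and the 300dpi print resolution is the usual default mentioned earlier, which you can change.

```python
def min_scan_dpi(scan_longest_in, print_longest_in, print_dpi=300):
    """Minimum scanning DPI according to the rule of thumb.

    scan_longest_in:  longest side of the original being scanned, in inches
    print_longest_in: longest side of the largest print you want, in inches
    print_dpi:        resolution the print will be made at (assumed 300)
    """
    return print_longest_in * print_dpi / scan_longest_in


# Scanning a 4x5 print so it can later be printed at 8x10.
print(min_scan_dpi(scan_longest_in=5, print_longest_in=10))
```

Treat the result as a floor, not a target. If your scanner offers a slightly higher setting than the number you get, use that.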

A Kodachrome slide supposedly has enough resolution to output about 20 megapixels. That means you can basically max out a 4000dpi slide scanner and get a good result. That said, a small old print with lots of grain probably wouldn’t benefit by going beyond my rule of thumb, and some likely could be safely scanned at a lower resolution.
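As a rough sanity check on that claim, here is a small Python sketch that works out how many megapixels a 4000dpi scan of a slide produces. It assumes the nominal 36×24 mm image area of a 35mm frame, which is my assumption and not a figure from the lecture (a mounted slide shows slightly less than that), and the function name is made up for this example.

```python
MM_PER_INCH = 25.4


def scan_megapixels(width_mm, height_mm, dpi):
    """Megapixels produced by scanning an original of the given size.

    Dimensions are in millimeters, so they are converted to inches
    before being multiplied by the scanning resolution.
    """
    width_px = width_mm / MM_PER_INCH * dpi
    height_px = height_mm / MM_PER_INCH * dpi
    return width_px * height_px / 1_000_000


# Assumed nominal 35mm frame (36x24 mm) scanned at 4000dpi.
slide_mp = scan_megapixels(36, 24, 4000)
print(f"{slide_mp:.1f} megapixels")
```

Under those assumptions the scan comes out at roughly 21 megapixels, which is in the same range as the 20 megapixel figure above.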

Storage is cheap though, so I say scan as high a resolution as you want, and use my rule of thumb as the minimum guideline.

So once you’ve figured out what resolution to scan in, what format should you save it in?

The short answer is TIFF. TIFF was actually designed early on for the purpose of scanning photographs. TIFF also, for the most part, does not lose any data in the file format, unlike formats like JPEG, which always compress data in a lossy fashion (I say for the most part because it’s technically possible to use JPEG compression in a TIFF file, but it’s rare, and I doubt any scanner software you would use is going to do that). You can scan to TIFF format using LZW compression, which is lossless (i.e. it does not degrade the photo quality). TIFF is also good because it is so widely supported, is used by archives and libraries for their own scanning, and is unlikely to become unsupported by future software.

PNG is also a good format for scanning. It’s a more modern format, and offers built-in lossless compression. It’s not as widely supported, but if space is at a premium, it might save you a bit over TIFF.

JPEG is not a good format for scanning, because it is a lossy compression format, and you will always lose some data when saving to a JPEG, even if you save it at 100% quality. I sometimes scan to both TIFF and JPEG, as JPEG can be easier to share sometimes, but I always make sure to have the TIFF file as well.

PDF is not a good format to scan photographs with, as you have no control over how images are compressed, and editing them is much more difficult than TIFF or PNG. In general, PDF files will actually use JPEG compression anyways, without being able to even set the quality. If you’re scanning a multi-page printed document, you can use PDF as a convenient way of sharing it, but if there are photos and other important content in the document, I would suggest scanning it as a TIFF as well. It’s not well known, but TIFF also supports multi-page documents, just like PDF.

If you have additional questions about scanning photographs, please post them in the comments below.

Some thoughts on day one of #IAJGS2015

Yesterday was day one of the IAJGS International Conference on Jewish Genealogy in Jerusalem. The last time the annual conference was in Jerusalem was 11 years ago, and that was the first genealogy conference I attended. Back then I was less involved in genealogy, although I did volunteer to put together the Souvenir Journal given out to attendees (still available online).

It’s interesting to look back at that journal and notice that there are two letters of approbation at the beginning of the journal, written by then Israeli President Moshe Katsav and then Jerusalem Mayor Uri Lupolianski. Lupolianski was, if I recall correctly, the keynote speaker that year as well. What’s interesting about those two Israeli leaders is that both currently sit in jail – Katsav on rape and sexual harassment charges, and Lupolianski on corruption charges. I suppose that should be disheartening, although on the plus side I guess we can be confident that no one is above the law here in Israel.

This year the keynote speaker was Rabbi Yisrael Meir Lau, the former Chief Rabbi of Israel (and father of the current Chief Rabbi). I think he’s above reproach; perhaps the conference organizers wanted someone they could be sure wouldn’t be in jail by the next conference here (although interestingly enough, the rabbi who served for 10 years in between Rabbi Lau and his son is currently in jail, on corruption charges).

Rabbi Lau’s story is quite amazing (I highly recommend his autobiography Out of the Depths), being one of the youngest survivors of the Buchenwald concentration camp (I believe he was 8) and ending up in Israel where he rose to be Chief Rabbi. He is a charismatic speaker, and I think he was well received by everyone there.

Earlier in the day I spoke on the topic of Jewish Names, Red Herrings and Name Changes. I was happily surprised to be talking to a packed room.

Speaking at IAJGS 2015
(Thanks to my friend Jay Solomont for taking this photo)
I hope to publish a summary of my lecture sometime in the future. I made one mistake in my lecture, but I’m glad no one picked up on it…

I also attended several lectures which were excellent. One lecture which I almost missed, but was happy to find at the last minute, was a lecture titled Who Were The European Jewish Refugees in Casablanca During World War II and How Did They Get There? given by Michal Ben Ya’akov. The lecture centered around the work done by Helene Cazes Benatar, a lawyer working in Morocco during WWII, who helped Jewish refugees who flowed into the country from Europe.

Among the thousands of refugees in Morocco during the war was my grandmother, and six other female relatives who fled from France on a banana boat, ending up in Mogador, as the current city of Essaouira, Morocco was known at the time, when it was still a French protectorate. The story of my family’s stay in Morocco has always been something I’ve wanted to investigate further, and I hope to be able to find information on my family in the records Ben Ya’akov discussed in her lecture.

Another interesting lecture was on the topic of prenumeranten. The lecture was given by Yehuda Aharon Horovitz, and he discussed the topic in great detail. In the past, I’ve described prenumeranten as the Kickstarter of the 19th century publishing business. While writers today can crowdfund their books in advance online, the model is actually quite old, although it was much more labor intensive. Back then, a writer might travel from town to town and take advance payments on books, to enable him to have funds to do the work and to print and mail the books out. This was a very popular method for writers of Jewish religious books. In some cases, the author himself wouldn’t travel to collect money, but rather he would hire someone as a middle-man to travel from town to town, collect money, and then bring him the list and the money (after taking a cut obviously). This isn’t so different than what happened in the secular world, where salesmen might travel from town to town selling encyclopedias. Dictionaries were sold in advance and volumes mailed out as they were completed. In the case of the Jewish books, however, the list of pre-purchasers would be printed in the first edition of the book, usually sorted by town. These name lists might sometimes be the only lists of names that exist for a given town at a specific time, and thus can be very useful.

A fellow researcher, who shares cousins with me, exposed me to prenumeranten a couple of years ago. RYT, as I’ll call him, since I’m not sure he’d want his name online, did some amazing work finding relatives of ours in these prenumeranten. I’ve intended to write an article just about his work at some point, and now seeing this lecture has given me the encouragement to get back into that research. I learned about new resources that I wasn’t aware of, and am hopeful that Horovitz’s goal of getting all prenumeranten lists scanned and indexed, and made searchable online, will come to fruition.

I’m looking forward to the rest of the conference.