So it’s been two weeks since I last posted, and that’s because I’ve been busy on a new project. When Mocavo, a search engine dedicated to genealogy, launched a couple of weeks ago, I was inspired to figure out exactly how they were returning the results they were returning, and how one can create topic-specific search engines.
It was not clear to me exactly how Mocavo collected its results – if for example it uses results from another search engine, and just releases the results that match a pre-set list of genealogy-oriented web sites, or if Mocavo is indeed operating its own search ‘spider’ to crawl the web and collect its own results. It seemed clear that while Mocavo did find good results within a number of major genealogy web sites, it didn’t appear to find results on many minor sites, or on major general web sites that might have small genealogy sections. For example, if someone posted a web page on their family on their own web site, or started a Yahoo Group to discuss a particular town or surname (such as described in my earlier post on mailing lists), it did not seem to appear on Mocavo. I don’t know what algorithm Mocavo uses, but I’m guessing it can’t currently find particular sites within larger general websites like Yahoo, so it ignores Yahoo altogether (to eliminate the chance of false positives).
Not knowing anything about how Mocavo put together their site, I decided to see what I could put together myself. Using tools provided by Google (I suspect Mocavo uses the same tools, just the paid versions that allow much greater customization), I worked over the past couple of weeks to put together my own genealogy search engine. It is a bit more inclusive than Mocavo in how it determines which sites to search. It is thus more likely to find small genealogy sites, but also more likely to return some less-than-relevant results. That’s a compromise I’ve struck, which I think returns many interesting results that you might not find on Mocavo. Of course, Mocavo has the advantage of being a real company with employees who get paid to update the search results, so they can improve their results over time. As this is not my full-time job, I don’t have that luxury. Don’t think, however, that I’m trying to compete with Mocavo. This is just my own attempt at creating a useful search tool for genealogists, inspired by Mocavo.
Unfortunately one of the downsides of Google’s free search tools is ads. I can’t stop the ads from showing up unless I’m willing to pay Google for that privilege. I don’t know why Google shows ads more aggressively on custom search engines like this one than they do on their own search engine, but they do. I’m sorry about that, but there isn’t anything I can do about it.
When looking through the tools available to me, I tried to figure out how I could improve the results for genealogists. I came up with an interesting idea, but Google restricts how useful it can be. Basically, when a town has undergone name changes or has different names in different languages, a record filed under one version of the name will not show up when you search for another version. Google will actually help here with major cities, so for example if you search for Wien it knows to search for Vienna, but it does not know every version of every town one might be searching for. Nor, frankly, should it, as this technique can actually reduce the usefulness of search results when alternate names overlap. In any event, Google allows you to define synonyms for search terms, but limits how many you can define.
As I was limited, I had to choose a small area to try this technique out on, and I chose the Galicia region of the former Austro-Hungarian Empire. It is a particularly good region to choose, as it has been controlled by many different countries over time, had many different languages spoken, and most towns have many names. It’s also a small enough region that it fits within the limits of what Google allows me to do. Part of the problem is that Google only allows uni-directional synonyms, which means you need to know which town name to search for or the synonyms won’t kick in. To use a set of names that was easily definable, I’ve chosen the names at the top of the Locality Pages from the JewishGen Community Database for the given towns. Basically, whichever name appears at the top of a Locality Page for the given town is the one you should use, except that you should leave out any accent marks or apostrophes in the name. You should keep dashes if they are in the name. Obviously it must be a town that was part of Galicia. This will only help people who include one of the hundreds of Galician towns in their search query, but if you are not searching with one of these town names, the search engine will still work well to help you find results among the many sites it does search.
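For the curious, here is a rough Python sketch of the idea described above. It is not how Google implements synonyms, and it is not my actual configuration. The function names and the single sample entry are made up for illustration (Lwow, Lemberg and Lvov are real alternate names for Lviv, but the table here is only an assumption about what one entry might look like). It shows the two rules: strip accent marks and apostrophes but keep dashes, and expand a query only when it contains the primary name.

```python
import unicodedata

# Hypothetical one-directional synonym table: primary name -> alternate names.
# Only the primary name (the key) triggers expansion. Illustrative data only.
SYNONYMS = {
    "lviv": ["Lwow", "Lemberg", "Lvov"],
}


def normalize_town(name: str) -> str:
    """Drop accent marks and apostrophes from a town name; keep dashes."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove both the straight and the curly apostrophe.
    return no_accents.replace("'", "").replace("\u2019", "")


def expand_query(query: str) -> list[str]:
    """Return the query words, adding alternates after any primary town name."""
    terms = []
    for word in query.split():
        terms.append(word)
        terms.extend(SYNONYMS.get(normalize_town(word).lower(), []))
    return terms
```

Searching with the primary name pulls in the alternates, while searching with an alternate name does not pull in the primary name. That is the uni-directional limitation in a nutshell, and it is why you need to use the name from the top of the Locality Page.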
I am open to all feedback on this search engine, and welcome feedback in the comments. Please leave comments on the search page itself, and not on this post, as this post is just an introduction and in the future people will just go straight to the search page.
Without further ado, I introduce B&F Enhanced Genealogy Search.