Tag Archives: gedcom

Pruning Your Family Tree

Cruft is a term used in computer programming circles for the useless code that accumulates in a computer program over time. Cruft is the stuff you added at one point that might have been important then, but is now irrelevant and, worse, slows down the rest of your program. You might once have needed, for example, to support what is now an obsolete computer platform, but the code for that shouldn’t still be in your program today.

Family trees also accumulate cruft over time, and just like in computer programs, those extra people and extra information can slow you down. There are a number of ways that bad information can enter your tree, but the most common and most problematic is importing a GEDCOM from a relative without first checking whether everyone in the tree is actually related to you. If you get a GEDCOM file from a relative with 2000 people in it and only 200 of them are actually related to you, you’ve just added 1800 people who are irrelevant to your tree. Moreover, if you upload your family tree to a site like Ancestry.com or MyHeritage.com, where they can do some form of automatic matching between your tree and other trees as well as with records on the site, you’re going to get all kinds of matches for people who are not actually related to you. Following up these false leads is a big waste of time.

I recently uploaded a family tree to one of these web sites and started getting matches to people not related to me. It showed me that when I imported a tree from a relative a couple of years ago, I did not properly check the tree first. When I share a tree with someone, I usually only export those people who are related to the person I’m sending the tree to, plus spouses. This ensures the person doesn’t get a lot of records that are not relevant to them. When you receive a GEDCOM from someone else, you should check it out too: create a test file, import the GEDCOM into it, add yourself to the file, and then see if everyone in the file is related to you. I obviously forgot to do this with this particular file a few years back, and ended up with about 300 extra people in my tree that I was not related to, which was what was causing these false hits in the matching program (technically they’re not false hits, okay, but from my perspective they’re just as annoying even if they are my fault).

After receiving quite enough of these messages from the web site, I decided it was time to remove the incorrect records from my family tree file. While my initial guess about that GEDCOM file was correct, and it was indeed the source of most of the incorrect records, I also discovered something else interesting – there were other people in my family tree who were not related to me, some of whom I wanted to keep. The important thing here is that while most genealogy programs will let you select all your relatives (and their spouses), it’s not so simple to select your relatives and delete everyone else. The issue of the spouses, by the way, is a simple one. If you only had the program select your actual relatives, your sibling’s spouse would not be chosen. Your sibling’s kids would be chosen, but they would be missing a parent, since strictly speaking that spouse is not your blood relative. Thus you need your genealogy program to select spouses as well.
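To make the "relatives plus spouses" selection concrete, here is a minimal sketch in Python. It is not how any particular genealogy program does it; the toy tree, the names and the two dictionaries (PARENTS and SPOUSES) are all made up for illustration. The idea is to walk up from yourself to collect your ancestors, walk down from every ancestor to collect their descendants, and only then add the spouses of everyone found.

```python
from collections import deque

# Hypothetical toy tree: each person maps to the set of their parents.
PARENTS = {
    "me": {"mom", "dad"},
    "sibling": {"mom", "dad"},
    "niece": {"sibling", "in_law"},
    "in_law": {"in_law_father"},
}
# Hypothetical marriages, listed in both directions.
SPOUSES = {
    "mom": {"dad"}, "dad": {"mom"},
    "sibling": {"in_law"}, "in_law": {"sibling"},
}

def children_index(parents):
    """Invert the child -> parents map into a parent -> children map."""
    children = {}
    for child, child_parents in parents.items():
        for parent in child_parents:
            children.setdefault(parent, set()).add(child)
    return children

def blood_relatives(root, parents):
    """Return root, root's ancestors, and every descendant of those ancestors."""
    children = children_index(parents)
    # Step 1: walk upward from root to collect all ancestors.
    ancestors = {root}
    queue = deque([root])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in ancestors:
                ancestors.add(parent)
                queue.append(parent)
    # Step 2: walk downward from every ancestor to collect descendants.
    relatives = set(ancestors)
    queue = deque(ancestors)
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in relatives:
                relatives.add(child)
                queue.append(child)
    return relatives

def with_spouses(relatives, spouses):
    """Add the spouses of each relative, but not the spouses' own families."""
    selected = set(relatives)
    for person in relatives:
        selected.update(spouses.get(person, ()))
    return selected

blood = blood_relatives("me", PARENTS)
selected = with_spouses(blood, SPOUSES)
```

In this toy tree the blood-relative pass finds the niece but not her other parent; the spouse pass brings the in-law back in, while the in-law's father stays outside the selection.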

The people I found in my tree that were not related to me fell into a few categories. Most were from GEDCOM imports, with most of those from that one GEDCOM I suspected, but also a few others here and there.

Some of the people were really cruft, in that they were small sections that were somehow isolated from the rest of the tree. I suspect they were descendants of someone I deleted at some point. They should probably have been deleted a long time ago, but were somehow still in my tree – probably due to a bug in the genealogy program.

Then there were the parents of spouses. I sometimes like to add information on the parents of spouses that I add to the tree. This is mainly so that if I want to research the spouse at some point, I know a bit more about them to help me with the research. Knowing the names of a person’s parents can be very important when doing research. The problem, of course, is that if I do a standard selection of my relatives and their spouses, these parents get left out – yet I still want them in the tree. The solution here is not simple. There is no automated way to include these people. The answer is probably (and I have not done this yet) to flag those parents in some way. Some genealogy programs let you define custom flags, and then assign them to people. If you carefully check all the non-relatives in your tree and see which ones you want to keep, you can then flag them for future reference. Each time you add non-relative parents, you can flag them. In the future, if you go to prune your tree again, you can do a standard selection of relatives and spouses, and then add the flagged people. Anyone left over can then be removed from your tree.
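The flag-then-prune idea boils down to simple set arithmetic. Here is a small sketch in Python, using made-up names; the function people_to_remove and the three sets are hypothetical and do not correspond to any feature of a real genealogy program.

```python
def people_to_remove(everyone, relatives_and_spouses, flagged):
    """Return everyone who is not a relative, a relative's spouse, or flagged to keep."""
    keep = set(relatives_and_spouses) | set(flagged)
    return set(everyone) - keep

# Hypothetical tree contents.
EVERYONE = {
    "me", "mom", "dad", "sibling", "in_law", "niece",
    "in_law_father", "stray_import_1", "stray_import_2",
}
# What a standard "relatives and spouses" selection would return.
SELECTED = {"me", "mom", "dad", "sibling", "in_law", "niece"}
# Non-relatives flagged by hand as worth keeping (for example, parents of spouses).
FLAGGED = {"in_law_father"}

to_remove = people_to_remove(EVERYONE, SELECTED, FLAGGED)
```

Only the two stray imports end up in to_remove; the flagged parent of the spouse survives even though the standard selection missed him.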

The Future of Sharing (Genealogical Data)

It’s no secret that the current standard for sharing genealogical data, GEDCOM, is woefully out of date. The last official revision to the GEDCOM standard, 5.5, was completed in 1996. A minor update, 5.5.1, was released in 1999 but never officially approved (even though some of its provisions have been adopted by various genealogy programs). Revision 5.5.1 added one very important feature – support for UTF-8 character encoding, a form of Unicode that supports multiple character sets (including, for example, Hebrew).

GEDCOM has, for all intents and purposes, been abandoned by The Church of Jesus Christ of Latter-day Saints (the Mormons), which created and owns the standard. The church has indicated that they will not be updating it, and indeed are replacing the need for it with a new API (Application Programming Interface) which will allow genealogy programs to exchange data with their website (FamilySearch.org). One problem with this approach is the need to go through their website, and the fact that they have not made this API publicly available (i.e. it’s not a public standard, just a private interface to their web site). Another major problem is that there is no data format that allows one to create a family tree that can be shared independently, the way GEDCOM is used today. FamilySearch in no way needs such a format, since their mammoth size and importance in the genealogical world will force genealogy programs to support its API, as many have already done.

Over the years, there have been many attempts to either upgrade or replace GEDCOM. These efforts have all failed. In general the problem has been that the companies that create genealogy programs need to agree to adopt any new standard, and they really haven’t had much incentive to do so. Supporting the import of GEDCOM files gives them a basic file interchange, which will never support the full feature set of their programs (which have become much more sophisticated since 1996), but is enough to allow customers to exchange information with their relatives. If they supported a fully-featured GEDCOM replacement (one that, for example, would better support photographs and evidence management), it would only make it easier for customers to try other programs. Hence the disincentive for the companies to support a modern replacement for GEDCOM.

Another problem with replacing GEDCOM has been arguments over the data model used. GEDCOM is based on a nuclear family data model (i.e. one mother, one father and their children), and other forms of families are harder to support. This problem has led some to favor a data model based not on the family but on the individual. This is a philosophical debate, and as you might imagine, different people take very strong positions in this battle.

Even with this history, there are a few new initiatives to come up with a replacement for GEDCOM. One initiative that has garnered some attention recently is BetterGEDCOM. The BetterGEDCOM initiative came from the frustration of many genealogists over the lack of updates to GEDCOM and is an attempt to create an open forum for the creation of a new standard. Like many attempts at ‘openness’, however, it has run into its own in-fighting and conflicts. It remains to be seen how successful this attempt will be. Another recent initiative is the International OpenGen Alliance (OpenGen). This effort is a bit more of a top-down approach, being managed by the company that runs AppleTree.com, an online family tree web site. OpenGen is, however, a non-profit organization that is supposed to include more than just the team at AppleTree. There have been some attempts between BetterGEDCOM and OpenGen to coordinate, or at least follow each other’s efforts closely. Time will tell which effort, if either, will be successful in creating a new genealogical data sharing standard.

In case you think it isn’t complicated enough, other web sites beyond FamilySearch.org are also developing their own APIs for exchanging genealogical data. OneGreatFamily.com last year introduced an API called GenealogyCloud. It seems that no third-party applications yet support this API.

Geni.com, which boasts nearly a hundred million profiles on their site, and nearly 50 million that are interconnected in what they call their World Family Tree, just yesterday introduced their own API. Unlike FamilySearch.org, however, they are releasing documentation and sample applications on their web site. This will allow anyone to write applications that interact with Geni.com, similar to the way Facebook allows outside developers to create applications that access information on Facebook. This is a very positive step. It’s no coincidence that one of the other large family tree web sites, AppleTree.com, is pushing another initiative to replace GEDCOM (OpenGen). These large sites need to create ways to exchange data and interact with other programs and web sites in order to maintain their growth rates.

MyHeritage.com, another one of the big family tree web sites, has taken a slightly different approach in that they have their own application (Family Tree Builder) that runs on a computer and can sync data to their web site. While this approach allows them more control over what modifies data on their platform, it has its shortcomings as well, not the least of which is that it requires Windows to run (this coming from a Mac user). I suspect that MyHeritage.com will release their own public API in the future, if only to compete with Geni.com, their biggest competitor.

We can always hope that FamilySearch.org, Geni.com, MyHeritage.com and AppleTree.com will all come together and create a single API and data format for sharing data, but unfortunately if the past is any guide, this is unlikely to happen.

One indication of the direction the wind is blowing in this regard will be the upcoming RootsTech conference, taking place in February 2011 in Salt Lake City. This conference is the first RootsTech conference, although according to the organizers it replaces three earlier technical conferences – The Conference on Computerized Family History, the Family History Technology Workshop and the FamilySearch Developers Conference. Note that these previous conferences were all connected in some way to the Mormon church. It’s unclear how open this new conference will be to new ideas, or if it is really only looking for input for the existing Mormon church efforts such as FamilySearch.org. I imagine representatives from most of the genealogy software companies and web sites will be in attendance at the conference, as will people associated with the BetterGEDCOM and OpenGen efforts. During the week of the conference there will probably be a lot of blogging about what is going on, but the real test will be after the conference if companies announce intentions to seek a common API or data format to move forward with, or whether everyone will just continue the same disjointed approach that has been pursued for nearly 15 years.

Multimedia support in FTM for Mac – a bit lacking

Continuing my attempt to transition my family tree from Reunion to FTM for Mac, I wanted to discuss FTM’s handling of the images listed in the GEDCOM file.

So first, I like the fact that FTM has a Media tab where you can view all images in your family tree file. That is something I’ve wanted from Reunion for a long time. That said, it seems FTM’s handling of the imported images is a bit sub-par. For starters, even though it has the correct path for each image file, it can’t seem to find them. Reunion exports the standard Mac (and UNIX) file path to each image, which in my case begins with a tilde (~) indicating that the file is in a sub-folder of my home folder. FTM doesn’t seem to know what that means. It lets you either search manually for the file or have FTM search for it. Either option works, but it would take forever for me to do this for each image.

Reunion has one very nice feature for when a file goes missing (like if you move it to a different folder): it lets you find the new location, and then it looks at all the other images that were in the same folder and updates them as well. This is a big timesaver and something FTM should emulate. This, in combination with the Media view that FTM offers, would make a large task like changing all the image locations much easier to manage.

Truth be told, however, this task shouldn’t be needed at all by FTM – if it understood file paths properly this wouldn’t be an issue.
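For what it’s worth, turning a path that begins with a tilde into a full path is a small job in most languages. Here is a minimal sketch in Python using the standard library’s os.path.expanduser. The function name resolve_media_path and the example path are hypothetical; this only illustrates the idea and says nothing about how FTM or Reunion handle paths internally.

```python
import os.path

def resolve_media_path(raw_path):
    """Expand a leading '~' to the user's home folder and return an absolute path."""
    expanded = os.path.expanduser(raw_path)
    return os.path.abspath(expanded)

# Hypothetical path of the kind a GEDCOM export might contain.
example = "~/Pictures/Family/grandparents.jpg"
resolved = resolve_media_path(example)
# os.path.exists(resolved) could then tell you whether the image is really there.
```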

Taking a look at the GEDCOM file itself I can see that Reunion does something very nice – it exports the image cropping information. Frequently when using an image for a specific person you crop the image so it only shows that person. This is particularly true for the ‘primary’ image that one uses to represent the person in the tree. One can also use one group photo to crop out individual face shots of many different people. Showing the full image in a small window where you only want the head would be fairly useless. It’s not clear to me if the _CROP tag that Reunion uses is part of the GEDCOM standard or some kind of generally agreed-upon way to share that information, but it seems to me that FTM ignores the information. Worse, and the likely reason, I can’t figure out any way to crop photos in FTM at all.

I have a lot of complaints about Reunion’s handling of media. I think it should offer to keep a library of thumbnails or even web-resolution images itself, so that it doesn’t need to spend so much time doing image conversion when doing things like creating a web site based on your tree. I think it needs a central media view where you can manage all the images in your tree and make sure all the files can be located, etc. I think some integration with iPhoto would be nice. I think being able to tag photos with information on the people in them and the location information would be incredibly useful. Even with all of these complaints, FTM seems surprisingly inadequate when compared to Reunion in this area.

Launching Family Tree Maker for Mac and Importing a GEDCOM

I pre-ordered FTM for Mac when it was initially announced, and received it just recently. It comes on a single CD with a simple installer program on it. Running the installer puts almost 500 MB of stuff on your computer. Not exactly light-weight, but disk space is cheap these days, so that doesn’t bother me very much.

After installing it, I run the newly installed program and find it is a bit clunky when launching. It tells me that the program includes a free 2 week trial of Ancestry.com, and asks if I want to sign up, or if I already have an account to enter my login information. As I have an Ancestry.com account already, I enter my login details and continue. Things are a bit slow here, as I think it’s trying to communicate with Ancestry.com. I’m not sure how I feel about this connection from a privacy point of view. I certainly don’t like that it slows down the program.

The good side of the connection to Ancestry.com is that it allows the program to access Ancestry.com and try to find records connected to the people in your tree. This is a very nice feature, especially since it doesn’t require you to upload your whole tree to Ancestry.com where others can see it. If you want to publish your tree to Ancestry.com that is possible, but from what I understand syncing data between the online tree and the tree on your computer is not supported currently in the Mac version of FTM – but it is supported on the Windows version. A bit annoying.

The downside of Ancestry.com integration is the real question of how they protect your privacy. When you’re a member of their web site, you have ultimate control over what Ancestry has access to because they only know what you put on their site. Having access to the whole tree is a whole different issue, and not one I’m sure they’ve addressed. There’s no way to know what information is being sent back and forth. There’s also the issue that you need to have a paid account with Ancestry.com to use this feature, obviously. As I have an account already, this doesn’t affect me, but I wonder what features I will be missing if I decide to cancel my subscription to Ancestry.com?

So I exported a new GEDCOM from Reunion and told FTM to import it. The process was fairly quick, but it came up with over a hundred errors. I told it to load the error log, and something a bit bizarre happened – it launched the log in Notepad for Windows. Now you may be asking yourself how that is possible since I’m on a Mac – the answer being that I have a copy of Windows that runs in Parallels, a virtualization program. Even though Windows wasn’t running at that time, Parallels is ‘smart’ enough to know that a Windows filetype was launched and will try to launch it in Windows. Now, whether this is a misconfiguration on my part with Parallels, or whether FTM actually created a file that is a Windows Notepad file, I’m not 100% sure, but I can say that this feature of Parallels has never before launched Windows when the file wasn’t actually a Windows file, so I’m a bit confused. I think it would be nice of FTM to ask which text editor to use when launching text files (something Reunion does) to prevent this kind of mistake.

So what were the errors? They fall into two categories: Non-strict dates and non-standard GEDCOM tags. So first, it seems FTM is being strict about date formatting on import, which is not a bad thing, but annoying in that they don’t give you a way to fix these mistakes as you import. Reunion is actually very good about keeping date formatting strict, and converts all dates you enter into a standard format, but the dates that FTM rejected seem to be dates that I imported from relatives in other GEDCOM files. They include things like:

1939?
END MAY 1936
1932 OR 1935

These are obviously problematic for a strict date system, but I think FTM should have asked me to correct them. Perhaps Reunion did the same thing when I originally imported the GEDCOM they came from, I don’t remember, but there were not so many dates like that and it would be nice to fix them from the beginning. I’ll leave it to the BetterGEDCOM group to come up with a way to support fuzzy dates in a standard fashion.
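To show why those three values trip up a strict importer, here is a rough sketch in Python of a date check. It covers only a simplified subset of the GEDCOM 5.5 date forms (a plain date, the ABT/CAL/EST/BEF/AFT prefixes, BET ... AND ..., and FROM ... TO ...) and ignores things like alternate calendars, dual years and interpreted dates. I have no idea what rules FTM actually applies; the pattern and the function name is_strict_date are my own assumptions.

```python
import re

MONTH = r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
# A plain date: optional day (only with a month), optional month, required year.
DATE = rf"(?:(?:\d{{1,2}} )?{MONTH} )?\d{{3,4}}"

# Simplified subset of the standard date forms (an assumption, not the full grammar).
STRICT_DATE = re.compile(
    rf"(?:(?:ABT|CAL|EST|BEF|AFT|FROM|TO) )?{DATE}"
    rf"|BET {DATE} AND {DATE}"
    rf"|FROM {DATE} TO {DATE}"
)

def is_strict_date(value):
    """Return True if value fits the simplified strict date pattern above."""
    return STRICT_DATE.fullmatch(value.strip().upper()) is not None

samples = ["1939?", "END MAY 1936", "1932 OR 1935",
           "ABT 1939", "MAY 1936", "BET 1932 AND 1935"]
rejected = [s for s in samples if not is_strict_date(s)]
```

The three dates from my error log are the ones that end up in rejected. Someone cleaning them up by hand might turn "1939?" into "ABT 1939", but that is a judgment call about what the original author meant, which is exactly why an importer should ask rather than guess.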

The second problem was unrecognized tags. Reunion lets you create custom fields and assign GEDCOM tags to them for export. FTM doesn’t know what to do with these custom tags and does something very bad in my mind – it ignores them. Reunion actually added two of the fields that were ignored, a web site (tag URL) and an e-mail address (tag EMAL) that at some point was added to the profile of the exporter. It’s perfectly normal to add an address and contact information to the information in the GEDCOM file about the person who created it, but I guess e-mail and web sites were not common enough when the GEDCOM standard was last updated for these to be standard tags, and thus FTM ignores them.

The other custom tag, which makes up the bulk of the errors recorded by FTM on import, was the NAMR tag. I may have made that one up myself, but frankly I don’t remember as it was such a long time ago. The tag is for the custom field I created for Religious Name; in Jewish parlance, the Shem Kodesh or Hebrew Name. For those people whose Hebrew Name I know, I add that to the custom field. Reunion exports it like any other fact about the person, which frankly is what it should do. Maybe FTM doesn’t support custom fields at all, I don’t know yet. If FTM does support custom fields and doesn’t offer a way to create such a field on import, that would be pretty dumb. As you might imagine, going through the error log and figuring out which people had a NAMR tag (the log only shows the line # of the error in the GEDCOM file) and then adding this fact to each record in FTM would be a mind-numbing experience that I would hope is not necessary. As my knowledge of FTM at this point is fairly minimal, I’ll hold judgment on this, but it doesn’t look particularly good.
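If it does come to matching log line numbers to people, a short script could at least do the lookup. Here is a sketch in Python that scans GEDCOM lines, keeps track of which individual (INDI) record it is inside, and reports every use of a given tag along with its line number. The sample records are invented, and I am assuming the custom tag appears as a line directly inside the person’s record; the function name find_custom_tag is hypothetical.

```python
def find_custom_tag(lines, tag):
    """Return (line_number, individual_id, name, value) for each use of tag in an INDI record."""
    names = {}
    hits = []
    current_id = None
    for number, line in enumerate(lines, start=1):
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # A new top-level record starts; remember its id only if it is an individual.
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current_id = parts[1] if is_indi else None
        elif current_id is not None:
            line_tag = parts[1]
            value = parts[2] if len(parts) == 3 else ""
            if level == "1" and line_tag == "NAME" and current_id not in names:
                names[current_id] = value
            elif line_tag == tag:
                hits.append((number, current_id, value))
    # Attach names at the end so the order of NAME and the custom tag does not matter.
    return [(n, pid, names.get(pid), v) for n, pid, v in hits]

# Invented sample data, for illustration only.
SAMPLE = [
    "0 HEAD",
    "1 SOUR Reunion",
    "0 @I1@ INDI",
    "1 NAME David /Example/",
    "1 SEX M",
    "1 NAMR David ben Moshe",
    "0 @I2@ INDI",
    "1 NAME Sarah /Example/",
    "0 @I3@ INDI",
    "1 NAME Moshe /Example/",
    "1 NAMR Moshe ben Yaakov",
    "0 @F1@ FAM",
    "1 HUSB @I3@",
    "0 TRLR",
]
hits = find_custom_tag(SAMPLE, "NAMR")
```

Each entry in hits pairs a line number (the same thing the error log reports) with the person’s ID and name, so the list can be worked through person by person instead of line by line.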