
Genealogy standards, another look

Over a year ago I took a look at genealogy data standards and where they were headed in my article The Future of Sharing (Genealogical Data). In some ways a lot has changed since I wrote that article, but in other ways we’re at the same point we were then, with no clear picture of the future. This past week’s 2nd annual Rootstech conference (my last article mentioned the then-upcoming 1st Rootstech) has brought some of the questions I raised then into focus, so I thought it was worth reviewing what has happened.

GEDCOM X

On the face of it, the biggest news to come out of the conference was the release of the long-awaited successor to GEDCOM, GEDCOM X. FamilySearch, the online presence of the LDS church, which created and maintained the original GEDCOM standard, released the new standard at the conference a few days ago. FamilySearch hits a lot of the right keywords in the release: the format can be XML- or JSON-based, it is released under a Creative Commons license, it supports metadata including Dublin Core and FOAF, its development is hosted on GitHub, it offers both a file format (like traditional GEDCOM) and an API, and more.

Yet some of the decisions seem strange, and no explanation is given for them. One that stands out is the decision to base the file format on MIME, a format created for sending e-mail attachments (MIME is an acronym for Multipurpose Internet Mail Extensions). So far the logic behind many of these decisions seems very opaque. The entire development of GEDCOM X seems to have been done up to this point without any input from the industry at large, or even from well-known efforts to improve GEDCOM such as the BetterGEDCOM group. Indeed, their FAQ answer about these efforts seems largely patronizing:

Have you heard about FHISO (BetterGEDCOM), OpenGen, ?
Of course. We’ve heard about them and many others who are making efforts to standardize genealogical technologies. We applaud the work of everybody willing to contribute to the standardization effort, and we hope they will continue to contribute their voices.

In other words, at least to my ears, it’s saying they know other people want to improve GEDCOM, but they are going to do their own thing and maybe they’ll listen occasionally (but no promises). In short, while it’s great that FamilySearch has come out with a new standard, their approach does not seem geared toward winning widespread adoption across the industry, at least not in a friendly way.

Of course, the huge advantage FamilySearch has over just about anyone else is the very large developer network they’ve cultivated for accessing familysearch.org. They are essentially a non-profit organization which has many commercial companies using their current API. To the extent that they transition these existing companies from their legacy API to GEDCOM X, they will certainly have a major advantage over other efforts to replace GEDCOM.
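For readers who haven’t dealt with MIME, here is a small sketch of what a generic MIME container looks like: a multipart/mixed message bundling one XML part and one JSON part, built and parsed with Python’s standard email package. This is only an illustration of MIME itself; the actual GEDCOM X packaging rules aren’t described in this post, so the part layout, file names, and helper names (build_bundle, part_types) are hypothetical.

```python
import email
import json
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_bundle(xml_record: str, json_record: dict) -> str:
    """Pack one XML part and one JSON part into a multipart/mixed MIME message.

    Hypothetical layout for illustration only; not the GEDCOM X file structure.
    """
    bundle = MIMEMultipart("mixed")

    xml_part = MIMEText(xml_record, "xml")  # Content-Type: text/xml
    xml_part.add_header("Content-Disposition", "attachment", filename="person.xml")
    bundle.attach(xml_part)

    # MIMEApplication base64-encodes the bytes by default.
    json_part = MIMEApplication(json.dumps(json_record).encode("utf-8"), "json")  # application/json
    json_part.add_header("Content-Disposition", "attachment", filename="person.json")
    bundle.attach(json_part)

    return bundle.as_string()


def part_types(raw: str) -> list[str]:
    """Parse a MIME message and list the content types of its leaf parts."""
    msg = email.message_from_string(raw)
    return [part.get_content_type() for part in msg.walk() if not part.is_multipart()]


raw = build_bundle("<person><name>Jane Doe</name></person>", {"name": "Jane Doe"})
print(part_types(raw))  # ['text/xml', 'application/json']
```

The point is simply that MIME was designed to wrap several typed attachments in one message, which is why it can serve as a container, even if its e-mail origins make it an unusual choice for a genealogy file.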

Progress On Other Fronts

So what happened to the other efforts mentioned in my last article?

The most visible effort has been the BetterGEDCOM wiki, which is moving from an informal group to a formal organization, the Family History Information Standards Organisation (FHISO), which will now sponsor the wiki. While BetterGEDCOM has been the most active effort to create a replacement for GEDCOM, it seems to have been overtaken by the too-many-cooks problem, and how the group plans to reach a consensus remains to be seen, let alone how it will convince industry organizations and companies to agree with it. It will be interesting to see FHISO’s response to GEDCOM X, and whether it will focus on implementing its ideas within the GEDCOM X framework or continue to go it alone.

The OpenGen International Alliance, started by the people at AppleTree.com, doesn’t seem to have taken off. Neither, for that matter, has AppleTree, which may explain why the OpenGen site hasn’t been updated in the past year (it still refers to an upcoming webinar last March).

APIs

One of the most interesting developments last year was the introduction of Application Programming Interfaces (APIs) for genealogy web sites. Indeed, the rumors around what would become GEDCOM X were that it was only an API and not a file format, but luckily that turned out not to be true: it is both. The only APIs that had been released before my last article were Geni.com’s API and OneWorldTree.com’s GenealogyCloud API.

Geni seems to have at least gotten some traction with its API, with support for syncing data coming from AncestorSync. Presumably this uses Geni’s API. I haven’t heard of other uses of the Geni API, however. If you know of other developers using the Geni API, let me know in the comments.

I have not heard of anyone using the GenealogyCloud API. If you know of anyone using GenealogyCloud, let me know in the comments.

As I predicted in the last article, MyHeritage introduced its own API, smartly named Family Graph. I say smartly because it clearly mimics Facebook’s Graph API. They’re not comparing themselves to Geni but to Facebook, which is smart. The other very smart thing they did was launch a contest to develop applications that use the Family Graph API. If no one uses your API, what’s the point, right? The winner receives $10,000. The deadline for the contest is about a week from now, with judging by a panel in the first half of March and the results announced on March 15th. The real test will be the quality of the applications submitted, and whether they come from individual developers or from larger companies. If the contest results are published next month with no major applications, then in my estimation this will be a setback for MyHeritage, not an achievement.

Conclusion

It will be very interesting to see how GEDCOM X is received by the genealogy companies at large, whose adoption any new format needs in order to succeed. FamilySearch has some key advantages: it is a non-profit organization (even though in many ways it competes with large commercial companies like Ancestry.com and MyHeritage.com), and it already has a large developer network. While many of the largest genealogy companies are not currently part of that developer network, if all of the ones who are start adopting GEDCOM X as their export format of choice, I think it will be hard for other companies not to adopt it. GEDCOM X’s dual format/API functionality also gives it a major edge, especially if FamilySearch’s legacy API is replaced by the API functionality in GEDCOM X.

Some have predicted there would never be a true replacement for GEDCOM, and others have said that technology such as AncestorSync’s upcoming products would make a file format unnecessary. I think both of these assertions are incorrect. There will be a replacement for GEDCOM, and it is necessary. Whether or not GEDCOM X is the ideal replacement seems to me to be a moot point; FamilySearch will get the traction it needs to push GEDCOM X into the mainstream. The real question is whether they will truly make it an open standard, or continue to hold it close to the chest. The real test will come when other groups insist on various features, and in how FamilySearch handles those demands. FamilySearch has put in place all the trappings of an open and transparent development process, so let’s hope they keep heading in that direction.

Changes in Access to the SSDI and Vital Records

I’ve been meaning to write this post for the past few weeks, and am sorry I did not do so earlier. A number of changes are under way in access to data of interest to genealogists in the United States, and in some cases they can seriously affect people’s ability to do research.

One major source of information for genealogists has been the Social Security Death Master File, usually referred to online as the Social Security Death Index (SSDI). The Death Master File is considered by law to be a public document, and lists all people who applied for a social security number (with an SS-5 form) and subsequently had their deaths reported to the Social Security Administration. Information on the SS-5 form can frequently be very useful to family researchers, as it usually lists the names of the parents of the applicant.

SSA increases delay in receiving names of parents

Last month, the Social Security Administration, without any announcement, extended the time one must wait to get the names of parents on a social security application from 70 to 100 years from the applicant’s birth. In other words, if last month you could order the SS-5 form of someone born in 1941 and find out their parents’ names, now you will not be able to get those names until 2041. Put another way, you can only get them today for people born before 1911. In fact, the reality is worse: you can still order the SS-5, and they will charge you for it, but they will just white out the parents’ names, which are probably the only good reason to order an SS-5 anyway.
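The waiting-period arithmetic above can be checked with a tiny sketch. It assumes the rule is simply "birth year plus N years", counted in whole calendar years; the SSA’s exact cutoff handling isn’t described here, and the function names are hypothetical.

```python
OLD_WAIT_YEARS = 70   # waiting period before the change
NEW_WAIT_YEARS = 100  # waiting period after the change


def release_year(birth_year: int, wait_years: int) -> int:
    """First calendar year the parents' names on an SS-5 would be released (assumed rule)."""
    return birth_year + wait_years


def parents_names_available(birth_year: int, current_year: int,
                            wait_years: int = NEW_WAIT_YEARS) -> bool:
    """True if, under the assumed rule, the parents' names would no longer be whited out."""
    return current_year >= release_year(birth_year, wait_years)


print(release_year(1941, OLD_WAIT_YEARS))   # 2011
print(release_year(1941, NEW_WAIT_YEARS))   # 2041
print(parents_names_available(1911, 2011))  # True
print(parents_names_available(1941, 2011))  # False
```

Under this simple rule, 2011 minus 100 gives the 1911 cutoff, and the same 1941 applicant moves from "available now" to "available in 2041".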

Reduction in State records in the DMF

Another change also took effect last month, when it was announced that some state death records would no longer be incorporated into the Death Master File, and that over 4 million existing records would be expunged from it. The stated reason is that state records have different privacy rules and thus cannot be incorporated into the public Death Master File. This also means nearly a million records a year will no longer be added to the Death Master File going forward (over 30% of the records that would have been added). Why this wasn’t recognized in the decades this file has been available is not explained. Additionally, it seems the Social Security Administration has also dropped last-residence ZIP codes from the information it adds to the Death Master File. When dealing with people with common names in large cities, ZIP codes are very useful in figuring out which record is the correct one.

Massachusetts tries to go against hundreds of years of open access rules

In my home state of Massachusetts, a bill (H.603) was introduced earlier this year in the state legislature to restrict access to birth records. Massachusetts has always been an open-access state when it comes to public records, so this would actually be the first time access to vital records has been restricted in Massachusetts. Open access to vital records can be seen either as an easy way for identity thieves to steal information, or as an easy way to prove the legitimacy of identities. This reckless attempt to restrict access to these records is not just a setback for genealogists; it will also restrict access for people trying to build a family medical history (needed for some inherited diseases), and restrict the ability of military personnel to track down the next of kin of soldiers, something the genealogical community has helped the military with for many years. It’s also a bit of political hackery, since it doesn’t actually address the issue of identity theft.

Good politics?

It’s not clear to me why this has become a political issue for some, but I guess seeming to protect people’s privacy (while not actually doing anything about it) is good politics. Politicians love to scare people by telling them their identities will be stolen if the government doesn’t crack down on identity theft. Except they don’t actually crack down on identity theft, for example by addressing how it’s possible for someone to file a tax return using the social security number of a deceased person. You’d think the IRS would have access to the Death Master File and could automatically check the social security numbers on filings against it, but that would be too simple a solution (and would actually put the onus of checking for fraud on a government agency).
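To show how simple such a cross-check could be, here is a minimal sketch. It is entirely hypothetical: the IRS’s real systems and data formats aren’t described here, so the record layout, field names, and rule are assumptions. It flags any return that claims a social security number for a tax year after that number’s recorded year of death; a return for the year of death itself is normal (a final return), so it is not flagged. The SSNs start with 000, which is never issued, so they are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxReturn:
    ssn: str       # SSN claimed on the return (hypothetical field)
    tax_year: int  # year the return covers


def flag_post_death_returns(returns, death_year_by_ssn):
    """Return filings that claim an SSN for a tax year after its recorded death year.

    death_year_by_ssn is a hypothetical extract of the Death Master File
    mapping SSN -> year of death.
    """
    flagged = []
    for r in returns:
        death_year = death_year_by_ssn.get(r.ssn)
        # Only years strictly after death are suspicious; a final-year return is legitimate.
        if death_year is not None and r.tax_year > death_year:
            flagged.append(r)
    return flagged


dmf_extract = {"000-00-0001": 2009}
returns = [
    TaxReturn("000-00-0001", 2009),  # final return for the year of death: not flagged
    TaxReturn("000-00-0001", 2010),  # filed for a year after death: flagged
    TaxReturn("000-00-0002", 2010),  # SSN not in the extract: not flagged
]
print(flag_post_death_returns(returns, dmf_extract))
```

Using a dictionary keyed by SSN keeps each lookup constant-time, so even a file with tens of millions of entries could be checked against every filing.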

The KIDS Act of 2011

In steps Representative Samuel Johnson (R-TX) with his Keeping IDs Safe Act of 2011. This bill, also known as the KIDS Act, would make it illegal for the government to release the Death Master File at all. Does it address fraud? No. Does it prevent government employees from sharing information with identity thieves? No. How about legislating 10-year jail sentences for government employees who release personal information to anyone unauthorized to view it? Regardless, this bill, and some of the press coverage of identity thefts that led up to it, have scared various genealogy companies into cutting back on access to the SSDI.

My sister’s story

It’s worth telling a story from when I was a child in Boston. As I recall, my teenage sister had gone to get her driver’s license, and it was supposed to be mailed to her. It never arrived. Eventually she contacted the RMV, and they sent her a replacement. What happened to the original one? Nobody knew. Well, someone knew.

One day we got a call from a branch of our bank in the next town over. This was when people still went to the bank to, you know, do bank stuff. Over the past several days, a woman had come in each day and deposited checks into my sister’s account adding up to a lot of money. Before those checks could clear, she returned to the teller she had been depositing them with and asked to make a withdrawal. She had a driver’s license with her picture on it, but my sister’s name. The teller didn’t know my sister, but she thought the woman looked a bit older than the age on the license. The teller asked the woman to wait a moment and brought the license to the branch manager. The manager had previously worked at the branch my family used, knew my family, and knew this was not my sister.

It was an interesting scam, of course: deposit checks with the teller so the teller would associate her with putting money into the account, then use a fake license to withdraw money from it. If the branch manager hadn’t previously worked at the branch in our town, back in the days when branch managers knew their customers, the woman might have gotten away with it. In the end, I don’t remember whether that woman was arrested or got away. I do remember being told that the scam had been traced back to the RMV, where multiple licenses had been forged with incorrect photos. I don’t know how much the RMV worker was paid to forge my sister’s license, or what thought process led them to take that risk, but presumably if there had been harsh laws against this, they would not have done it.

I’d guess most of the people reading this haven’t seen the movie this comes from, but this had to be done:

That must have been when Samuel Johnson was still trying to get into the college parties…

For those who are lost, I’ll share this clip from the movie Superbad:

That isn’t high art, and that clip is highly edited from the original (this is a family blog after all), but I felt it necessary to insert a little comic relief here. Back to the issue at hand…

The easiest site for searching the SSDI online has long been Rootsweb, a genealogy community site that has been hosted and run by Ancestry.com for more than 10 years. Just days ago, the Rootsweb SSDI page changed from allowing full searches of the SSDI to displaying the following message:

Due to sensitivities around the information in this database, the Social Security Death Index collection is not available on our free Rootsweb service but is accessible to search on Ancestry.com. Visit the Social Security Death Index page to be directly connected to this collection

If you follow the link to Ancestry.com’s own SSDI search page, you can search and get results, but unless you are a member of Ancestry.com, you only get partial information. Even if you have an Ancestry.com subscription, they’ve further cut back on the information available in their SSDI database, as they describe:

Why can’t I see the Social Security Number? If the Social Security Number is not visible on the record index it is because Ancestry.com does not provide this number in the Social Security Death Index for any person that has passed away within the past 10 years.

This seems to be a bit of pre-emptive work to keep the politicians off their backs.

Ancestry.com and GenealogyBank cut back on SSDI access

Ancestry.com is not the only company to cut back on access to the SSDI. GenealogyBank has eliminated social security numbers from its database altogether. GenealogyBank offers free searching of its SSDI database, but you must register for the site in order to see the results. Even if you’re a subscriber, there are now no social security numbers listed in the database at all. GenealogyBank says it removed all social security numbers after people called to explain that they had been erroneously listed as deceased in the SSDI, so anyone could look up their social security numbers through the GenealogyBank database. One article I read online estimated that of the 2.8 million new entries added each year, some 14,000 are for people who are still living. That seems a suspiciously round estimate (exactly half of one percent), and I have no idea how they came up with that number, nor how many of those false entries get removed from the database in subsequent revisions. I’m not saying people are not horribly affected by these mistakes in the SSDI, but maybe the solution is to fix the processes that introduce those mistakes? And even if there are 14,000 mistakes a year, as far as I can tell no one has shown that this has led to a single stolen identity.
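The "half of one percent" figure is easy to verify. This short sketch uses only the two numbers quoted from that online article; the helper name is made up for illustration.

```python
NEW_ENTRIES_PER_YEAR = 2_800_000  # estimate quoted in the online article
LIVING_PEOPLE_PER_YEAR = 14_000   # same source: entries for people still alive


def error_rate(erroneous: int, total: int) -> float:
    """Fraction of entries that are erroneous."""
    return erroneous / total


rate = error_rate(LIVING_PEOPLE_PER_YEAR, NEW_ENTRIES_PER_YEAR)
print(f"{rate:.2%}")                                  # 0.50%
print(NEW_ENTRIES_PER_YEAR // LIVING_PEOPLE_PER_YEAR)  # 200, i.e. one entry in 200
```

A rate that works out to exactly one entry in 200 is part of why the estimate looks more like a rough guess than a measured figure.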

FamilySearch.org still offering SSDI access…for now…

FamilySearch.org still offers free searching of its SSDI database, without registration, and still shows the social security numbers of everyone in it. I don’t know how long that will last, however. Personally, I recommend that everyone search the FamilySearch.org database and note the information it has on each person in your tree. This isn’t only the social security number, but also the birth date, death date, place of issuance (of the social security number), last residence, and place where the last benefit was sent. All of this information can be useful in genealogy research, and while these companies are removing the social security numbers now as a pre-emptive attempt to prevent further regulation, if regulation does arrive from the legislature, as currently written it would eliminate access to all of this information (not just the social security numbers). Therefore, I suggest making a list of the people in your database who were working in the US after 1935, going through the FamilySearch.org SSDI database, and copying all the information you can, while you still can…

Also, for a comparison of the Ancestry.com and Familysearch.org SSDI databases (written before the changes), see this article from Ancestry Insider called SSDI: Ancestry.com vs. FamilySearch.org. If you have a subscription to Ancestry.com, it might be worth it to take a look at their database as well, to see if they list the ZIP code for earlier entries in the database.

The 1940 US Census

It’s rare that massive new sources of genealogical information are released, and certainly rare that such sources are released for free. Every ten years in the United States, however, the census from 72 years earlier is released. In the past it has taken a lot of time to get the census made available to the public, primarily because of the massive cost in digitizing and indexing information on tens of millions of people.

On April 2, 2012, the 1940 US Census will be released to the public. Besides the obvious benefit of having information on the more than 130 million residents of the United States in 1940, there are other reasons to be excited about this release.

For one, it is the first time that the National Archives is releasing the census in digital form. In the past, companies needed to scan millions of pages of microfilm to create their own digital images of the census records. On April 2, 2012, the National Archives is releasing the entire 1940 census in digital form. There will not be an index to those records, which brings us to the second reason this release is exciting: Many genealogy companies and organizations have been planning for this release for years and it will be indexed in record time.

For starters, Stephen Morse, together with Joel Weintraub and the help of volunteers, has created tools on his great One Step website for finding the 1940 Enumeration District (ED) of any address in the United States. They even have a quiz that helps you determine the best way to figure out the ED for where your family lived in 1940. When the census records are released, searching by ED will be the only way to find records in the census. If you know where your family lived in April 1940 (when the census was taken), you can find the records for that address using Steve Morse’s tools. For a very detailed look at how the process will work, see Stephen’s article Getting Ready for the 1940 Census: Searching without a Name Index, which appeared in the Association of Professional Genealogists Quarterly this month.

Next, Ancestry.com has announced that they will be making the images and their index to those records (which they will develop on their own) free through at least the end of 2013. It’s unknown how long it will take Ancestry.com to index the records, but presumably their index would be available before the end of 2013.

Archives.com, which has been seeking in recent years to compete with Ancestry.com as a lower-cost service, announced that they have partnered with the National Archives to be the official host of the images that will be released on April 2, 2012. The official site the images will be released on has not yet been announced, but Archives.com has posted information on this partnership at archives.com/1940census.

More recently, it has been announced that three different genealogy companies have joined forces to index the 1940 US Census together and thus make the 1940 census searchable for free as well. These are Archives.com, FamilySearch.org and FindMyPast.com. They will be using FamilySearch.org’s indexing tool (which I discussed almost exactly a year ago here) to coordinate the indexing project.

It makes sense that Archives.com is involved, since they are hosting the images for the National Archives (and have no public indexing tool of their own), and it makes sense that FamilySearch.org is involved, since they have the indexing tool and have previously proven themselves by indexing the 1930 US Census. The odd man out seems to be FindMyPast.com. Interestingly, FindMyPast.com simply redirects to FindMyPast.co.uk, as it is actually a British genealogy site. Is FindMyPast planning to move into the US genealogy market, with the 1940 census as its means of doing so? Or is it just planning to offer the 1940 census index to its British users as a way of tracking relatives who moved to the US? The use of FindMyPast.com in the press release instead of FindMyPast.co.uk makes this an interesting question.

Together, the three companies have set up the 1940 Census Community Project. You can check out the information on the project now, and if you’re interested in helping index the 1940 US Census, you can download FamilySearch.org’s indexing tool now and try it out with other projects FamilySearch.org is indexing.

In addition, one of the more interesting pages the project has released lists what the enumerator was supposed to ask each family when adding them to the census. This gives you a good idea of what to expect when the 1940 US Census is released.

So there you go, we’re 105 days away from the release of the 1940 US Census images. Now you know how you’ll be able to find your family (if they were living in the US on April 1, 1940) when the census is released.
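The dates above fit together with simple date arithmetic. This sketch checks the 72-year rule (1940 + 72 = 2012) and the day count. The post’s own date isn’t shown on this archive page; counting back 105 days from April 2, 2012 gives December 19, 2011, so that "today" value is an inference, not a stated date.

```python
from datetime import date

CENSUS_DAY = date(1940, 4, 1)   # official census day for the 1940 census
RELEASE_DAY = date(2012, 4, 2)  # scheduled public release of the images
CONFIDENTIALITY_YEARS = 72      # census records are released 72 years later


def days_until(target: date, today: date) -> int:
    """Whole days from `today` until `target`."""
    return (target - today).days


print(CENSUS_DAY.year + CONFIDENTIALITY_YEARS)        # 2012
print(days_until(RELEASE_DAY, date(2011, 12, 19)))  # 105 (inferred post date)
```

Note that 2012 is a leap year, so February contributes 29 days to the count.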