In my last blog entry, I talked about the pros and cons of various genealogy software. A developer of one of the programs even commented on the post, saying he will take my comments under advisement (+1 to HuMo-gen for good customer service!). For this blog entry, we dive into the GEDCOM specification.
A GEDCOM file is a text file. In it, you declare records of a few different types. Two important record types are INDI and FAM. An INDI record designates an "individual" while a FAM record designates a "family". A family record represents an immediate family: two parents and their children. When you add an individual or a family to your tree, each one is given an id. The first individual is I1 and the second is I2; the first family is F1 and the second is F2. INDI records contain the ids of the families they belong to (as a child or as a spouse). FAM records list the individual ids of the mother, father and children of that family. This creates a doubly-linked tree of individuals and families.
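To make the doubly-linked structure concrete, here is a minimal Python sketch that reads a tiny GEDCOM fragment and collects the pointers each record holds. The tags (FAMS, FAMC, HUSB, WIFE, CHIL) are standard GEDCOM tags, but the sample people and the `parse_links` helper are made up for illustration. This is not a full GEDCOM parser: it only looks at level-0 records and their level-1 pointer lines.

```python
# A made-up three-person family. Ids appear between @ signs in the file.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME John /Doe/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Jane /Roe/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Jimmy /Doe/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""


def parse_links(text):
    """Return {record_id: [(tag, target_id), ...]} for level-1 pointer lines."""
    links = {}
    current = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # "0 @I1@ INDI" starts a record; "0 HEAD" and "0 TRLR" have no id.
            current = parts[1].strip("@") if parts[1].startswith("@") else None
            if current is not None:
                links[current] = []
        elif level == "1" and current is not None and len(parts) == 3:
            tag, value = parts[1], parts[2]
            # Only keep values that point at another record, e.g. "@F1@".
            if value.startswith("@") and value.endswith("@"):
                links[current].append((tag, value.strip("@")))
    return links


links = parse_links(SAMPLE)
for record_id, pointers in links.items():
    print(record_id, pointers)
```

Each individual points at F1 (FAMS for a spouse, FAMC for a child), and F1 points back at all three individuals. That two-way linking is what lets software hop from a person to their family and back without scanning the whole file.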
From a format point of view, the doubly-linked structure can be helpful, especially since some genealogy software actually uses the GEDCOM file as its database. That software doesn't have to traverse entire trees to navigate up and down a family tree. It does pose a problem when it comes to merging GEDCOM files, though, and it is the kind of problem that is common with specifications. Many specifications are open to interpretation, which leaves different software implementing the same operation in different ways. This can be illustrated with an example family.
Imagine in tree 1, you have I20 married to I21 in F2. Now, in tree 2, you have I30 married to I31, also in F2. When you merge tree 2 into tree 1, all the INDI ids get reassigned so that there are no id conflicts. The problem is that the FAM ids are not always reassigned. After the merge, something interesting happens to I20 and I21. You will notice that they have more children! All of I30 and I31's children are listed under I20 and I21. This is because all of I30 and I31's children still have a pointer to F2 in them, and in the merged tree, F2 is the marriage between I20 and I21.
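The collision can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the trees are plain dicts of pointers, the children I22 and I32 are invented, tree 2's individuals are shown with the ids they have after renumbering, and `naive_merge` keeps tree 1's record when two ids collide. I don't know how any of the programs mentioned in this post implement their merge internally.

```python
# Each record maps to the (tag, target id) pointers it holds.
tree1 = {
    "I20": [("FAMS", "F2")],
    "I21": [("FAMS", "F2")],
    "I22": [("FAMC", "F2")],  # hypothetical child of I20 and I21
    "F2": [("HUSB", "I20"), ("WIFE", "I21"), ("CHIL", "I22")],
}
tree2 = {
    "I30": [("FAMS", "F2")],
    "I31": [("FAMS", "F2")],
    "I32": [("FAMC", "F2")],  # hypothetical child of I30 and I31
    "F2": [("HUSB", "I30"), ("WIFE", "I31"), ("CHIL", "I32")],
}


def naive_merge(base, incoming):
    """Copy incoming records into base; on an id collision keep base's record."""
    merged = dict(base)
    for record_id, pointers in incoming.items():
        merged.setdefault(record_id, pointers)
    return merged


def children_of(tree, fam_id):
    """Find children by following each individual's FAMC pointer."""
    return sorted(rid for rid, ptrs in tree.items() if ("FAMC", fam_id) in ptrs)


def renumber_families(tree, offset):
    """Shift every family id (record ids and pointers alike) by offset."""
    def new_id(rid):
        return f"F{int(rid[1:]) + offset}" if rid.startswith("F") else rid
    return {
        new_id(rid): [(tag, new_id(target)) for tag, target in ptrs]
        for rid, ptrs in tree.items()
    }


broken = naive_merge(tree1, tree2)
print(children_of(broken, "F2"))    # ['I22', 'I32']

fixed = naive_merge(tree1, renumber_families(tree2, 100))
print(children_of(fixed, "F2"))     # ['I22']
print(children_of(fixed, "F102"))   # ['I32']
```

In the broken merge, I32 still points at F2, so it shows up as a child of I20 and I21. Renumbering the incoming FAM ids along with the INDI ids keeps the two families apart. The offset of 100 is arbitrary; a real merge would need to pick one larger than the highest family number already in the base tree.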
Webtrees and Family Tree Maker both exhibit this behaviour. With Webtrees, you can only merge one record at a time, so I only had one record to fix. In Family Tree Maker, you can append an entire GEDCOM file and corrupt many of your records! Luckily my father-in-law made a backup before trying the merge. According to the Webtrees forum, Webtrees doesn't allow appending a GEDCOM file to an existing family tree specifically because of this problem. Their opinion is that merging GEDCOM files is a bad idea (which isn't far from the truth). HuMo-gen doesn't suffer from this problem. With HuMo-gen, you can append a GEDCOM file to your database, and it has a duplicate search that you can run after the append that will let you merge entries together. The duplicate search doesn't find every duplicate (which would be impossible to do), but HuMo-gen also provides an interface for manually merging two entries. This is why HuMo-gen is currently my preferred software.
In my next blog post, I will dive into the problem of external media and the GEDCOM specification.