Good news! Google’s Bard AI has improved OCR and Translation for Korean genealogy records!

Example of interpreting an image from a JokBo using bard.google.com :

It did a good job of extracting the HanJa text, though not as well on the HanGul text in this example. The HanJa is the hardest part, and it did quite well at extracting the HanJa text, converting it to HanGul, and translating it to English. It looks like Bard AI can help automate some of the most tedious and time-consuming parts of interpreting Korean genealogy.

I tried a few different versions of the prompt, including “This image contains a mix of Korean HanJa and HanGul characters. Extract the text using OCR and preserve the original HanJa or HanGul characters in the text. The image is written top-down and right-to-left. The first column contains 3 HanJa characters followed by two HanGul characters. The other columns contain HanJa characters. Output the text in four ways: HanJa, HanGul, Romanized, and translated to English.” This was a failed attempt to help it recognize the HanGul version of the name in the lower right. It still returned the following useful results, with one error in interpreting the HanGul version of the name (it thought it was a single HanJa character rather than two HanGul syllables):

The text in the image is as follows, from top to bottom and right to left:

HanJa:

  • 子亨達 啓
  • 壬戍一九二二年十月
  • 十五日生辛酉一九八
  • 一生五月一五日卒墓
  • 孝令面孝誠公園 ● 配
  • 密陽朴明花父錫恒壬
  • 申一九三二年十月二
  • 十二日生一男二女

HanGul:

  • 자형달 계
  • 임술 일구이이년 십월
  • 십오일 생 신유 일구팔
  • 일생 오월 이십오일 졸묘
  • 효령면 효성공원 ● 배
  • 밀양 박명화 부 석항 임
  • 신 일구삼이년 십월 이
  • 십이일 생 일남 이녀

Romanized:

  • Jahyeongdal Gye
  • Imnyeol ilguiiyeon sipwol
  • sipoeil saeng sinyu ilgubal
  • ilsaeng owol isiphoseoil jolmyo
  • Hyeongnyeongmyeon Hyoeseonggongwon ● Bae
  • Milyang Park Myeonghwa bu Seokhang Im
  • sinsin ilgusam-iyeon sipwol i
  • sipieil saeng ilnam inyeo

English translation:

  • Jahyeongdal Gye (子亨達 啓)
  • Born on October 15, 1922 (壬戍一九二二年十月 十五日生)
  • Died on May 15, 1988 (辛酉一九八 一生五月一五日卒墓)
  • Buried at Hyoeseong Park, Hyeongnyeongmyeon (孝令面孝誠公園)
  • Spouse: Park Myeonghwa from Milyang (配 密陽朴明花)
  • Father: Im Seokhang, born in 1932 (父錫恒壬申一九三二年十月二 十二日生)
  • One son and two daughters (一男二女)

Note that the English translation is approximate, as some of the HanJa characters can have multiple meanings.
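The dates in this JokBo entry give each year two ways: as a cyclical (sexagenary) year name such as 壬戌, and as the year written digit by digit in HanJa numerals, such as 一九二二 for 1922. Checking that the two agree is a quick way to catch OCR or translation slips. The Python sketch below is an illustration, not a Bard feature; the function names are hypothetical, and it only handles digit-by-digit years plus 十-style months and days up to 99.

```python
# Minimal sketch: cross-check a cyclical year name against the HanJa year digits.
# Assumes years are written digit by digit (一九二二 = 1922) and months/days use
# 十 for tens (十月 = 10, 二十二 = 22). Forms with 百 or 千 are not handled, and
# unexpected characters raise a KeyError.

HANJA_DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

STEMS = "甲乙丙丁戊己庚辛壬癸"        # 10 heavenly stems
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"  # 12 earthly branches


def hanja_to_int(text: str) -> int:
    """Convert HanJa numerals such as 一九二二, 十五, or 二十二 to an int."""
    if "十" not in text:
        # Digit-by-digit form, e.g. 一九二二 -> 1922 or 一五 -> 15
        value = 0
        for ch in text:
            value = value * 10 + HANJA_DIGITS[ch]
        return value
    tens_part, _, ones_part = text.partition("十")
    tens = HANJA_DIGITS[tens_part] if tens_part else 1  # a bare 十 means ten
    ones = HANJA_DIGITS[ones_part] if ones_part else 0
    return tens * 10 + ones


def cycle_name(year: int) -> str:
    """Return the stem-branch name for a year (1984 was 甲子, the start of a cycle)."""
    index = (year - 4) % 60
    return STEMS[index % 10] + BRANCHES[index % 12]


def check_year(cycle: str, digits: str) -> bool:
    """True if a cyclical name like 壬戌 matches a numeric year like 一九二二."""
    return cycle_name(hanja_to_int(digits)) == cycle


if __name__ == "__main__":
    print(cycle_name(1922), check_year("壬戌", "一九二二"))  # 壬戌 True
    print(check_year("壬戍", "一九二二"))  # False: 戍 is not the branch 戌
    print(cycle_name(hanja_to_int("一九八一")))  # 辛酉
```

Applied to the entry above, 辛酉 with 一九八一 (split across two columns as 一九八 / 一) works out to 1981, so the English line that reads 1988 deserves a second look, and the OCR's 壬戍 uses 戍 where the cyclical name calls for the nearly identical 戌. Keep in mind that cyclical years traditionally begin around the lunar new year, so dates early in a Gregorian year may carry the previous year's name.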

Using Artificial Intelligence (AI) to Translate Korean Genealogy and Given Names

How can AI be used for Korean Genealogy? How accurate and useful is it? Actually, I’m working on training some models that should be quite useful… but in the meantime here are some examples of what you can do with popular AI engines:

Translating an Image from a Notional JokBo (Korean Genealogical Record) using Google Translate:

Converting HanJa records to HanGul can be a really helpful step in interpreting Korean Genealogy. The example above didn’t result in a perfect translation to HanGul, but it did a really good job and would save a lot of time.

Translating Korean Names:

I received a request to translate a Korean name. Here is how Google’s Bard and OpenAI’s ChatGPT 3.5 handled it.

I tried using AI to accomplish the following instruction (prompt): “Write the Korean name SungJoon in HanGul and list the HanJa possibilities for each syllable and the meaning in English of those HanJa characters.”

Google’s Bard AI responded with this:


Sure. Here are the Hanja possibilities for each syllable of the Korean name SungJoon, along with their English meanings:

성 (seong)

盛 (seong): to flourish, to prosper
升 (seung): to rise, to ascend
成 (seong): to become, to achieve
聖 (seong): holy, sacred
星 (seong): star
聲 (seong): sound, voice
醒 (seong): to wake up, to become aware
昇 (seung): to rise, to ascend
乘 (seung): to ride, to mount
森 (seon): forest, woods
준 (jun)

俊 (jun): handsome, elegant
準 (jun): accurate, precise
遵 (jun): to follow, to obey
君 (jun): king, lord
隼 (jun): falcon
峻 (jun): steep, rugged
琿 (jun): jade
玧 (jun): bright, brilliant
玢 (jun): precious stone
玔 (jun): to polish, to refine
The meaning of the name SungJoon will vary depending on the Hanja used. For example, 성준 with the Hanja 盛駿 means “to flourish and advance”.

I also tried ChatGPT 3.5, and it gave this response:


The Korean name “SungJoon” can be written in Hangul as “성준.” Each syllable represents a different sound:

“성” (Seong): 성 (Sung) can have several Hanja possibilities, including 成 (growth, success), 城 (castle, city), or 聲 (sound, voice).
“준” (Joon): 준 (Joon) can also have various Hanja options, such as 俊 (talented, handsome), 峻 (steep), or 准 (standard, accurate).
Here are some common Hanja characters and their meanings that can be associated with each syllable:

“성” (Sung):
成: growth, success
城: castle, city
聲: sound, voice
“준” (Joon):
俊: talented, handsome
峻: steep
准: standard, accurate
It’s important to note that the meanings provided above are general interpretations of the Hanja characters, and the actual meaning of a name can vary depending on personal and cultural preferences.

Not bad… they both seemed to do well.

Writing English names in Korean:

OpenAI’s ChatGPT 3.5 seemed to do a very good job of writing English names in Korean HanGul characters. Google’s Bard AI struggled with some names but did well with others.

Other Ideas?

Fascinating progress…

Do you have other ideas for how to apply AI to Genealogy? What kind of results are you getting?

Korean Genealogy featured in PBS Finding Your Roots

PBS Finding Your Roots published an episode that features David Chang and the exploration of his Korean genealogy.

https://www.pbs.org/weta/finding-your-roots/watch/episodes/children-of-exile

David Chang is the chef and founder of Momofuku. Since opening his first restaurant, Momofuku Noodle Bar, in 2004, he has received six James Beard Awards, and has been recognized as GQ’s Man of the Year and a Time 100 honoree. In 2018, David formed Majordomo Media. He is the host of The Dave Chang Show podcast and two Netflix original documentary series, Breakfast, Lunch, and Dinner and Ugly Delicious. His cookbook, Momofuku, and memoir, Eat A Peach, are both New York Times bestsellers.

A new show, The Next Thing You Eat, on Hulu, and a new cookbook, Cooking At Home: Or, How I Learned to Stop Worrying About Recipes (And Love My Microwave), will both be released this fall.

Finding Clan Names

JokBo often list the clan names (clan origin + surname) for spouses included in the record. This is great, because you can use that to research the spouse’s genealogy. Sometimes it can be a time-consuming task to enter and search for the HanJa versions of the clan names. Hopefully this guide can simplify and speed up that process.


It would be a lot of work to draw or search for each complex HanJa character in a clan name.  An easier way is to search for the clan name on my website at http://koreangenealogy.org/book/online-addendum/surnames-clans/ .  

1) First, find the HanJa for Mr./Mrs./Family/Clan (씨), which is 氏. That tells you that the preceding 3 (occasionally more) characters are the clan origin and surname.

2) The surname should be the character (occasionally 2 characters) before 氏.  Find the surname here in my list of Korean surnames: http://koreangenealogy.org/book/names/family-names/ .  Copy the HanJa version of it.

3) Search for the surname in my list of clans here: http://koreangenealogy.org/book/online-addendum/surnames-clans/ .  Once you find the surname, look for the full clan name (origin + surname + 氏), then copy and paste its HanGul and HanJa versions into your records.

4) Search for the clan name online.  

4a) Go to the Korean collection of JokBo records at FamilySearch and find the surname and clan origin to see a list of JokBo records. The origins are organized by Korean province; if you don't know which province the origin town is in, just try them all: https://www.familysearch.org/search/collection/1398522 .  The records aren't indexed, but you can browse the scanned pages to find your ancestor.

4b) Search for the clan name to find an official family website or websites about the family.  Some family websites include an online and up-to-date version of their “Internet JokBo”, which is a great time saver because it is indexed so that you can search for a specific ancestor within the JokBo record.  Knowing the branch within the clan and/or the person’s parent’s name will help narrow the search.

4c) Search for the clan name and the word JokBo (족보) to narrow down your results.  Google works well, and sometimes using a Korean search engine can find additional results.
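Steps 1 to 3 boil down to "find 氏 and read backwards," which is easy to sketch in Python. This is an illustration, not a tool on this website: it assumes a 2-character clan origin (the common case) and treats the surname as 1 character unless it appears in a small, incomplete list of two-character surnames. The helper `jokbo_search_url` only builds an ordinary Google search link for step 4c; both function names are hypothetical.

```python
from urllib.parse import quote_plus

# Partial list of two-character Korean surnames (not exhaustive)
COMPOUND_SURNAMES = {"南宮", "諸葛", "皇甫", "鮮于", "司空", "西門", "獨孤"}


def find_clans(text: str, origin_len: int = 2) -> list[tuple[str, str]]:
    """Return (origin, surname) for every 氏 found in text."""
    clans = []
    for pos, ch in enumerate(text):
        if ch != "氏":
            continue
        before = text[:pos]
        # Steps 1-2: the surname sits immediately before 氏
        surname = before[-2:] if before[-2:] in COMPOUND_SURNAMES else before[-1:]
        end = len(before) - len(surname)
        # Assumption: the clan origin is the origin_len characters before the surname
        origin = before[max(0, end - origin_len):end]
        clans.append((origin, surname))
    return clans


def jokbo_search_url(origin: str, surname: str) -> str:
    """Step 4c: a web search link for the clan name plus the word 족보."""
    query = f"{origin}{surname}氏 족보"
    return "https://www.google.com/search?q=" + quote_plus(query)


print(find_clans("配密陽朴氏"))  # [('密陽', '朴')]
print(jokbo_search_url("密陽", "朴"))
```

The input 配密陽朴氏 is a made-up spouse entry ("spouse: Miryang Park clan"). A string with several 氏 characters, such as a spouse followed by a mother, returns one pair per clan. Origins that are not two characters long need a different `origin_len`.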

Finding HanJa for Given Names

Here is an example of a given name from a JokBo (family registry). In this case, both the HanJa and HanGul are provided by the JokBo record for this individual:

The HanGul version can be entered with a Korean keyboard or virtual keyboard (clicking buttons on a picture of a Korean keyboard). The HanJa version is a bit harder.  There are several ways to approach finding the HanJa for a given name. You can use one of the following methods for each syllable / HanJa character:

1) Look up 상 in the “Given Names” section of this website at http://koreangenealogy.org/book/names/given-names/ .  You can copy “상” and then use Find (search) on the webpage.  It will find this on the page:

There are 32 HanJa characters for Sang that are used in Korean given names.  You can just find the HanJa character that matches the one in the JokBo and copy it from the webpage to your records.  In this case the HanJa for 상 is 相.  The advantage of this method is that the HanJa listed on this page are limited to HanJa used in Korean given names according to the Korean Census, so you will get far fewer options and find the right one more quickly than with an online Korean-HanJa dictionary.

2) Another method is to use an online Korean-HanJa dictionary.  My favorite is http://hanja.dict.naver.com .  You get hundreds of possible HanJa for 상, but 相 will be in there somewhere…  In this case we're lucky: it is the 2nd one listed, because it is the 2nd most common.

The HanJa version of 상 is in large blue font:

HanJa dictionaries generally show a word that has to do with the meaning (in this case 서로) then the HanGul pronunciation.  In this case there are 3 possible HanGul versions: 상, 빌, or 양.  Fortunately, the JokBo told us to choose 상 for this one.  If the HanGul was not included in the JokBo and you only knew the HanJa, then you would have to select one of those three HanGul versions.  The first one is the most common.
After that, it includes a row of information about this HanJa (Chinese character).  In this case, it is a compound or complex character: one HanJa made up of multiple simpler HanJa characters combined into one.  Here, 相 is a combination of 木 and 目.  The base (부수), or main sub-character, is 目.  It also tells you that drawing the entire character 相 requires 9 strokes.  HanJa and Chinese dictionaries generally use the number of strokes as one of the elements of their “alphabetical” sort order.  If you click on 획순보기, it will show you the proper stroke order and direction to draw the character.  It is important to learn the proper stroke order and direction if you want to draw characters as a way of searching for them.

3) Searching by 부수, or part of a complex Chinese character, is also an option that could save time in some cases, especially when the complex character is difficult to draw.  If you search for 目 and scroll down to the 부수·모양자 section, it shows 4,656 HanJa characters that contain 目 as a part of them.
4) Another option is drawing the character.  If you click on the pencil symbol, you can draw the HanJa character and it will search for it as you draw.  It works best if you use the proper stroke order and directions.  The basic rules are here: http://koreangenealogy.org/book/korean-writing/stroke-order-and-direction/ .  In this case, it identified 相 in the upper left box, so click on that to see a dictionary entry, and copy the HanJa from there.

You could also have just drawn 目 and searched for HanJa that contain it, as noted above in the 3rd option.
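Methods 2 and 3 amount to filtering a list of candidates by radical (부수) and stroke count. The Python sketch below shows that idea with a tiny hand-made table for 상; the table, the `narrow` function, and the choice of candidates are illustrative assumptions, not the website's list or Naver's data, so check any result against a real dictionary. Each radical is stored with its Kangxi radical number so results can be sorted radical first, then by strokes, much like a traditional dictionary.

```python
from typing import NamedTuple, Optional


class Hanja(NamedTuple):
    char: str
    radical: str      # 부수
    radical_no: int   # Kangxi radical number, used for dictionary-style sorting
    strokes: int      # total stroke count


# Illustrative subset of HanJa read as 상 (not a complete or official list)
CANDIDATES = {
    "상": [
        Hanja("相", "目", 109, 9), Hanja("上", "一", 1, 3),
        Hanja("想", "心", 61, 13), Hanja("商", "口", 30, 11),
        Hanja("常", "巾", 50, 11), Hanja("象", "豕", 152, 12),
        Hanja("賞", "貝", 154, 15), Hanja("箱", "竹", 118, 15),
        Hanja("霜", "雨", 173, 17),
    ],
}


def narrow(syllable: str, radical: Optional[str] = None,
           strokes: Optional[int] = None) -> list[str]:
    """Return candidate HanJa for a syllable, optionally filtered by radical and strokes."""
    matches = [
        h for h in CANDIDATES.get(syllable, [])
        if (radical is None or h.radical == radical)
        and (strokes is None or h.strokes == strokes)
    ]
    # Dictionary-style order: radical number first, then stroke count
    matches.sort(key=lambda h: (h.radical_no, h.strokes))
    return [h.char for h in matches]


print(narrow("상", radical="目"))  # ['相']
print(narrow("상", strokes=15))    # ['箱', '賞']
```

Knowing either the radical or the stroke count from the dictionary entry is often enough to cut the list down to one or two characters.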

Adding Korean Names to FamilySearch.org

The proper and most complete way to add a Korean name to FamilySearch:

  • Names: When adding a new name, switch the language to Korean to get fields for all 3 versions of the name (HanGul, HanJa, and Romanized). You can enter (or paste) the HanGul and HanJa versions. When you enter the HanGul version, FamilySearch.org will automatically populate the Roman version (using the English alphabet). If you prefer a different Roman version, you can override it by just typing in the preferred version. If the person lived in the U.S. or Canada, it is best to use the Roman version on their U.S. or Canadian records. There are many romanization methods, so there are a lot of options. If the person just lived in Korea, it is generally best to use the default Romanized version that is auto-populated to maximize the chances of successful cooperation with other researchers who are relying on the Romanized version of the names. If there are additional names, like a pen-name or a posthumous name, you can add that after the individual is created in the “Other Information” section as an “Additional Name”.
  • Clans: Including the clan origin will help you identify the correct JokBo records for further research on an ancestor’s line. Generally, the clan origin is in the title of the JokBo (family registry). Including the branch or even sub-branch within the clan can further help narrow the search to a specific JokBo volume or section. Add a clan under the “Other Information” section of a Korean record. Add a Branch (파) under the “Custom Fact” section and title it “branch”.
  • Titles: Titles of nobility or titles of government positions can also be added in the “Other Information” section.
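FamilySearch fills in the Roman version automatically, and which system it uses isn't covered here. As a rough illustration of how a HanGul name can be romanized mechanically, here is a minimal Python sketch that splits each precomposed HanGul syllable into its initial, vowel, and final using standard Unicode arithmetic, then applies a simplified Revised Romanization table. It is an approximation under stated assumptions: it ignores sound changes between syllables (which Revised Romanization does not apply within given names anyway), simplifies final consonant clusters, and does not know conventional surname spellings such as Park for 박, so its output may differ from what FamilySearch suggests. The function names are hypothetical.

```python
# Minimal sketch of syllable-by-syllable Revised Romanization (simplified).
HANGUL_BASE = 0xAC00  # 가, the first precomposed HanGul syllable
SYLLABLE_COUNT = 11172  # 19 initials x 21 vowels x 28 finals

INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss", "",
            "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Final consonants in Unicode order (index 0 = no final); clusters simplified
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l",
          "l", "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t",
          "p", "t"]


def romanize_syllable(ch: str) -> str:
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < SYLLABLE_COUNT:
        return ch  # not a HanGul syllable: pass it through unchanged
    initial, rest = divmod(code, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial] + MEDIALS[medial] + FINALS[final]


def romanize(name: str) -> str:
    return "".join(romanize_syllable(ch) for ch in name)


print(romanize("성준"))  # seongjun
print(romanize("석항"))  # seokhang
```

If the auto-populated spelling differs from a sketch like this, the difference usually comes from surname conventions or sound-change rules, which is one more reason to keep the default FamilySearch version for people who only lived in Korea.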

Indexed Korean Historical Genealogy Records

There is a wealth of scanned records available nowadays for Korean genealogical research.  However, finding indexed and searchable records is a bit more difficult.  Fortunately, there have been some recent advancements.

An amazing project on GitHub includes searchable text file versions of some historical Korean records.  The project is called “JoSeon MunKwa”:  JoseonMunkwa .  The project page includes links to the original sources and has downloadable text files in the data directory.  The project code attempts to take on difficult tasks that challenge people researching Korean genealogy, such as addressing multiple people with the same name and people with multiple names.  The historical record scope includes:

– Munkwa Bangmok (문과 방목) (civil service roster data, including 14,638 people from 1392 to 1897).  AKS did some of the original research on this file.

– Annals of the Joseon Dynasty (조선왕조실록) (A.K.A. Veritable Records of the Joseon Dynasty) (183 books covering 472 years of Joseon history kept by the rulers).  This history website explores the indexed records in detail: The Veritable Records of the Joseon Dynasty

– Joseon Family Relations (ManSeongDaeDongBo 만성대동보) (A list of prominent people and their clan lineage including founding clan members).  Online searchable record: Joseon Lineage Network Information System (LNIS) based on the ManSeongDaeDongBo (The Comprehensive collection of family lineage)
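If you download the JoseonMunkwa text files, even a simple search can list every line that mentions a name. The Python sketch below assumes the files are plain UTF-8 text ending in .txt; the actual file names, format, and layout in the project's data directory aren't described here, and the folder name `joseon_data` and the function name are hypothetical. It does a plain substring match, so searching in HanJa and in HanGul are separate searches.

```python
from pathlib import Path


def search_records(folder: str, term: str, pattern: str = "*.txt"):
    """Yield (file name, line number, line) for every line containing term."""
    for path in sorted(Path(folder).rglob(pattern)):
        # errors="replace" keeps one badly encoded file from stopping the search
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if term in line:
                    yield path.name, line_no, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical folder holding the downloaded text files
    for name, line_no, line in search_records("joseon_data", "相"):
        print(f"{name}:{line_no}: {line}")
```

Because the same person may appear under several names, it can help to run the search once for each known name and compare the hits.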

ManSeongDaeDongBo

“The Comprehensive Collection of Family Lineage” is a valuable resource for Korean Genealogy.  The title of this record is ManSeongDaeDongBo (만성대동보, 萬姓大同譜).  Generally, only prominent individuals, their lineages, and clan founding ancestors were included.  It covers 361 Korean lineages and 136,000 people from about 100 A.D. to 1933 A.D.  The record was published during the Japanese colonial period of Korean history.  It is basically a multi-family collection of family trees.  Cross-referencing this record with a family JokBo provides additional insights and validation for Korean genealogy research.  The record is divided into 3 volumes.

The scanned record is available on Family Search, which is one of the most valuable sources for scanned Korean genealogy records:
Record Information
Scanned Record Images Online

The scanned record is also available online from the Academy of Korean Studies’ Center for Korean Studies Memorials:
Record Information
Scanned Record Images Online

Professor Mark Peterson provided an amazing tutorial on how to use this record:
YouTube Tutorial from RootsTech 2021

Some incredible projects have indexed the record and made it searchable!
Joseon Dynasty Family Relationship Network Information System (LNIS)

Record History: history

Here is an index showing the clans (surnames with origin) and page numbers for families in the ManSeongDaeDongBo: ManSeungBo Clan Index