A Turkish Poetry Generator AI That Passes the Turing Test

27 June 2016

As a final-semester student in the Computer Engineering department, I decided to work on a natural language processing project that generates human-like poetic text in Turkish.

The goal of this project is to create a computer program that can generate poems that are indistinguishable from human-written poems. An experiment was conducted with 146 participants to test whether our automatic poetry generation program, ROMTU, achieves this goal. As a result, ROMTU was able to mislead 48.63% of participants.

In this blog post I will give a short overview of the project. For further reading, links to my full paper and the program's source code are given at the end.

Methodology

To create a lexicon, 1500 different poems were first gathered from the siirakademisi.com website. After that, the most commonly used words were selected. Since Turkish is an agglutinative language, the stems of the words were extracted so that they can be used with different grammatical tenses. As a result, a lexicon of 4245 stem words was built. To achieve the meaningfulness property, the words in the lexicon were divided into categories according to their model and meaning.
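The word-selection step can be sketched in Python as a simple frequency count. This is only an illustration, not the code used in ROMTU: the function name `most_common_words` and the sample poems are made up, and the stemming step is left out because it needs a Turkish morphological analyser.

```python
import re
from collections import Counter

def most_common_words(poems, top_n):
    """Return the top_n most frequent word forms across all poems."""
    counts = Counter()
    for poem in poems:
        # [^\W\d_]+ matches runs of Unicode letters, so ş, ğ, ı, ö, ü, ç are kept.
        # Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı"),
        # so real data would need Turkish-specific case handling first.
        counts.update(re.findall(r"[^\W\d_]+", poem.lower()))
    return [word for word, _ in counts.most_common(top_n)]

# Hypothetical miniature corpus; the project itself used about 1500 poems.
sample_poems = [
    "gece ve deniz",
    "deniz ve rüzgar",
    "gece deniz",
]

print(most_common_words(sample_poems, 2))
```

In a full pipeline, each selected word form would then be reduced to its stem before being added to the lexicon.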

To satisfy the meaningfulness property, a trained data file was needed that provides similarity relationships between words based on their cosine distance. A corpus of 500M tokens, gathered from Turkish news portals and Turkish-language web pages, was used as training data. The Word2vec tool was used to generate a vectorised representation of our corpus.
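To illustrate how word vectors give a similarity relationship, here is a minimal Python sketch of cosine similarity, cos(a, b) = a·b / (|a| |b|); cosine distance is 1 minus this value. The three-dimensional vectors and the helper `most_similar` are invented for the example. They do not come from the trained model, whose vectors would have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, chosen by hand for illustration only.
vectors = {
    "deniz": [0.9, 0.1, 0.3],   # sea
    "dalga": [0.8, 0.2, 0.4],   # wave
    "masa":  [0.1, 0.9, 0.2],   # table
}

def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(most_similar("deniz", vectors))
```

With a lookup like this, a generator can prefer words that are semantically close to the words already chosen for a line.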

Predefined patterns are used in the poem generation process.

Each pattern starts with a pattern head: <pattern number=(number) theme=(theme name)>

For example: <pattern number=1 theme=love>

Each pattern ends with the following line: </pattern>
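A pattern file in this format could be read with a few lines of Python. This is a rough sketch under assumptions: the post does not describe what goes between the head and the closing line, so body lines are kept as opaque strings, and the names `HEAD_RE` and `parse_patterns` are hypothetical rather than taken from the ROMTU source.

```python
import re

# Matches a pattern head such as: <pattern number=1 theme=love>
HEAD_RE = re.compile(r"<pattern number=(\d+) theme=(\w+)>")
END_TAG = "</pattern>"

def parse_patterns(text):
    """Split text into patterns, each with a number, a theme and raw body lines."""
    patterns = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        head = HEAD_RE.fullmatch(line)
        if head:
            # Start a new pattern when a head line is found.
            current = {"number": int(head.group(1)), "theme": head.group(2), "body": []}
        elif line == END_TAG and current is not None:
            # Close the current pattern and store it.
            patterns.append(current)
            current = None
        elif current is not None and line:
            # The body syntax is not specified in the post, so keep lines untouched.
            current["body"].append(line)
    return patterns

sample = """<pattern number=1 theme=love>
(placeholder body line)
</pattern>"""

print(parse_patterns(sample))
```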

Experiments

The test was conducted in person with 146 participants. All participants were engineering students at Istanbul Bilgi University. Participants were fully aware of the purpose of the test.

The experiment consisted of 6 questions. In each question, participants were given a computer-generated poem and a human-written poem by a famous Turkish poet, and were asked to identify the human-written poem. Participants also rated each poem from 0 to 5 (0: weak, 5: strong) according to the following criteria:

• Rhyme

• Message

• Usage of Language

Results

According to the figure, 51.37% of the participants successfully identified the human-written poem, while 48.63% of them failed. Since this is close to the 50% expected from random guessing between two poems, we can say that the poems generated by our algorithm are nearly indistinguishable from human-written poems.
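One way to check how close this result is to pure guessing is an exact binomial test against a fair coin. The Python sketch below is not part of the original study. It assumes the percentages count participants, so that 51.37% of 146 corresponds to 75 correct identifications; the post does not say whether the figures are per participant or per answer. The function name `two_sided_binomial_p` is made up for the example.

```python
from math import comb

def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for `successes` out of `trials` against p = 0.5."""
    # With p = 0.5 the distribution is symmetric around trials / 2, so an outcome
    # is "at least as extreme" if it lies at least as far from the centre.
    distance = abs(successes - trials / 2)
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= distance
    )
    return extreme / 2 ** trials

participants = 146
# Assumption: 51.37% of 146 participants, rounded to a whole number of people.
correct = round(0.5137 * participants)

p_value = two_sided_binomial_p(correct, participants)
print(correct, p_value)
```

A p-value well above 0.05 would mean the observed success rate cannot be told apart from random guessing at the usual significance level.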

The figure shows the overall results for different groups of participants. Overall, there is little difference in success between male and female participants: women were 2.3% more successful at identifying human-written poems.

There is also little difference in success between participants who say they are interested in poetry and those who do not: the difference is 1.54%. Based on this result, interest in poetry does not appear to have much effect on distinguishing a human-written poem from a computer-generated one.

According to the overall results, computer engineering students failed to identify the human-written poem 7.6% more often than the other engineering students.

The figure shows the overall ratings of the computer-generated poems by feature. They scored 3.28, 3.25 and 3.39 points out of 5 for rhyme, message and usage of language, respectively.

Links

Full Paper: Automatic Poetry Generation in Turkish

Source Code: https://github.com/utkusen/romtu