Automatic simplification of Swedish text

Time: Fri 2020-05-08 15.15

Location: Fantum (Lindstedtsvägen 24, floor 5, room no. 522)

Lecturer: Roman Priscepov

In this thesis, I describe Sweasy, a text simplification system that
makes text written for native Swedish speakers more accessible to
second-language learners of Swedish. Other groups with reduced literacy
in Swedish may also benefit from the system.

Sweasy transforms text in three stages: syntactic simplification,
lexical simplification and text summarization. In the simplification
process, the system strives to preserve the information content and
cohesion of the original text.

The syntactic simplification stage aims to reduce the grammatical
complexity of a text using hand-crafted simplification rules. The rules
are implemented as discrete transformations that, for example, split long
sentences containing multiple main and subordinate clauses, convert
passive voice to active voice and convert indirect speech to direct speech.
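As an illustration of what a single sentence-splitting rule might look like, here is a toy sketch. It is not Sweasy's rule set: it assumes, hypothetically, that a comma followed by the coordinating conjunction "och" (and) or "men" (but) marks a boundary between two main clauses, and it works on raw strings rather than on a syntactic parse, so it will mis-split some sentences. The function name `split_coordinated` is invented for this example.

```python
import re

# Hypothetical rule: ", och " / ", men " is taken to separate two main clauses.
BOUNDARY = re.compile(r",\s+(och|men)\s+")


def split_coordinated(sentence: str) -> list[str]:
    """Split a sentence at each ', och' / ', men' boundary (toy string rule)."""
    match = BOUNDARY.search(sentence)
    if match is None:
        return [sentence]
    first = sentence[:match.start()].rstrip() + "."
    rest = sentence[match.end():].strip()
    if not rest:
        return [first]
    # "och" can simply be dropped; "men" carries contrast, so keep it
    # at the start of the new sentence.
    if match.group(1) == "men":
        rest = "men " + rest
    rest = rest[0].upper() + rest[1:]
    # The remainder may contain further boundaries, so apply the rule again.
    return [first] + split_coordinated(rest)


print(split_coordinated("Hon läste boken, men hon förstod inte slutet."))
```

A real implementation would operate on a dependency or constituency parse, so that it only splits where both sides are genuine main clauses.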

At the lexical simplification stage, complex words are replaced by
simpler synonyms. Which words become candidates for simplification is
determined by the learner’s language level and the frequency of each word
in a large general corpus. The substitute is the synonym semantically
closest to the original word, as measured by the similarity of their
word vectors.
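The two decisions described above, whether a word is a candidate and which synonym replaces it, can be sketched as follows. Everything here is hypothetical toy data: the frequencies, the 3-dimensional vectors, the synonym list and the per-level thresholds (`LEVEL_THRESHOLD`) are invented for illustration, and cosine similarity is assumed as the vector similarity measure.

```python
import math

# Toy data (hypothetical): corpus frequency per million words,
# hand-picked 3-d "word vectors" and a tiny synonym dictionary.
FREQ = {"erhålla": 4.0, "få": 900.0, "motta": 20.0, "ta": 1200.0}
VEC = {
    "erhålla": [0.9, 0.1, 0.3],
    "få": [0.8, 0.2, 0.35],
    "motta": [0.85, 0.05, 0.25],
    "ta": [0.2, 0.9, 0.1],
}
SYNONYMS = {"erhålla": ["få", "motta", "ta"]}

# Hypothetical: minimum frequency for a word to count as known at each level.
LEVEL_THRESHOLD = {"A2": 100.0, "B1": 50.0, "B2": 10.0}


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def simplify_word(word: str, level: str) -> str:
    threshold = LEVEL_THRESHOLD[level]
    # Frequent words are assumed known at this level: not a candidate.
    if FREQ.get(word, 0.0) >= threshold or word not in VEC:
        return word
    # Keep only synonyms that are themselves frequent enough for the learner.
    simpler = [s for s in SYNONYMS.get(word, [])
               if FREQ.get(s, 0.0) >= threshold and s in VEC]
    if not simpler:
        return word
    # Choose the remaining synonym whose vector is closest to the original.
    return max(simpler, key=lambda s: cosine(VEC[word], VEC[s]))


print(simplify_word("erhålla", "A2"), simplify_word("erhålla", "B2"))
```

With this toy data the chosen substitute depends on the level: at A2 the rarer "motta" is filtered out and "få" is chosen, while at B2 "motta" passes the threshold and wins because its vector is closer to "erhålla".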

At the last stage, the text is summarized by the TextRank summarization
algorithm.
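A minimal sketch of extractive TextRank, assuming the general formulation of Mihalcea and Tarau (2004) rather than Sweasy's exact configuration: sentences are graph nodes, edges are weighted by word overlap normalized by the logarithms of the sentence lengths, weighted PageRank with damping factor 0.85 scores the nodes, and the top-k sentences are returned in their original order. Tokenization by `\w+` and the function name `textrank_summary` are assumptions made for this example.

```python
import math
import re


def similarity(a: list[str], b: list[str]) -> float:
    """Word overlap normalized by the log lengths of the two sentences."""
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b))
    return overlap / denom if denom > 0 else 0.0


def textrank_summary(sentences: list[str], k: int,
                     d: float = 0.85, iterations: int = 50) -> list[str]:
    """Return the k highest-ranked sentences, kept in their original order."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    # Symmetric weighted adjacency matrix without self-loops.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update with damping factor d.
        scores = [
            (1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                              for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in their original order, rather than by score, is one simple way to help keep the summary cohesive.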

The system is evaluated with both automatic and qualitative measures. The
SVIT model (Mühlenbock, 2013) estimates text complexity from a
comprehensive set of quantitative features at four levels: surface,
vocabulary, syntax and idea density, the last measured by the nominal ratio.
For the qualitative part of the evaluation, a group of KTH students
enrolled in the Swedish language course filled out a questionnaire about
their comprehension of two texts, one in the original form and one
transformed by the text simplification system.
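For reference, the nominal ratio is commonly defined for Swedish (following Hultman and Westman) as the number of nouns, prepositions and participles divided by the number of pronouns, adverbs and verbs, with higher values indicating a denser, more nominal style. The sketch below assumes that definition and assumes part-of-speech tags from the SUC tagset (NN, PP, PC, PN, AB, VB); SVIT's exact variant, for example whether proper nouns are counted, may differ.

```python
from collections import Counter

# SUC tags (assumed input): NN noun, PP preposition, PC participle,
# PN pronoun, AB adverb, VB verb.
NOMINAL_TAGS = ("NN", "PP", "PC")
VERBAL_TAGS = ("PN", "AB", "VB")


def nominal_ratio(tags: list[str]) -> float:
    """(nouns + prepositions + participles) / (pronouns + adverbs + verbs)."""
    counts = Counter(tags)
    nominal = sum(counts[t] for t in NOMINAL_TAGS)
    verbal = sum(counts[t] for t in VERBAL_TAGS)
    if verbal == 0:
        return float("inf") if nominal else 0.0
    return nominal / verbal


# "Hon läser en bok om svensk historia i skolan."
tags = ["PN", "VB", "DT", "NN", "PP", "JJ", "NN", "PP", "NN"]
print(nominal_ratio(tags))
```

In the example sentence there are three nouns and two prepositions against one pronoun and one verb, giving a ratio of 5 / 2 = 2.5.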

The results show that over 80% of the simplified sentences were correct
with regard to grammaticality, meaning preservation and cohesion. After
simplification, the sentences were 20% shorter and the number of
subordinate clauses decreased by almost 30%. Readability, as measured by
the LIX formula, improved by 10%. The survey of Swedish learners
confirmed that texts simplified by Sweasy are more readable and