Proceedings 2006

Contents

Title page

UDC 80/81; 004
BBK 81.1
К63

The Program Committee of the conference expresses its sincere gratitude
to the Russian Foundation for Basic Research
for its regular financial support,
grant No. 06-01-10020-г

Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference
«Dialogue 2006» (Bekasovo, May 31 – June 4, 2006) / Ed. by Н. И. Лауфер, А. С. Нариньяни, В. П.
Селегей. – Moscow: RGGU Publishing House, 2006. 648 pp.: ill.

The proceedings of the international conference on computational linguistics and intellectual
technologies «Dialogue 2006» include 110 papers representing a wide range of theoretical and
applied research in the description of natural language, the modelling of language
processes, and the creation of practically applicable computational linguistic technologies.

For specialists in theoretical and applied linguistics and intellectual
technologies.

Papers

Андреева Е.Г.

Analysis of translation correspondences based on a parallel text corpus (p. 26)

Апресян В.Ю.

Semantics and pragmatics of fate (p. 31)

Азарова И.В., Марина А.А.

Automated classification of contexts in preparing data for the computer thesaurus RussNet (p. 13)

Азарова И.В., Иванов В.Л., Овчинникова Е.А.

Using the valency frame inheritance scheme of the RussNet thesaurus for automatic text analysis (p. 18)

Баглей С.Г., Антонов А.В., Мешков В.С., Суханов А.В.

Document clustering using metadata (p. 38)

Баранов А.Н.

Hint as a means of conveying meaning indirectly (p. 46)

Баталина А.М., Епифанов М.Е., Кобзарева Т.Ю., Кушнарёва Е.В., Лахути Д.Г.

An experimental implementation of surface-syntactic analysis algorithms (p. 51)

Беликов В.И.

The dictionary «Languages of Russian Cities»: selection of examples and the Internet (p. 57)

Белоозеров В.Н., Антошкова О.А., Шапкин А.В.

A classification environment for systematizing and searching scientific and technical information resources (p. 61)

Берзинь А.С.

Measuring the phonomorpholexical distance between Latvian dialects using the Wagner-Fischer distance (p. 65)

Богатырева И.Н., Антонов А.В., Курзинер Е.С.

Hypertext, context and subtext in the search and analysis system “Galaktika-ZOOM” (p. 73)

Большакова Е.И., Большаков И.А., Котляров А.П.

An extended experiment on automatic detection and correction of Russian malapropisms (p. 78)

Борисова Е.Г.

The interactive approach in linguistics: limits of applicability (p. 84)

Браславский П.И., Соколов Е.

Comparison of four methods for automatic extraction of two-word terms from text (p. 88)

Дебренн М.

The place of interlanguage deviatology in the general theory of errors (p. 133)

Добров Б.В., Лукашевич Н.В.

Ontologies for automatic text processing: description of concepts and lexical senses (p. 138)

Драгой О.В.

The effect of working memory capacity on the interpretation of sentences with three possible heads for a relative clause (p. 143)

Фаустова Н.А.

Comparative analysis of English, French and Russian intonation (p. 527)

Федоровский А.Н., Костин М.Ю.

Ranking methods in full-text search over a collection of HTML documents (p. 534)

Галина И.В.

Modelling transformations of nominative structures in French-Russian machine translation tasks (p. 105)

Гельбух А.Ф., Сидоров Г.О., Вера-Феликс А.

Dictionaries in tasks of automatic processing of pairs of translated texts (p. 110)

Гладун В.П., Величко В.Ю., Святогор Л.А.

Thematic analysis of natural language texts (p. 115)

Григорян Л.А.

Automatic generation of a structure from the name of a chemical compound (p. 119)

Грудева Е.В.

Ways of introducing the topic in Russian (p. 124)

Грунтова Е.С.

Regular government patterns of Russian prefixal derivatives (p. 129)

Guenthner Franz

Local Grammars in Corpus Calculus (p. 616)

Иомдин Б.Л., Бердичевский А.С.

Who is this “this”? Proper names and indefinite definiteness (p. 196)

Иомдин Л.Л.

Polysemous syntactic phrasemes: between the lexicon and syntax (p. 202)

Иваненко Г.С.

Determining the informational nature of a conflict utterance (p. 191)

Канович М.И., Шаляпина З.М.

The formalism of R-references as a universal means of syntactic generation (based on the development of the RussLan system of Russian generation) (p. 207)

Киселёв В.В., Таланов А.О., Тампель И.Б., Татарникова М.Ю., Хохлов Ю.Ю.

Automatic keyword spotting in continuous speech based on the «recognition-by-synthesis» technique (p. 214)

Кнеллер Э.Г.

Analysis of the speech signal parameters that create the perception of elementary speech sounds (p. 220)

Кобзарева Т.Ю.

Recursiveness and projectivity of coordinative links in Russian text (p. 223)

Кодзасов С.В., Архипов А.В., Бонч-Осмоловская А.А., Захаров Л.М., Кривнова О.Ф.

The database «Intonation of Russian Dialogue»: directive utterances (p. 236)

Койт М.Э.

A conversational agent in information-seeking dialogue (p. 269)

Кондрашова Д.С.

Segmented Discourse Representation Theory for forensic linguistic examination tasks in extracting implicit information from text (p. 275)

Копотев М.В., Гурин Г.Б.

Principles of syntactic annotation in the Helsinki Annotated Corpus of Russian Texts HANCO (p. 280)

Коваль С.Л., Прощина Е.А.

A model of the communicative act in applied speech science tasks (p. 230)

Козеренко А.Д.

Idioms of the semantic field importance – unimportance in Russian (p. 248)

Кожунова О.С.

Applying plausible reasoning of the JSM method to extend a semantic dictionary (p. 243)

Козлов М.В., Яцко В.А.

A method for evaluating the performance of modern Internet information retrieval systems (p. 259)

Козьмин А.В.

Automatic verse analysis in the Starling system (p. 265)

Крейдлин Г.Е.

Mechanisms of interaction between nonverbal and verbal units in dialogue. I. Gestural stresses (p. 290)

Крылов С.А., Старостин С.А.

The integrated information environment STARLING and its use in corpus linguistics (p. 303)

Крижановский А.А.

Automated construction of lists of semantically related words based on ranking texts in a corpus with hyperlinks and categories (p. 297)

Кустова Г.И.

Valencies and constructions of adjectives (p. 323)

Кузнецова А.И.

What factors influence discourse structure? (based on translations of foreign-language texts into Russian) (p. 308)

Кузнецова Е.В.

The nature and functions of secondary stress in Russian (p. 313)

Кузнецов И.П., Мацкевич А.Г.

A semantics-oriented linguistic processor for automatic formalization of autobiographical data (p. 317)

Котов А.А.

A model of emotional speech behaviour for a virtual agent in a computer role-playing game (p. 285)

Козеренко Е.Б.

The problem of equivalence of language structures (p. 252)

Ландэ Д.В., Григорьев А.Н.

A multilevel classifier-navigator for the responses of an information retrieval system (p. 329)

Ландэ Д.В., Григорьев А.Н., Брайчевский С.М.

Source stability as one of the parameters of information flows (p. 332)

Леонтьев А.П., Леонтьева А.Л.

Once more on the semantics of genitive relations (p. 335)

Леонтьева Н.Н.

On the grammar of conceptual relations (p. 342)

Летучий А.Б.

Lability in Russian: accident or regularity? (p. 343)

Липатов А.А., Мальцев А.А.

Methods for automating the construction and extension of bilingual dictionaries using parallel text corpora (p. 348)

Литвиненко А.О.

Strategies for framing reported speech in children's oral narrative (p. 353)

Лобанов Б.М., Пьорковска Б., Рафалко Я., Цирульник Л.И., Шпилевский Э.

A phonetic-acoustic database for multilingual text-to-speech synthesis in Slavic languages (p. 357)

Макаров М.Л., Школовая М.С.

Linguistic and semiotic aspects of identity construction in electronic communication (p. 364)

Малинина К.О., Шапкин А.В.

Mechanisms for equipping the VINITI rubricator with keywords (p. 370)

Марушкина А.С.

«Naive mechanics» in language and ontology (p. 375)

Михайлов М.Н.

Structure and content of lexical databases for a foreign language tutoring program (p. 389)

Михеев М.Ю., Добровольский Д.О.

Translation strategies and defamiliarization in literary texts (p. 394)

Митрофанова О.А., Крылов С.А.

The “typical” context: accident or regularity? (p. 382)

Невзорова О.А., Зинькина Ю.В., Пяткин Н.В.

A method of contextual resolution of functional homonymy: an analysis of applicability (p. 399)

Oja Anni

Finding an identity, designing the identity: study of web-based communication (p. 611)

Падучева Е.В.

The observer: typology and possible interpretations (p. 403)

Пазельская А.Г.

Russian predicate nouns and negation (p. 414)

Перцов Н.В.

On the problem of constructing a semantic metalanguage (p. 419)

Петров А.А.

Features of a networked English-language linguistic processor for formalizing natural language text information (p. 426)

Подлесская В.И., Хуршудян В.Г.

On lexical markers of hesitation in spontaneous speech: lessons from Armenian (p. 429)

Попова Т.И.

Repetition as a means of coordinating the speech behaviour of interlocutors in official public dialogue (p. 440)

Рахилина Е.В., Кобрицов Б.П., Кустова Г.И., Ляшевская О.Н., Шеманаева О.Ю.

Polysemy as an applied problem: lexico-semantic annotation in the Russian National Corpus (p. 445)

Raskin Viktor

The Whys and Hows of Ontological Semantics (p. 621)

Розина Р.И.

Derivational relations in synchrony and diachrony (based on modern Russian slang) (p. 451)

Рубашкин В.Ш., Чуприн Б.Ю.

Recognition of quantitative information in natural language texts (p. 456)

Саломатина Н.В., Гусев В.Д.

Automating the construction of indicator dictionaries and possibilities for their use (p. 459)

Сидоров Г.О., Кастро-Санчес Н.

A system for linguistic evaluation of psychological profiles (p. 464)

Сидорова Е.А., Загорулько Ю.А., Кононенко И.С.

A semantic approach to document analysis based on a domain ontology (p. 468)

Соколова Е.Г., Болдасов М.В.

Principles of constructing semantic annotations of image content (p. 474)

Старостин А.С., Мальковский М.Г.

The syntax model in the morphosyntactic analysis system «Treeton» (p. 481)

Шаронов И.А.

On a new approach to the classification of emotional interjections (p. 561)

Шеманаева О.Ю.

Exact and approximate estimates of object sizes in Russian (p. 567)

Шмелева Е.Я., Шмелев А.Д.

Intertextual fragments in modern Russian jokes (p. 573)

Секерина И.А.

Using eye-movement recording in the study of bilingualism (p. 607)

Тер-Аванесова А.В., Крылов С.А.

Lexico-grammatical databases as a tool for dialectological description (p. 493)

Токарева М.Ю., Большакова Е.И., Бордаченкова Е.А.

Automatic generation of sports commentary (p. 498)

Толпегин П.В., Ветров Д.П., Кропотов Д.А.

An algorithm for automated resolution of third-person pronoun anaphora based on machine learning methods (p. 504)

Цирульник Л.И., Лобанов Б.М.

Experimental evaluation of the contribution of compilation elements to the plausibility of a synthesized speech clone (p. 545)

Цуканова В.Л.

Applying discourse-oriented transcription methods to material from a non-Indo-European language (p. 552)

Тузов В.А.

Semantics of prepositional-case forms in Russian (p. 513)

Тузовский А.Ф., Козлов С.В.

Building an organizational knowledge model using a system of ontologies (p. 508)

Урысон Е.В.

The subsystem of the Russian coordinating conjunctions и, а, но (p. 519)

Виноградова Н.В.

The contact-establishing function of Russian computer jargon (p. 95)

Воскресенский А.Л., Хахалин Г.К.

Semantic search tools (p. 100)

Ягунова Е.В.

Melodic features and anchor elements in text perception (p. 583)

Янко Т.Е.

Intonation of connected text (p. 591)

Янович И.С.

Two какой in Russian (p. 597)

Янович И.С., Федорова О.В.

Analysis of speech errors in predicate agreement in Russian: the effect of head noun gender (p. 602)

Юдина М.В.

Comprehension and production of utterances with syntactic ambiguity (the case of relative clauses in Russian) (p. 578)

Загорулько Ю.А., Боровикова О.И., Кононенко И.С., Сидорова Е.А.

An approach to building a domain ontology for a knowledge portal on computational linguistics (p. 148)

Захаров Л.М., Казакевич О.А.

On sentence boundaries in oral texts in a language without an established written tradition (p. 168)

Зализняк А.А.

Russian cultural concepts in a European linguistic perspective: the word проблема (p. 152)

Зализняк А.А., Микаэлян И.Л.

E-mail correspondence as a linguistic object (p. 157)

Зарецкая Е.Н.

The logical-psychological structure of discussion (p. 163)

Зацман И.М.

Polydomain models in systems for evaluating innovation potential and the effectiveness of scientific research (p. 178)

Зевахина Н.А.

German compound adjectives in the dictionary and in discourse (p. 184)

Циммерлинг А.В.

The relationship between free word order and inversion models (p. 540)

Abstracts

Andreyeva E. G. Saint-Petersburg State University

A CORPUS-BASED ANALYSIS OF RUSSIAN-ENGLISH LEXICAL CORRESPONDENCES

Based on parallel corpus data, the report identifies and analyzes the main correspondence patterns between the Russian concept of “dusha” and its English equivalents, as well as the translation techniques used in both Russian-English and English-Russian renditions of fiction.

Apresyan V.Yu. Institute of the Russian Language RAS

SEMANTICS AND PRAGMATICS OF FATE

The paper examines Russian words and expressions used to speak about external forces affecting events and situations, such as sud’ba, promysel, providenie, rok, ne suzhdeno, ne sud’ba, etc. They are compared to their English counterparts, and certain parallels are drawn. The Russian concept of ne sud’ba is shown to be language-specific on the basis of linguistic criteria.

Azarova I.V. Saint-Petersburg State University Marina A. S. Saint-Petersburg State University

AUTOMATIC CONTEXT CLUSTERING FOR COMPUTER THESAURUS RUSSNET

While constructing the computer thesaurus RussNet, valency frames are specified for lexicon units. The attributes of valencies make it possible to distinguish thesaurus synonym sets and to disambiguate analyses in the text parser. Valency frame features are based on statistically stable context markers that accompany the realisation of a lexical meaning in the text corpus. These features are morphological, syntactic, and semantic. The paper discusses the automatic classification of corpus samples with unambiguous morphological annotation. The rough sorting of word contexts into lexical groups, i.e. semantic trees of the RussNet thesaurus, is a pre-processing stage that facilitates valency frame specification. The procedure relies on the distribution of morphological tags in the context “window” for lemmas from particular trees and on grouping them into distinguishable clusters. Preliminary results are discussed.

Azarova I.V. Saint-Petersburg State University, Ivanov V. L. Ideograph company, Ovchinnikova E. A. Ideograph company

RUSSNET VALENCY FRAME INHERITANCE IN AUTOMATIC TEXT PROCESSING

The automatic text processing system IDEOGRAPH is presented. It involves a formal grammar description of Russian (Rus4IR) and the computer thesaurus RussNet. A special extension of RussNet, valency frames, is used for syntactic and lexical disambiguation. These frames comprise descriptions of context markers that are statistically consistent in the text corpus. Text fragments are interpreted in terms of proposition structures whose core component is a predicate with subject and object arguments; these refer to synonym set ids in RussNet, which are linked by hyponymy into semantic trees. The inheritance of valency frame attributes is described with respect to the structure of three semantic trees. This device may be used for phrase analysis specification, ranking of output structures, and argument unification in inference.

Bagley S.G., Antonov A.V., Meshkov V.S., Sukhanov A.V. “Galaktika Corporation”, Moscow

DOCUMENT CLUSTERING USING METADATA

The paper describes a new approach to document clustering. A modified LSA/LSI algorithm underlies our clustering method, implemented in the «Galaktika-Zoom» search and analysis system. The main problem addressed is to separate a document corpus into groups (clusters) on the basis of topic similarity, i.e., the similarity of their feature vectors. In contrast to the traditional LSA implementation, the base units for the clustering process are sets of words and word combinations (information portraits) preselected on a statistical basis. Elements of information portraits are linguistic invariants that statistically distinguish a document sample.

Anatoly Baranov Institute of the Russian Language RAS

HINT AS AN INSTRUMENT OF INDIRECT COMMUNICATION

Important semantic and pragmatic features of the hint as a language phenomenon are considered. To hint at something, a speaker can use both linguistic forms and non-verbal actions with non-standard semantics. It is necessary to distinguish between a genuine hint and a regular hint. In contrast to indirect speech acts, the use of hints presupposes an implicit way of communication.

Batalina A.M., Epifanov M.E., Kobzareva T.J., Kushnareva E.V., Lakhuti D.G. Moscow, RSUH

THE EXPERIENCE OF A SAMPLE IMPLEMENTATION OF SURFACE-SYNTACTIC ANALYSIS ALGORITHMS

The paper discusses the application of an instrumental environment for experiments with surface-syntactic analysis algorithms. It describes the practice of rapidly debugging and implementing a set of surface-syntactic analysis algorithms in this environment.

Belikov Vladimir I.

THE EXAMPLES FOR THE DICTIONARY OF THE VARIETIES OF URBAN RUSSIAN AND THE WWW

The article concerns the typology and functions of illustrations in traditional Russian explanatory dictionaries and the role of the WWW in selecting illustrative examples for the Dictionary of the Varieties of Urban Russian.

Beloozerov V. N., Antoshkova O. A., Shapkin A. V. All-Russia Institute for Scientific and Technological Information – VINITI

CLASSIFICATION MEDIUM FOR SYSTEMIZING AND RETRIEVING SCIENTIFIC AND TECHNOLOGICAL INFORMATION RESOURCES

The paper describes the development of a navigation system for searching information resources on the basis of a mutual mapping of their classification systems. A database has been generated that contains all classes of seven widespread classifications. Tools for establishing semantic interconnections between classes have been developed and implemented.

Berzin A.U. Latvian University, Riga

MEASUREMENT OF PHONOMORPHOLEXICAL DISTANCE BETWEEN LATVIAN DIALECTS USING WAGNER-FISCHER DISTANCE

The paper describes an attempt to calculate phonetic, morphological and lexical distances between Latvian dialects. An experiment using Levenshtein distance is followed by one with Wagner-Fischer distance. The results are compared, allowing for some important conclusions.
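The kind of measure named in this abstract can be illustrated with a minimal sketch of the Wagner-Fischer dynamic-programming recurrence for edit distance. The abstract does not say how the two distances differ in the paper; the sketch assumes, purely for illustration, that the weighted variant uses a per-symbol substitution cost, and the `vowel_aware_cost` function and its 0.5 weight are hypothetical. With unit costs the function reduces to plain Levenshtein distance.

```python
def wagner_fischer(source, target, ins_cost=1.0, del_cost=1.0, sub_cost=None):
    """Edit distance between two sequences via the Wagner-Fischer recurrence.

    sub_cost(a, b) returns the cost of replacing symbol a with symbol b.
    With the default unit costs the result is the Levenshtein distance.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0

    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column: delete every symbol of the source prefix.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: insert every symbol of the target prefix.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                                   # deletion
                dist[i][j - 1] + ins_cost,                                   # insertion
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),  # substitution or match
            )
    return dist[-1][-1]


VOWELS = set("aeiou")


def vowel_aware_cost(a, b):
    """Hypothetical weighting: swapping one vowel for another costs less."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5
    return 1.0
```

Calling `wagner_fischer(form_a, form_b)` gives the unweighted distance between two word forms, while passing `sub_cost=vowel_aware_cost` shows how phonetically motivated weights could change the ranking of dialect pairs.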

Bogatyreva I.I., Antonov A.V., Kurziner E.S. “Galaktika Corporation”, Moscow

HYPERTEXT, CONTEXT AND SUBTEXT IN SEARCH ENGINE “GALAKTIKA-ZOOM”

The analytical search engine “Galaktika-ZOOM” provides automated extraction of key words from textual data, the so-called “information portrait”. Algorithmically, an infoportrait represents words and word combinations that are characteristic of the query text. In effect, the infoportrait is the query’s paradigmatic context, or the sample’s hypertext. Within this information paradigm one can define sense syntagms that do not exist in the syntagmatic context; we refer to these as subtext.

Bolshakova E.I., Bolshakov I.A., Kotlyarov A.P.

AN EXTENDED EXPERIMENT ON AUTOMATIC DETECTION AND CORRECTION OF RUSSIAN MALAPROPISMS

Malapropism is a semantic error that replaces one content word with another one close in sound but having a different meaning. The paper discusses the results of an extended experiment that tests the earlier proposed method of malapropism detection and correction based on Internet statistics and a numerical Index of Semantic Compatibility.

Borisova E.G.

THE INTERACTIVE APPROACH IN LINGUISTICS: LIMITS OF APPLICATION

The interactive approach is one of the dynamic models of speech. It is based on pragmatic principles and makes it possible to take into account the activity of both the Speaker and the Hearer in choosing appropriate words and forms. It is assumed that the Speaker imagines how the Hearer might understand the various ways of expressing the intended sense and chooses those that are easiest to understand. The paper shows that the approach can be useful for describing the rules of usage of some synonyms or of grammatical categories such as the Russian aspect. Still, linguists should resort to it only rarely.

Braslavski P., Sokolov E.

COMPARISON OF FOUR METHODS FOR AUTOMATIC TWO-WORD TERM EXTRACTION

The paper describes four methods for automatic two-word term extraction from raw text based on occurrence frequencies and morphological templates. The paper reports on the results of the methods applied to texts from two different domains. A combined evaluation methodology is proposed; comparative evaluation results are provided.
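As an illustration of the general idea of combining occurrence frequencies with morphological templates, the sketch below counts adjacent word pairs whose part-of-speech tags match a template and keeps those above a frequency threshold. It does not reproduce any of the four methods compared in the paper; the tag set, the two templates and the `min_freq` parameter are assumptions made for the example, and the input is assumed to be already lemmatized and tagged.

```python
from collections import Counter

# Assumed part-of-speech templates for two-word term candidates.
TEMPLATES = {("ADJ", "NOUN"), ("NOUN", "NOUN")}


def extract_two_word_terms(tagged_tokens, min_freq=2):
    """Return (pair, frequency) candidates sorted by descending frequency.

    tagged_tokens is a list of (lemma, pos_tag) tuples in text order.
    """
    counts = Counter()
    # Slide over adjacent token pairs and keep those matching a template.
    for (w1, p1), (w2, p2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (p1, p2) in TEMPLATES:
            counts[(w1, w2)] += 1
    # The frequency threshold filters out accidental template matches.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_freq]
```

A template alone also matches accidental pairs, which is why the frequency threshold matters; ranking by an association score instead of raw frequency would be a natural variation.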

Debrenne M.

THE PLACE OF INTERLANGUAGE DEVIATOLOGY IN GENERAL THEORY OF ERRORS

Deviatology is defined as a cognitive science that deals with deliberate and unintentional deviations from the norm across a wide field of human activity. Language deviatology is part of general deviatology, which includes the study of planned deviations from the norm (such as neologisms, jokes and stylistic tropes) and unplanned deviations (such as slips of the tongue, lapses and speech errors). This classification applies equally to errors in the mother tongue and to errors in foreign languages. Speech errors in the mother tongue, which theoretically should not occur at all, are the object of stylistics, while errors in foreign languages, where conscious deviation from the norm is rare, are studied in interlanguage deviatology.

Boris Dobrov, Natalia Loukachevitch

ONTOLOGIES FOR NATURAL LANGUAGE PROCESSING: DESCRIPTION OF CONCEPTS AND LEXICAL SENSES

The relation between concepts and lexical senses has become a highly practical problem in the development of ontologies intended for natural language processing. In the paper we consider the existing approaches to the description of concepts and senses in various ontologies.

Olga V. Dragoy, Moscow State University

THE ROLE OF THE WORKING MEMORY IN RELATIVE CLAUSE ATTACHMENT PREFERENCE IN A THREE-SITE CONTEXT

Experimental data presented in this study show that individual differences in working memory can account for variance in relative clause attachment preference in a three-site context. We discuss how parsing strategies can be affected by working memory constraints.

Faustova N.A.

COMPARATIVE ANALYSIS OF ENGLISH, FRENCH AND RUSSIAN INTONATION

The paper presents a contrastive analysis of English, French and Russian intonation. Phonological markers of focus, topic, contrast and emotional emphasis are discussed. The analysis of the three languages reveals similar and different intonation patterns.

Fedorovsky A.N., Kostin M. Yu. Mail.Ru, Moscow

FULL-TEXT SEARCH RANKING METHODS IN HTML DOCUMENT COLLECTION

The paper describes webpage ranking algorithms based on page content, which are used for relevance computation in the Search@Mail.Ru system. Their effectiveness has been tested experimentally, and the results are given. The feasibility of using these algorithms to build full-scale Web text search systems is considered.

Galina I.V. Institute for Informatics Problems, RAS

MODELLING OF TRANSFORMATIONS OF NOMINATIVE STRUCTURES FOR SOLVING PROBLEMS OF FRENCH-RUSSIAN MACHINE TRANSLATION

The paper considers the construction of functional semantic models for transformations of nominative structures in the context of French-Russian (and Russian-French) machine translation. The analysis of structures and a block of multiple logical-semantic rules are being developed with regard to the functional similarity and syntactic polysemy of nominative constructions, using a focal sample of parallel Russian and French texts. The problem of meaning transfer is solved on the basis of an analysis of cognitive structures. The modelling is conducted as part of a project to create a multilingual linguistic processor based on a functional semantic approach.

Alexander Gelbukh, Grigori Sidorov, Jose Angel Vera-Felix

DICTIONARIES IN TASKS OF AUTOMATIC PROCESSING OF PAIRS OF TRANSLATED TEXTS

Aligned parallel corpora are very important linguistic resources that help in many computational linguistic tasks such as machine translation, automatic dictionary compilation, linguistic machine learning, etc. Nevertheless, very few linguistic resources of this type are available, especially for fiction texts, because of the difficulty of obtaining the texts and the high cost of alignment. In this paper, we describe an English-Spanish parallel corpus compiled from fiction texts and evaluate how an alignment method based on linguistic data, namely on the use of bilingual dictionaries for calculating similarity, performs on fiction texts. The basic idea of the method is that if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are given. The results show that methods of this type are applicable to fiction texts as well.
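The basic idea stated in this abstract can be sketched as a similarity score between a source paragraph and a candidate target paragraph: the share of source words with a dictionary entry for which at least one listed translation occurs in the target. This is only an illustration under simplifying assumptions (lemmatized input, a word counted as meaningful if the dictionary covers it, a toy dictionary); the scoring formula actually used by the authors is not given in the abstract.

```python
def dictionary_similarity(source_words, target_words, bilingual_dict):
    """Share of covered source words that have a translation in the target.

    bilingual_dict maps a source lemma to a set of target-language lemmas.
    Source words without a dictionary entry are ignored (assumption).
    """
    target_set = set(target_words)
    covered = [w for w in source_words if w in bilingual_dict]
    if not covered:
        return 0.0
    # A source word is a hit if any of its translations occurs in the target.
    hits = sum(1 for w in covered if bilingual_dict[w] & target_set)
    return hits / len(covered)


def best_match(source_words, candidate_paragraphs, bilingual_dict):
    """Index of the candidate target paragraph with the highest similarity."""
    scores = [
        dictionary_similarity(source_words, cand, bilingual_dict)
        for cand in candidate_paragraphs
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a real aligner the candidates would typically be restricted to paragraphs near the expected position in the target text rather than scored exhaustively.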

Gladun V.P., Velichko V.U., Svyatogor L.A.

THEMATIC ANALYSIS OF NATURAL LANGUAGE TEXTS

Some applications that work with natural language texts need a form of text representation that is a reasonable compromise between the wish to shorten the text while preserving its fundamental thematic content and the wish to retell the source text in more detail. Some degree of this compromise has to be achieved in text abstracting when creating various stores of textual information, for example archives, personal libraries and so on. The paper discusses possible ways of achieving this compromise. The method has been implemented in the KONSPEKT software system.

Grigoryan L.A. All-Russia Institute for Scientific and Technological Information – VINITI

«STRUCTURE-BY-NAME» AUTOMATIC GENERATION FOR CHEMICAL COMPOUNDS

A new version of the Nomenclature Analyzer software is presented. The software translates the systematic names of chemical compounds, given in the IUPAC nomenclature, into molecular graphs. The algorithm is based on the morphemic segmentation of compound names into chemically meaningful components (morphemes).

Grudeva E.V. Saint-Petersburg State University

WAYS OF INTRODUCING THE TOPIC IN RUSSIAN

The paper is devoted to topic constructions in the Nominative case and those of the chto kasaetsya and chto do types. Both theoretical research in this field and corpus data are taken into consideration. The results of psycholinguistic experiments show that even minimal formal marking of the topic considerably increases agreement among subjects in identifying it.

Gruntova E.S.

REGULAR GOVERNMENT PATTERNS OF RUSSIAN PREFIXAL DERIVATES

The paper examines the differences between the government patterns of Russian prefixal derivates and those of their non-prefixed counterparts. It discusses the causes that trigger changes of government patterns in prefixal derivation and offers a hypothesis that partly explains the transformation.

Franz Guenthner Center for Information and Language Processing Ludwig-Maximilian University, Munich, Germany

Local Grammars in Corpus Calculus

An approach to linguistic analysis is presented that assumes that the description of sentences should be viewed as a demonstration of how they can act as variations of previously produced sentences. Five principles play a central role in such demonstrations: substitution of arguments, permutation, lexical functions, grammatical functions and predicate-argument schema identity. Among the conclusions we draw is the observation that a grammar is not so much a system of grammatical rules (of the phrase structure variety) but rather a set of operations that allow us to relate arbitrary sentences to other sentences and ultimately a set of «elementary» predicate-argument structures.

Iomdin B.L., Berdichevsky A.S.

WHAT IS THIS THIS? PROPER NAMES AND INDEFINITE DEFINITENESS

In Russian, combinations of the demonstrative pronoun этот ‘this’ with proper nouns seem interesting, since this determinant marks lower definiteness of a completely definite referent of the noun in the speaker’s world. Semantic and pragmatic properties of this construction are discussed.

Iomdin L.L.

POLYSEMOUS SYNTACTIC IDIOMS: BETWEEN THE VOCABULARY AND THE SYNTAX

An analysis is offered of the syntactic properties of the Russian polysemous idiom ВСЁ РАВНО: всё равно 1 ≈ ‘all the same’, as in Я всё равно сижу дома ‘I am staying at home all the same’; всё равно 2 ≈ ‘makes no difference’, as in Нам всё равно, куда ехать ‘We don’t care where we’ll be going’; and всё равно 3 ≈ ‘tantamount’, as in Жаловаться на народ – всё равно что на климат ‘To complain about one’s people is tantamount to complaining about the climate’.

Ivanenko G.S.

THE NOTION OF THE COMMUNICATORY TYPE OF CONFLICT UTTERANCE

In the structure of linguistic expertise dealing with the text, a key role is played by the notion of the communicatory utterance. The need for distinguishing between facts and evaluative data has shaped the author’s view of the modality as a pragmatic category.

Kanovich M.I., Shalyapina Z.M., The RAS Institute of Oriental Studies, Moscow

THE FORMALISM OF R-ATTRIBUTES AS A UNIVERSAL MEANS OF SYNTACTIC GENERATION (based on its implementation in the RussLan system of Russian generation)

The formalism of R-attributes permits representing and processing structural relations and linguistic rules associated with them as relational attributes of the entities they are relevant to. It is efficient for a wide range of syntactic generation problems, from computing valency models for occasional lexemes to lexico-syntactic transformations.

V. Kiselov, A. Talanov, I. Tampel, M. Tatarnikova, Y. Khokhlov, “Speech Technology Center”, Saint-Petersburg

AUTOMATIC KEYWORD SPOTTING IN CONTINUOUS SPEECH USING RECOGNITION-BY-SYNTHESIS TECHNIQUE

Automatic keyword spotting in continuous speech is of great importance for a number of applied tasks, most of which are connected with security systems and phone services. A keyword spotting system based on dynamic programming and speech synthesis is presented. We use a one-pass method that ensures both a high rate of correct recognition and a low level of false alarms.

Kneller E.G.

ANALYSIS OF THE VOICE PARAMETER SIGNAL ENABLING PERCEPTION OF ELEMENTARY SOUNDS

A new approach to initial signal processing of speech is presented. The approach enables the extraction and measurement of signal parameters responsible for the perception of sounds of speech.

Kobzareva T.Yu.

CO-ORDINATION RECURSIVENESS AND PROJECTIVITY IN THE RUSSIAN TEXT

Co-ordination analysis is a required constituent of automatic syntactic analysis. We discuss sentence structure properties in conjunction reduction, i.e. co-ordination projectivity and recursiveness in Russian sentence structure. These properties are of great importance for delimiting analysis zones when constructing co-ordinative and subordinating links between main and dependent clauses, dangling participles and other isolated sentence parts, when resolving the ambiguity of punctuation marks and coordinating conjunctions, and when constructing the segment graph.

Kodzasov S.V., Arkhipov A.V., Bonch-Osmolovskaya A.A., Zakharov L.M., Krivnova O.F.

THE DATABASE ON INTONATION OF RUSSIAN DIALOGUE: COMMANDING PROPOSITIONS

Mare Koit, Tartu University

CONVERSATION AGENT IN INFORMATION DIALOGUE

A model of a conversation agent is introduced which consists of several modules and implements various kinds of knowledge. Knowledge representation is considered, including the determination of dialogue acts as frames, and regular expressions that represent the structure of an information dialogue.

Daria Kondrashova

SEGMENTED DISCOURSE REPRESENTATION THEORY FOR SOLUTION OF FORENSIC LINGUISTICS PROBLEMS IN IMPLICIT INFORMATION EXTRACTION

Forensic semantics, as a type of forensic linguistics (FL), is aimed at revealing senses in a given text and analyzing them from different points of view. We propose to use Segmented Discourse Representation Theory [Asher, Lascarides 2003] to resolve problems of this type of FL.

Kotov A.A.

MODEL OF EMOTIONAL SPEECH BEHAVIOUR FOR A VIRTUAL AGENT OF A COMPUTER ROLE-PLAYING GAME

In a computer role-playing game a player operates a virtual agent (game hero). During the game the player and his virtual hero experience successes and failures. In this paper we study a theoretical model for simulating the speech behaviour of the virtual agent, which enables the production of possible utterances in different game situations. We study the selection of utterances from a database and the semantic synthesis of utterances in emotional situations.

Kuznetsov I.P., Matskevich A.G.

SEMANTIC LINGUISTIC PROCESSOR FOR AUTOMATIC FORMALIZATION OF AUTOBIOGRAPHICAL DATA

Direct and reverse linguistic processors for autobiographical data (job requests, Curriculum Vitae) written as natural language texts are considered. In such texts, a person provides information about himself or herself in free form: first name, middle name, surname, birthday, address, time and place of education, job experience with its periods, positions, responsibilities, etc. These data may be expressed in different ways. The objective of the direct linguistic processor is to extract the data, standardize them and link the objects: organizations with dates, job positions, etc. This activity underlies the construction of knowledge structures. The objective of the reverse linguistic processor is to present these structures as natural language units (such as word combinations and sentences) and to map them to the fields of a formalized questionnaire or a structured site.

Koval S.L., Proschina E.A.

COMMUNICATION ACT MODEL IN THE APPLIED TASKS OF SPEECH SCIENCE

A generalized communication act model is proposed, which includes a detailed description of all basic factors affecting the preparation and realization of verbal utterances. The model is aimed at solving applied problems of speaker identification and diagnostics from speech, reconstruction of the circumstances of verbal activity, and authenticity validation of phonograms.

Mikhail V. Kopotev, University of Helsinki, Finland; Grigory B. Gurin, Petrozavodsk State University, Russia

PRINCIPLES OF THE SYNTACTIC ANNOTATION IN THE HELSINKI ANNOTATED CORPUS HANCO

Two alternative syntactic annotation schemes applied in the Helsinki annotated corpus of Russian texts HANCO are discussed in the presentation. Some problems arising during the application of one of them (viz. the traditional part-of-sentences doctrine) are discussed; some practical and theoretical deductions following from this experience are formulated.

Kozerenko E.B., Russian Language Institute RAS

IDIOMS OF THE SEMANTIC FIELD IMPORTANCE – UNIMPORTANCE IN RUSSIAN

The paper considers idioms of the semantic field IMPORTANCE – UNIMPORTANCE in Russian. Elements of meaning that are common to the whole semantic field, as well as those that distinguish quasi-synonymic idioms, are examined. The impact of the inner form of an idiom on its meaning is also considered. Definitions of some idioms of the semantic field are given. Statements on the semantics of idioms are illustrated with plentiful examples of idiom usage in contemporary texts of various genres, as well as in conversation and on the Internet.

Kozhunova O.S.

JSM-METHOD PLAUSIBLE REASONING APPLICATION TO SEMANTIC DICTIONARY EXPANSION

The work was aimed at designing and implementing a prototype intelligent system for semantic dictionary expansion. The dictionary is expanded through learning by examples, the primary procedure of the JSM method. A COM object that normalizes words in sentences is applied to text processing.

Kozerenko E.B.

THE LANGUAGE STRUCTURES EQUIVALENCE PROBLEM IN TRANSLATION AND SEMANTIC ALIGNMENT OF PARALLEL TEXTS

The problem of the equivalence of language structures in the source text and the translated text is considered. The main research objectives are to work out translation techniques for a number of basic language phenomena characteristic of scientific discourse, and to create correct algorithms for semantic alignment of parallel texts and machine translation. The studies are based on material from Russian and English scientific periodicals. Emphasis is placed on the translation of impersonal and indefinite-personal constructions of Russian into English, of nonfinite verbal constructions of English into Russian, and of other structures most frequent in scientific texts. Translation of expressive means, including metaphors, is also considered.

Kozlov M.V., Yatsko V.A.

A METHOD FOR EVALUATING CONTEMPORARY INTERNET INFORMATION RETRIEVAL SYSTEMS

The paper formulates principles of evaluation of contemporary Internet information retrieval systems. The results of testing of six information retrieval systems by the method of depth of user search are given.

Kozmin A.V.

AUTOMATED ANALYSIS OF VERSE WITH STARLING SOFTWARE PACKAGE

The paper is devoted to automated analysis of Russian verse with STARLING software package. The aim is to describe software tools and algorithms implemented in the system.

Kreydlin G.I., Russian State University for the Humanities

MEANS OF INTERACTION BETWEEN VERBAL AND NONVERBAL SIGN UNITS IN A DIALOG. PART I: BATONS

The academic lecture regarded as a kind of dialog is a suitable testing ground for the recognition of some peculiarities of gesture-speech interaction. Gesture strokes in lecturing organize the text, accentuate its units, represent some cognitive and psychological processes and thereby facilitate the audiences’ apprehension of the lecture.

Krylov S.A., Starostin S.A.

STARLING INTEGRATED INFORMATION ENVIRONMENT AND ITS USE FOR CORPUS RESEARCH

The tasks of corpus linguistics addressed in the StarLing environment are: (1) converting a written text into a multi-level textual database (DB); (2) automatic and manual marking (tagging) of the DBs; (3) creating and correcting primary and secondary lexical DBs (supported by external sources of data).

Krizhanovsky A.A.

AN AUTOMATIC CONSTRUCTION OF LISTS OF SEMANTICALLY RELATED TERMS BASED ON TEXTS RATING IN THE CORPUS WITH HYPERLINKS AND CATEGORIES

An adapted HITS algorithm for synonym search, the program architecture, and an evaluation of the program on test examples are presented. A program for the search of synonyms (and related terms) in a specifically structured text corpus (Wikipedia), Synarcher, was developed. Search results are presented in the form of a graph. It is possible to explore the graph and search graph elements interactively. The proposed algorithm could be applied to expand search requests and to compile synonym dictionaries.

Kustova G.I.

ARGUMENTS AND CONSTRUCTIONS OF ADJECTIVES

The paper deals with two types of adjective arguments and constructions. Arguments of the 1st type are common to the whole class of adjectives; arguments of the 2nd type are characteristic of particular words.

Ariadna I. Kuznecova, Moscow State University

WHICH FACTORS INFLUENCE THE DISCOURSE-REPRESENTATION STRUCTURE?

The paper deals with the language structure, genre and communication acts as factors which influence the discourse-representation structure. The analysis is based on Selkup and German texts (in Russian translations) and on Russian texts.

Kuznetsova E.V., Moscow State University

NATURE AND FUNCTIONS OF SECONDARY STRESS IN THE RUSSIAN LANGUAGE

Though discussed in various linguistic works, the factors and conditions causing secondary stress (SS) in Russian are still not quite clear. Moreover, certain aspects of SS are interpreted by different linguists in quite contradictory ways. To understand and avoid such contradictions it is important to analyze the nature and functions of SS in the Russian language.

Lande, D.V., Grigorjev, A.N., ElVisti IC, Kyiv, Ukraine

A MULTILEVEL QUALIFIER-NAVIGATOR BASED ON INFORMATION RETRIEVAL SYSTEM RESPONSES

The paper describes the approach, model and implementation of a multilevel qualifier-navigator built on the responses of a full-text information retrieval system. An interface that makes it possible to refine the query is proposed. The interface, which implements the principle of Custom Search Folders, is designed on the basis of a word affinity definition.

Lande, D.V., Grigorjev, A.N., Brajchevsky, S.M., ElVisti IC, Kyiv, Ukraine

STABILITY OF SOURCES AS ONE OF THE PARAMETERS OF INFORMATION STREAMS

The paper considers the stability of information sources, focusing on news websites. A formula and an algorithm for computing the level of disorder of information from a source are offered. The practical importance of this parameter is validated.

Leontyev A.P., Leontyeva A.L.

THE SEMANTICS OF GENITIVE RELATIONSHIPS REVISITED

This article presents part of a collective study of external possessor constructions in Russian. We claim that the use of these constructions (there are about ten of them) is determined by a combination of factors. In this paper we analyze one of the most important factors, namely the possessive relations (different semantic relations between possessor and possessee). Since there is no generally recognized classification of possessive relations, we propose a new one based on corpus research. We also present some important conclusions about the nature of the possessive construction and the semantics of the genitive.

N. Leontyeva

ON A POSSIBLE GRAMMAR OF CONCEPTUAL RELATIONS

To deal with domain knowledge when analysing natural language text, we need to build a semantic representation (SemR) comparable with the knowledge structures of the given domain. Does this mean that linguistic analysis has to be different for every text specific to the given domain? Not necessarily. In our approach, the transition from linguistic SemR to conceptual units and relations specific to the domain passes through binary semantic relations (SemRel). The intended grammar consists of the basic list of SemRels plus transition rules.

Letuchy A.B.

LABILITY IN RUSSIAN: AN EXCEPTION OR A RULE?

Russian labile verbs (verbs that can be both transitive and intransitive) are analyzed. I show that, although labile lexemes are rare in Russian, it is possible to note certain regularities in their meaning. I also analyze the mechanisms which can make a verb labile.

Lipatov A.A., Maltsev A.A.

METHODS OF AUTOMATION OF CREATING AND EXPANDING BILINGUAL DICTIONARIES USING PARALLEL TEXT CORPORA

An approach to automating the creation of bilingual dictionaries is considered. This approach reuses the work of translators: a bilingual corpus of parallel texts.

Litvinenko E.

SPEECH REPORTING STRATEGIES IN CHILDREN’S NIGHT DREAM STORIES

The paper is devoted to different strategies used by children while reporting someone’s thoughts and speech in narrative spoken discourse. The paper examines direct speech, indirect speech and some intermediate types of reported speech depending on syntax, grammar and intonation of those contexts in children’s night dream stories.

Boris Lobanov, Bozena Piorkowska, Janusz Rafalko, Liliya Tsirulnik, Edward Shpilewski

PHONETIC-ACOUSTICAL DATABASE FOR MULTILANGUAGE SLAVONIC TEXT-TO-SPEECH SYNTHESIS

The paper offers a typological analysis of the peculiarities of the phonetic systems of the Belorussian, Polish and Russian languages. The results of this study are used as the basis for an approach to creating a unified phonetic-acoustical database for Multilanguage Slavonic Text-to-Speech Synthesis. Principles of creating and processing text and speech corpora for each of the languages are described.

Makarov M.L., Shkolovaya M.S.

LINGUISTIC AND SEMIOTIC ASPECTS OF IDENTITY CONSTRUCTION IN E-COMMUNICATION

The phenomenon of personal identity construction in Internet communication is addressed. An elaborate analysis of marginal semiotic elements in e-communication is developed. The main speech strategies of identity presentation are highlighted and exemplified within the genres of “chat” and the on-line diary, the so-called “blog”.

Malinina K.O., Shapkin A.V.

FORMAL METHODS TO SUPPLY VINITI RUBRICATOR WITH KEYWORD SETS

Formal methods of creating keyword sets for VINITI rubrics are discussed, including structural representation of statistical data, normalization of terms, synchronization of keyword lists compiled by different experts, and development of the DB structure. Subject description of rubrics and term clusterization may be useful in the construction of a search thesaurus for scientific and technical issues.

Anastasia Marushkina

“NAIVE MECHANICS” IN LANGUAGE AND ONTOLOGY

The paper is aimed at analyzing, from an ontological perspective, the semantics of “naive mechanics” in the Russian language. The research is focused on the Force Dynamics theory introduced by L. Talmy, which presents a specific semantic category. This category, being a generalization over the traditional linguistic notion of “causation”, is seen as a theoretical basis for building up a piece of lexical ontology.

Mikhailov M.N.

STRUCTURE AND CONTENT OF LEXICAL DATABASES FOR A FOREIGN LANGUAGE TEACHING SOFTWARE

Most existing software for teaching foreign languages seems to consist of traditional exercises in computerized form. The aim of this paper is to show that a well-structured lexical database improves the use and performance of teaching materials of this kind.

Mikheev M.Yu., Dobrovolsky D.O.

TRANSLATION STRATEGIES AND ESTRANGEMENT IN FICTION

This paper has two topics. The first is the difficulty of translating Dostoevskij’s prose from Russian into German and the differing translation strategies adopted as a result. The second is the problems that arise when translating Platonov from Russian into French, German, and English. In both cases, the issue is the necessity to transmit authorial combinatorial deviations, i.e. «estranged» utterances, where the senses of various common expressions merge and meld together.

Mitrofanova O.A., Krylov S.A.

“PATTERN” CONTEXT: RANDOMNESS OR REGULARITY?

The paper offers a discussion of “pattern” contexts exhibiting the use of lexemes in various meanings and the combinatorial properties of lexemes. Special attention is drawn to a comparative analysis of linguistic data presented in explanatory dictionaries and corpora of Russian. The results of the experiment allow the elaboration of procedures for syntagmatic analysis and semantic information extraction.

O.A. Nevzorova, J.V. Zin’kina, N.V. Pjatkin

THE METHOD OF FUNCTIONAL HOMONYMY DISAMBIGUATION ON THE BASIS OF CONTEXTUAL RULES: A STUDY OF METHOD APPLICABILITY

The paper is devoted to a feasibility study of the method of functional homonymy disambiguation on the basis of contextual rules in Russian. The state of the art of lexicographical resources and complicated cases of functional homonymy disambiguation are among the topics discussed.

Anni Oja, Tartu University

FINDING AN IDENTITY, DESIGNING THE IDENTITY: STUDY OF WEB-BASED COMMUNICATION

The paper presents an experimental study that uses a corpus from the web-based communication portal www.rate.ee. This is the most popular website in Estonia, used by approximately one third of the Estonian population. Users can present themselves through special personal webpages, rate each other’s pictures and create virtual social networks. Their motivating factors are communication and self-presentation with social feedback. The Rate.ee environment supports different social actions, calculates the «fame» (popularity) of users, etc. The author focuses on identity design and language characteristics in this environment, that is to say, on which markers and features can be used for promoting a «virtual face» in the context of web-based communication.

Paducheva E.V.

THE OBSERVER: TYPOLOGY AND POSSIBLE INTERPRETATIONS

There is a zero sign with deictic meaning which is called Observer and serves as the subject of secondary deixis. The Observer, as well as the speaker, has the right to identify objects, places and time points through their relation to himself and his present moment. Examples are given of verbs, adverbs, nouns and grammatical categories with semantics that presupposes the Observer.

Anna Pazelskaya

RUSSIAN PREDICATE NOUNS AND NEGATION

The paper discusses negation in Russian, expressed by the prefix ne- within deverbal nouns (e.g., nejavka ‘non-appearance’, nevmeshatel’stvo ‘non-intervention’). We identify three semantic types of negated nouns, depending on the aspectual properties of the negated event and the context in which the negated nominal occurs. Negation within deverbal nominals is in many substantial characteristics close to the typical verbal negation in Russian.

Pertsov N.V.

TOWARDS CONSTRUCTING SEMANTIC METALANGUAGE

The paper concerns the problem of the tools that can be used in a semantic metalanguage designed for describing the semantics of natural language (NL). Is it necessary to base such a semantic metalanguage on the natural language described, or may one rely on some universal inventory of meanings? Some doubts are cast upon the thesis that the semantics of an NL must be described on the basis of a limited sublanguage of the NL described; this thesis is upheld in a number of semantic theories. The semantic metalanguage may be built on universal meanings, and this possibility is supported by the fact that even semantic metalanguages constructed on the basis of sublanguages of an NL cannot stay within the limits of the NL and include artificial elements. In the scientific apparatus for describing the surface levels of language structure (syntax, morphology and phonology), the specific character of language entities is not supposed to be mirrored by means of specific metalinguistic units oriented to the NL described.

Petrov A.A.

FEATURES OF A NETWORK ENGLISH LINGUISTIC PROCESSOR FOR FORMALIZATION OF TEXT INFORMATION IN A NATURAL LANGUAGE

The paper considers a linguistic processor for formalization of English text information in a natural language as a network component of an Internet project. Objectives of the linguistic processor, particularities of its English version, and network integration into the Internet portal are discussed.

Podlesskaya V.I., Khurshudyan V.G., Russian State University for the Humanities

HESITATION MARKERS IN SPONTANEOUS DISCOURSE: SOME EVIDENCE FROM ARMENIAN

Hesitation in Armenian can be expressed by a semantically bleached noun BAN ‘thing, deal, word’. BAN can serve as a “placeholder” that mirrors a grammatical marking of a temporarily postponed nominal or verbal constituent, thus showing that a speaker may narrow a paradigmatic class of the upcoming lexeme before the search for the particular word is completed.

Popova T.I.

REPETITION AS A WAY TO COORDINATE SPEECH BEHAVIOR IN OFFICIAL PUBLIC DIALOGUE

The paper deals with the analysis of lexico-syntactic repetition as a way of coordinating speech behavior. The kind of repetition (direct vs. indirect) appears to correspond to the speaker’s social position with respect to the addressee. The choice of the type of coordination (modal vs. cognitive) is found to depend on the position of a speech act in the structure of the discourse.

Rakhilina E.V., Kobritsov B.P., Kustova G.I., Lashevskaia O.N., Shemanaeva O.Yu.

SEMANTIC AMBIGUITY AS AN APPLICATION-ORIENTED PROBLEM: WORD CLASS TAGGING IN THE RNC

The lexico-semantic annotation in the RNC is considered in the light of other semantically labeled corpora, such as WordNet-oriented corpora or FrameNet. In order to reduce “noise” in semantic search we propose some conventions concerning the traditional concepts of lexical semantics and lexicography: polysemy, homonymy, and the hierarchy of word meanings.

Raskin Viktor, Purdue University

THE WHYS AND HOWS OF ONTOLOGICAL SEMANTICS

The NLP Lab at Purdue University (NLPL) has co-founded and tested, in a number of applications, a knowledge- and meaning-based approach to NLP called ontological semantics (OS). Since 1999, NLPL cooperated with CERIAS in applying the approach to information assurance and security (IAS) tasks. This paper tries to handle the question why most in NLP today—and the entire Semantic Web enterprise—are still pursuing non-semantic methodologies, even in response to RFP with explicitly semantic and even ontological-semantic objectives. The paper offers some sociological, educational, and academic explanations for the «fear of semantics.»

Rozina R.I.

SYNCHRONIC AND DIACHRONIC APPROACH TO DERIVATIONAL RELATIONS (WITH RESPECT TO MODERN RUSSIAN SLANG)

The paper looks at instances of divergence between a synchronic pattern of semantic extension resulting in slang, and the real history of slang meaning. The author arrives at the conclusion that multiple motivation of slang should be reflected in lexicography.

Rubashkin V.Sh., Chuprin B.Yu.

QUANTITATIVE DATA RECOGNITION IN NLP

Quantitative data recognition is discussed. We describe an information extraction technology which is currently under development. The following topics are discussed: what constitutes quantitative data in a text document; methods of numerical data presentation; the tasks that the analyzing algorithm is expected to accomplish; the dictionary support; software implementation and results.

Salomatina N.V., Gusev V.D.

AUTOMATION OF CUE DICTIONARIES FORMATION AND THEIR APPLICATIONS

The idea of the cue dictionary method for extracting information on various aspects of the contents of scientific texts (purpose, novelty, etc.) was formulated as early as the 1970s. The bottleneck of this technique is that the compilation of dictionaries is a very cumbersome procedure. An automation technique is proposed for this process which substantially reduces the use of manual labor.

Grigori Sidorov, Noe Alejandro Castro-Sanchez

SYSTEM FOR LINGUISTICALLY-BASED EVALUATION OF PSYCHOLOGICAL PROFILES

We present a system designed to assist a psychologist in the analysis of a specific type of text: texts of emotional autoreflexive writing. On the basis of linguistic analysis, the psychologist can draw conclusions about the emotional state of a person or about his personality type. The system has the following features: automatic morphological analysis and calculation of various statistical parameters (frequencies, lexical richness, etc.). The data on words with emotional connotations are given separately because these words represent the person’s current condition. We implemented a mechanism for synchronizing body temperature measurements taken during writing with the resulting text. We also describe the application of the system in another field, the analysis of political discourse in Mexico.

Sidorova E.A., Zagorul’ko Yu.A., Kononenko I.S.

SEMANTIC APPROACH TO DOCUMENT ANALYSIS BASED ON ONTOLOGY OF SUBJECT DOMAIN

The paper discusses an approach to text analysis based on an ontology of the subject domain. The main components of the ontology, in particular schemes of facts, are described. The authors consider the construction of facts to be the primary goal of semantic analysis. A fact links the dictionary lexical objects found in the text and/or objects corresponding to ontology concepts already identified in the text. Semantic and syntactic compatibility of elements is used for the construction of facts.

Sokolova E.G., Boldasov M.V.

PRINCIPLES OF ANNOTATING THE IMAGE CONTENT

The paper tackles principles for semantic annotation of image content. Annotations in XML form represent objects and their static composition in the image. They were written manually for some outdoor photos on the basis of a small fragment of an ontology developed by the authors. The ontology describes conceptual knowledge about objects within an image. The annotation schemes and the ontology proposed in this paper can be used for data mining in image collections or for natural language generation of image content descriptions.

Starostin A.S., Malkovsky M.G.

SYNTACTIC MODELING IN THE «TREETON» MORPHOSYNTACTIC ANALYSIS SYSTEM

The article introduces a formal model of syntax description. This model is a combination of two different approaches to syntax description: phrase structure grammar and dependency grammar (in the spirit of A.V. Gladky). The «Treeton» morpho-syntactic analysis system, working within this formal model, is described. The paper also deals with the syntactic analysis algorithm implemented in the system. To lower the number of hypotheses produced during the analysis, the algorithm uses a mechanism of penalizing syntax structures for undesirable elements. This mechanism is also described.

Sharonov I.A., Russian State University for the Humanities

ONE MORE CLASSIFICATION OF EMOTIONAL INTERJECTIONS

A new interpretation of emotional interjections is proposed. The interjections are regarded as transliterations of the sounds of vocal gestures. A broad analysis of contexts reveals the set of symptomatic situations, the basic list of vocal gestures and the lists of interjections that convey each vocal gesture in texts. This approach enables the creation of a basis for linguistic and lexicographic descriptions of emotive interjections in Russian and other languages.

Shemanaeva O.Yu.

THE EXACT AND ROUGH ESTIMATION OF OBJECT SIZES IN RUSSIAN

This paper deals with idiomatic Russian phrases such as шириной в ладонь (the size of a palm), высотой с человеческий рост (as tall as a man), размером с дом (the size of a house) and others that estimate the sizes of objects. More precise estimation is expressed with the preposition В ‘in’, whereas rough estimation and comparison are expressed with the preposition С ‘like’. The paper describes the inner structure and usage of these two constructions and sets them apart from some similar expressions with those prepositions.

Shmeleva E., Shmelev A.

INTERTEXTUAL FRAGMENTS IN CONTEMPORARY RUSSIAN JOKES

The paper discusses various types of intertextuality in Russian jokes: direct quotations (among them modified quotations), “spot reference”, reference to complex plot units, reference to non-verbal semiotic objects. The most common sources of intertextuality are outlined.

Sekerina I.

STUDYING BILINGUALISM USING EYE-TRACKING

Ter-Avanesova A.V., Krylov S.A.

LEXICO-GRAMMATICAL DATABASES AS A TOOL OF DIALECTOLOGICAL DESCRIPTION

In the STARLING environment, a lexico-grammatical database (30 000 wordforms) of the dialect of Pustosha village (Moscow region, Shatura district) was created. The nuclear dialectal corpus (NDC) with the entire lexico-grammatical notation (lemmatization) is a base for secondary databases (indexes).

Tokareva M.Yu., Bolshakova E.I., Bordachenkova E.A.

AUTOMATIC GENERATION OF SPORTS COMMENTARY

A method of automatic text generation for real-time commentary on dynamic sports competitions is described. The key features are flexible selection of the event to be commented upon and synthesis of the commenting string based on appropriate phrase templates. An automatic commenting system developed for “Formula-1” races is outlined.

Tolpegin P.V., Vetrov D.P., Kropotov D.A.

AUTOMATED THIRD PERSON ANAPHORA RESOLUTION ALGORITHM ON THE BASIS OF MACHINE LEARNING METHODS

An approach to automated third person anaphora resolution is considered. Reference rules were obtained with the aid of machine learning methods. An accuracy level of more than 60% was achieved.

Liliya Tsirulnik, Boris Lobanov

EXPERIMENTAL EVALUATION OF COMPILATION ELEMENTS’ CONTRIBUTION TO THE PLAUSIBILITY OF THE SYNTHESIZED SPEECH CLONE

The study has been carried out within the framework of research on personal voice cloning. The paper deals with the results of an experiment aimed at evaluating the effect of compilation elements of different phonetic types (stressed/unstressed vowels, consonants) and of different levels (allophones and multi-phones) on the perception of personal phonetic-acoustical characteristics of the voice in Text-to-Speech Synthesis. Universal methods of subjective evaluation of synthesized speech quality (so-called MOS evaluation) are used in the experiment. The paper reviews the prospects for using compilation elements of various levels in speech synthesis systems.

Tsukanova V.L.

APPLYING METHODS OF DISCOURSE-ORIENTED TRANSCRIPTION TO A NON-INDO-EUROPEAN LANGUAGE

The paper discusses the application of the discourse-oriented transcription developed for the corpora of Russian texts to the texts in Kuwaiti Arabic. The paper focuses mainly on the cases of non-standard division of the text into discourse units as well as on the grammatical features which cause such division.

Tuzov V.A.

SEMANTICS OF PREPOSITIONAL-CASE FORMS OF THE RUSSIAN LANGUAGE

The problem of computing the meaning of prepositional-case forms within the formal lexicographic definitions of Russian words as prescribed by the semantic dictionary is discussed. Adding to the dictionary a database containing information about the subject domain makes it possible to compute automatically the meanings of all prepositional-case forms of the Russian language. As a rule, the problem is reduced to the choice of an attribute for the object connecting the prepositional-case form. A possible structure of such a database is considered.

Tuzovsky A.F., Kozlov S.V.

CONSTRUCTION OF THE ORGANISATION KNOWLEDGE MODEL USING A SYSTEM OF ONTOLOGIES

It is proposed to describe the organization knowledge model in the form of a system of ontologies supplementing each other. The model consists of a basic ontology of the enterprise and a set of knowledge domain ontologies. An approach to the construction of the knowledge model is described, and the structure of a knowledge management system based on it is proposed.

Uryson E.V.

RUSSIAN CONJUNCTIONS I ‘AND’, A ‘AND, BUT’, NO ‘BUT’: BASIC SEMANTIC DISTINCTIONS

The main feature that determines the semantics of Russian conjunctions I ‘and’, A ‘and, but’, NO ‘but’ is contrariety/agreement-to-expectation. A hypothesis explaining the character of this distinction is proposed. A semantic invariant is proposed for every conjunction under consideration. The nature of this invariant as well as semantic metalanguage in general is discussed.

Vinogradova N.V.

CONTACT-MAKING FUNCTION OF RUSSIAN COMPUTER JARGON

The contact-making function of Russian computer jargon in comparison to literary language has certain peculiarities and appears in different variants. By considering the contact-making function of Russian computer jargon as part of a global computer sublanguage and a source of language convergence we can point to so-called “computerese generalities” in which the contact-making function is realized.

Voskresenskij A.L., Khakhalin G.K.

SEMANTIC SEARCH METHODS

Local contexts are shown to be insufficient for disambiguation when translating from verbal language to sign language. Methods of concept comparison based on syntactic and semantic analysis are discussed. A method of automated search for documents on the Internet that are unknown to the user is proposed.

Yagunova E.V.

FUNDAMENTAL FREQUENCY AND ENHANCED WORD RECOGNITION

Fundamental frequency and its role in speech perception are analyzed with reference to professional and fiction texts. Subjects were exposed to the texts under white-noise masking and in the clear where the original words have been changed to their nonsense (artificial) ‘equivalents’. Recognition scores were correlated to the Topic-Comment structure, type of phonetic reduction, etc. One of the most important findings is a change in perceptual strategy depending on the text type (professional or fiction in our case). Fundamental frequency clues seem to be actively used to enhance word recognition, counterbalancing, to some extent, the poor quality of segmentals.

Yanko T

TEXT INTONATION

The Russian intonation of text incompleteness has been analyzed. Text incompleteness is considered in combination with contrast, emotional emphasis, and verification. Fundamental frequency (f0) contours and accent placement proved to be the means of expressing text incompleteness and its combinations with contrast and other meanings. The text functions of a variety of intonation strategies have been described.

Yanovich I.S.

TWO KAKOJ-S IN RUSSIAN

The paper investigates the categorial status of Russian kakoj-based pronouns: are they adjectives or determiners? It is argued that these pronouns exist in two variants differing in meaning. The proposed solution captures the observed semantic and syntactic facts.

Yanovich I.S., Fedorova O.V.

SUBJECT-VERB AGREEMENT ERRORS IN RUSSIAN: HEAD NOUN GENDER EFFECT

We present new data showing that grammatical gender affects subject-verb agreement in Russian. The hypothesis that this effect is due to the level of markedness of different gender features in Russian is shown to be borne out.

Yudina M.V., Moscow State University

COMPREHENSION AND PRODUCTION OF SYNTACTICALLY AMBIGUOUS SENTENCES (USING THE MATERIAL OF RELATIVE CLAUSE ATTACHMENT IN RUSSIAN)

The paper is devoted to strategies of syntactic ambiguity resolution (based on an investigation of high versus low attachment) from the point of view of production and comprehension. The purpose of our research was to test whether the high-attachment preference, which was demonstrated in previous comprehension studies on Russian material, persists when sentences of this type are produced.

Zagorul’ko Yu., Borovkova O.I., Kononenko I.S., Sidorova E.A.

SEMANTIC APPROACH TO DOCUMENT ANALYSIS BASED ON SUBJECT DOMAIN ONTOLOGY

The paper discusses an approach to text analysis based on an ontology of the subject domain. The main components of the ontology, in particular schemes of facts, are described. The authors consider the construction of facts to be the primary goal of semantic analysis. A fact joins the dictionary lexical objects found in the text and/or objects corresponding to ontology concepts already identified in the text. The semantic and syntactic compatibility of elements is used for the construction of facts.

Zakharov L.M., Philological Faculty of Moscow State Lomonosov University; Kazakevich O.A., Computer Research Centre of Moscow State Lomonosov University

ON SENTENCE BOUNDARIES IN ORAL TEXTS IN LANGUAGES WITHOUT STABLE WRITTEN TRADITION

The paper considers the problem of fixing sentence boundaries in speech in languages without a stable written tradition. In modern written texts the borders between sentences are distinctly marked, so it is easy to tell where one sentence ends and another begins. A quite different situation arises as soon as we have to fix sentence borders in an oral text, especially in a language without a stable written tradition. Analyzing the material of two practically unwritten languages of Siberia (Selkup and Ket), we examine the possibility of using some prosodic features as sentence boundary markers in speech.

Anna A. Zalizniak

RUSSIAN CULTURAL CONCEPTS IN THE EUROPEAN LINGUISTIC PERSPECTIVE: THE WORD PROBLEMA

The paper deals with the history and the current status of the word problema ‘problem’ in Russian. In contemporary Russian it has acquired a meaning, roughly, ‘something that creates an obstacle to the normal course of events’ (U X-a problemy s Y-om), which appears to be a semantic calque from English. It is closely linked to one of the key ideas of Western culture and to a series of key words expressing it (such as happy, OK, enjoy).

Anna A. Zalizniak, Irina Mikaelyan

E-MAIL CORRESPONDENCE AS AN OBJECT OF LINGUISTIC ANALYSIS

E-mail correspondence is considered as a communicative genre characterized by a number of specific features that distinguish it from other cognate speech genres. The analysis of e-mail correspondence in Russian reveals some important linguistic and psycholinguistic regularities of spontaneous written speech production. It is argued that Russian e-mail correspondence in Latin transliteration constitutes an important and stable variant of Internet correspondence in Russian: this variant possesses its own specific features and may be responsible for the loosening of Russian language norms.

Zaretskaya E.N.

LOGICAL PSYCHOLOGICAL STRUCTURE OF DISCUSSION

Three types of arguments (apodictic, eristic, sophistic) are considered, taking into account the motivation and speech behavior of opponents. The structure of a public text is viewed as a set comprising seven (eight) elements: address, thesis, narration, description, proof, disproof, appeal (conclusion). The categories of persuasiveness and argumentativeness are grounded both logically and emotionally. Verbal confrontation devices are described.

Zatsman I.M.

POLYDOMAIN MODELS FOR EVALUATION SYSTEMS OF INNOVATIVE POTENTIAL AND PERFORMANCE OF RESEARCHES

Models of intelligent systems intended for monitoring and evaluating the innovative potential and performance of research are considered. The models under consideration combine lexico-semantic, information, algorithmic, mathematical and some other components.

Zevakhina N.A.

GERMAN COMPOUND ADJECTIVES IN A DICTIONARY AND IN THE DISCOURSE

Our work is focused on the properties of German compound adjectives conveying the idea of comparison, the source of the empirical data being a large corpus of newspaper discourse. The number of such compounds occurring in the corpus amounts to 412, and only one third of them can be found in the Big German-Russian Dictionary. This proportion needs explanation, and we try to determine the relevant formal, semantic and stylistic-pragmatic factors. Finally, prognostic conclusions are drawn concerning lexicographic applications.

Zimmerling A.V.

FREE WORD ORDER AND MODELLING OF INVERSION

Language L is defined as having free word order if the relative order of any two sentence categories X, Y can be inverted: [X + Y] → [Y + X]. This definition does not exclude languages with constraints on the placement of elements attached to the first, second or third position from the left boundary of the sentence. At the same time, many languages with one statistically prevailing order, such as SOV, SVO, VSO etc., lack constraints that block less frequent orders. Presumably, all natural languages have pairs (or sets with n elements, n ≥ 2) of sentences with one and the same structure but different linear orders. We proceed from the assumption that for each pair/set of such sentences it is possible to establish the variant representing the basic order and to derive the other orders from it. The derived order can be obtained from the basic one by singling out the element that moves: {a + b + c} → {b + a + t_b + c}. The analysis in terms of Movement is preferable to the traditional description, where e.g. Subject-Verb order is chosen as ‘basic’ and Verb-Subject order as ‘inverted’ and no attempt is made to prove that either of the elements in the group can move. Movement of elements can be formalized in different ways. The generative account (Fiengo, Chomsky) is counterfactual, since it does not explain the contexts with left-to-right Movement patterns: Movement patterns of this type are especially productive in languages with the so-called Wackernagel’s law.