
‘Bots Are Simply Imitators, Not Artists’: How to Distinguish Artificial Intelligence from a Real Author


Today, text bots like ChatGPT are taking on many tasks that used to be human work. On our behalf, they can rewrite ‘War and Peace’ in a Shakespearean style, write a thesis on Ancient Mesopotamia, or create a Valentine’s Day card. But is there any way to identify an AI-generated text and distinguish it from one written by a human being? Can we catch out a robot? Vasilii Gromov, Deputy Head of the HSE School of Data Analysis and Artificial Intelligence and Professor at the HSE Faculty of Computer Science, addressed these questions in his lecture ‘Catch out a Bot, or the Large-Scale Structure of Natural Intelligence’ for the Znanie intellectual society.

‘Why are modern texts created, and who writes them?’ asked Vasilii Gromov. His generation and the generation of his listeners grew up on works written by people for people: the authors of such texts put a certain meaning into them and pursued a certain goal, whether the book was ‘Sleeping Beauty,’ ‘War and Peace,’ or a textbook on mathematical analysis, the professor noted. Nowadays, however, children are surrounded from a very early age by texts written by an unknown author, with an unclear purpose, for an undefined audience. Vasilii Gromov and his colleagues wondered whether such a child would grow up the same way previous generations did.

The ongoing change is neither good nor bad: the world is simply transforming. Humankind is now experiencing a ‘co-evolution of artificial intelligence and humans.’ As AI develops rapidly, it adapts to humans, and humans, in turn, are beginning to adapt to artificial intelligence. To secure our future, or at least for ‘basic information hygiene,’ we need to learn to distinguish texts generated by bots (artificial intelligence systems that generate texts in natural languages such as Russian, Chinese, etc) from those written by people.

Given a collection of texts produced by a specific bot, it would not be difficult to determine whether a new text was written by that bot or by a human: we would simply train a neural network on a large number of such generated texts, and the task would be solved. However, after this, no one would continue using that particular bot; it would simply be replaced by another artificial intelligence. Therefore, scientists need a mechanism capable of distinguishing any bot from any human. To do this, we need to look at the structure of language itself, which brings us to research explaining natural languages from a mathematical point of view. Let us take a look at the necessary steps.

The field of natural language processing works, in particular, with representations of words and word sequences (n-grams, where n is the number of words) as vectors, ie, ordered lists of numbers, which together form a vector space.
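The lecture does not specify which embedding method is used; as a hedged illustration, the minimal sketch below (assuming the gensim and scikit-learn packages and a toy, made-up corpus) turns individual words into vectors with Word2Vec and represents texts by counts of their trigrams.

```python
# Minimal sketch: words and n-grams as vectors (illustrative only).
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for a real collection of texts.
corpus = [
    "the prince rode into the dark forest",
    "the bot generated another polite reply",
    "a dark forest hides many new meanings",
]

# 1. Each word becomes a vector: Word2Vec places words in a vector space.
tokenised = [text.split() for text in corpus]
w2v = Word2Vec(tokenised, vector_size=50, window=3, min_count=1, seed=0)
print(w2v.wv["forest"].shape)      # (50,) -- one vector per word

# 2. Word sequences (here trigrams, n = 3) can also be counted and
#    treated as coordinates of a text in a much larger vector space.
vectoriser = CountVectorizer(ngram_range=(3, 3))
ngram_matrix = vectoriser.fit_transform(corpus)
print(ngram_matrix.shape)          # (number of texts, number of distinct trigrams)
```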

Working with representations of individual words reveals that the vocabulary of bots is no different from the vocabulary of an ordinary person. However, as soon as it comes to sequences of two or three words, it turns out that the sequences generated by bots are significantly more predictable and much poorer in linguistic terms than those that even the most poorly educated person can produce (for example, a bot is more likely to repeat patterns). The difference between the n-gram statistics of bots and people is statistically significant even for large bots such as ChatGPT, and this is one of the things that helps catch them.
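The report does not give a specific metric for ‘poorer and more predictable.’ One simple, hedged illustration is the share of distinct n-grams in a text, sketched below in plain Python; the two sample texts are made up for the example.

```python
# Minimal sketch: compare n-gram diversity of two texts (illustrative only).
from collections import Counter

def distinct_ngram_ratio(text: str, n: int = 3) -> float:
    """Share of distinct word n-grams among all n-grams in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return len(Counter(ngrams)) / len(ngrams)

human_text = "old pond a frog leaps in the sound of water and silence returns"
bot_text = "the frog jumps in the pond the frog jumps in the pond again"

# A lower ratio means more repeated patterns, ie a more predictable text.
print(distinct_ngram_ratio(human_text))
print(distinct_ngram_ratio(bot_text))
```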

Further mathematical study of natural language leads scholars to conclusions about where such word vectors are located in space. There are regions of the vector space (especially when it comes to word sequences) that only bots visit, and others that only people visit. Most of the space (90–95%) is used by both, but there are separate bot areas, which is another way to catch them out.
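The article does not describe how these regions are identified. One hedged way to approximate the idea is to partition the embedding space into coarse clusters and check which of them are visited only by bot n-grams or only by human ones; in the sketch below, random matrices stand in for real embedding sets.

```python
# Minimal sketch: regions visited only by bots or only by humans (illustrative).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-ins for real n-gram embeddings of bot-generated and human-written texts.
bot_vecs = rng.normal(0.0, 1.0, size=(500, 50))
human_vecs = rng.normal(0.2, 1.2, size=(500, 50))

# Partition the joint space into coarse "regions" with k-means.
all_vecs = np.vstack([bot_vecs, human_vecs])
regions = KMeans(n_clusters=40, n_init=10, random_state=0).fit_predict(all_vecs)
bot_regions = set(regions[:len(bot_vecs)])
human_regions = set(regions[len(bot_vecs):])

print("bot-only regions:", len(bot_regions - human_regions))
print("human-only regions:", len(human_regions - bot_regions))
print("shared regions:", len(bot_regions & human_regions),
      "of", len(bot_regions | human_regions))
```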

Clustering is a mathematical operation that groups similar elements into a single set, a cluster. If we cluster the word sequences of bots, the resulting clusters turn out to be rigid, compact, and free of outliers. When the verbal sequences of people of different genders and ages, with different education and backgrounds, are clustered, the result is blurrier, less distinct clusters. Humans think in a far less clear-cut way than bots, and this is another way to catch them.
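As a hedged illustration of that difference, the sketch below clusters two synthetic, stand-in sets of vectors separately and compares how tightly the points sit around their cluster centres; lower average dispersion corresponds to the more ‘rigid’ bot clusters.

```python
# Minimal sketch: compare cluster compactness of bot vs human vectors (illustrative).
import numpy as np
from sklearn.cluster import KMeans

def mean_dispersion(vectors: np.ndarray, k: int = 10) -> float:
    """Average distance from each point to its cluster centre."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
    centres = km.cluster_centers_[km.labels_]
    return float(np.linalg.norm(vectors - centres, axis=1).mean())

rng = np.random.default_rng(1)
bot_vecs = rng.normal(0.0, 0.5, size=(500, 50))    # tight, uniform stand-in
human_vecs = rng.normal(0.0, 1.5, size=(500, 50))  # spread-out stand-in

print("bot dispersion:  ", mean_dispersion(bot_vecs))
print("human dispersion:", mean_dispersion(human_vecs))
```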

If we represent each word or each n-gram as a vector, then their entire collection can be viewed as a geometric object, a certain surface in a multidimensional space. For example, if we take all possible word sequences in Russian, we find that they do not fill the entire semantic space, but only part of it. Scientists can study and measure this collection as a surface and even compare it with other surfaces (for example, with the surface of the English language). Every surface in space has a dimension, ie, the number of independent parameters needed to describe the object (for points on a sphere, for example, these are two values: longitude and latitude).
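The lecture does not name the estimator used to measure this dimension. One standard, hedged way to estimate the intrinsic dimension of such a point cloud is the two-nearest-neighbour maximum-likelihood estimate sketched below (numpy and scipy assumed, with synthetic points standing in for real n-gram vectors).

```python
# Minimal sketch: estimate the intrinsic dimension of a point cloud (illustrative).
import numpy as np
from scipy.spatial import cKDTree

def two_nn_dimension(points: np.ndarray) -> float:
    """Two-NN maximum-likelihood estimate of intrinsic dimension.

    For each point, take the ratio mu = r2 / r1 of the distances to its
    second and first nearest neighbours; mu follows a Pareto law whose
    shape parameter is the intrinsic dimension.
    """
    tree = cKDTree(points)
    # k=3 returns the point itself plus its two nearest neighbours.
    dists, _ = tree.query(points, k=3)
    mu = dists[:, 2] / dists[:, 1]
    return len(points) / np.log(mu).sum()

# A 5-dimensional cloud embedded in a 50-dimensional space.
rng = np.random.default_rng(2)
low_dim = rng.normal(size=(2000, 5))
embedded = low_dim @ rng.normal(size=(5, 50))
print(two_nn_dimension(embedded))   # close to 5, not 50
```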

Studying the dimension of natural language, Vasilii Gromov expected to find an infinite value, but in the end the analysts concluded that language has a dimension of about 9–10. This figure varies slightly from language to language, but one thing is certain: human language occupies a space of higher dimension than the language of bots.

Finally, the results of a recent 2023 study showed that this surface has ‘holes’ in it, like Swiss cheese. The holes are those areas of semantic space that our language has not yet reached. Although analysts cannot yet clearly indicate what is hidden behind them, they can detect them. Different languages have different holes, also referred to as ‘blind spots.’ When catching bots, it is important to remember that people are drawn to the boundaries of such holes, because they use language to create new meanings and ideas. Meanwhile, bots, being trained programs, keep away from these holes, which for now makes the task of catching them easier. Surprisingly, it is humour that most often appears at the boundaries of such holes.
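The study’s method is not described here; detecting ‘holes’ in a point cloud is the typical job of persistent homology, and the hedged sketch below (assuming the ripser package, with a synthetic noisy ring standing in for real language vectors) shows how such a hole appears as a long-lived one-dimensional feature.

```python
# Minimal sketch: detect "holes" in a point cloud with persistent homology (illustrative).
import numpy as np
from ripser import ripser

# A noisy circle: a point cloud with one obvious hole in the middle.
rng = np.random.default_rng(3)
angles = rng.uniform(0, 2 * np.pi, size=400)
cloud = np.column_stack([np.cos(angles), np.sin(angles)])
cloud += rng.normal(scale=0.05, size=cloud.shape)

# H1 diagram: each (birth, death) pair is a loop; a long-lived pair is a hole.
diagrams = ripser(cloud, maxdim=1)["dgms"]
h1 = diagrams[1]
lifetimes = h1[:, 1] - h1[:, 0]
print("number of prominent holes:", int((lifetimes > 0.5).sum()))  # expect 1
```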

‘Bots are simply imitators, not artists. Technology does not stand still, so we must try to solve this “bot-catching” problem and understand what a language is from a mathematical point of view,’ summarised Vasilii Gromov.
