Prospects for the Integration of Google Trends Data and Official Statistics to Assess Social Comfort and Predict the Financial Situation of the Population

This paper aims to develop the theory of statistical observation in terms of scientific and methodological approaches to processing big data, to determine the possibilities of integrating information resources of various types to measure complex latent categories (using the example of social comfort), and to apply this experience in practice by using financial situation indicators in forecasting. The authors have built a social comfort model in which the weights of the components are chosen by a modified principal component analysis. The assessment is based on Google Trends data and official statistics. The methods for analyzing Google Trends data rest on an integrated approach to the semantic search for information about the components of social comfort, which reduces the share of the authors' subjectivity, and on a methodology of primary processing that takes into account the principles of comparability, homogeneity, consistency and relevance and describes the functions and models needed to select and adjust search queries. The proposed algorithm for working with big data made it possible to identify the components of social comfort (“Education and Training”, “Safety”, “Leisure and free time”) for which big data need to be integrated directly into the system of primary statistical accounting, with further data processing and construction of composite indicators. The authors conclude that a stable significant correlation has been found for the “Financial Situation” component, which makes it possible to use it for further calculations and extrapolation of financial indicators. The scientific novelty lies in the development of principles and directions for the integration of two alternative data sources when assessing complex latent categories.
The findings and the results of the integral assessment of social comfort can be used by state statistics authorities to form a new type of continuous statistical observation based on the use of big data, as well as by executive authorities at the federal, regional and municipal levels in terms of determining the priorities of socio-economic policy development.


INTRODUCTION
Over the past two decades, Internet use has grown significantly, which has increased the amount of stored information about user activities. Examples of big data are social media data, telephone records, websites, and search engine data [1]. These trends have attracted academic interest in the use of big data in research.
Big data is gaining popularity for measuring human well-being as well as predicting financial performance. Based on the analysis of foreign and domestic periodical publications, the following main sources of big data can be identified: However, bringing big data in line with the requirements of national and international recommendations will reduce their advantages in terms of efficiency of use, timeliness and relevance, which currently provide them with high economic efficiency.
A prerequisite for the use of big data is wide access of the population to the Internet. Despite the rapid development of the Internet in the last decade, the possibilities of big data in developed countries are higher than in developing ones [16]. For example, according to official statistics, in 2019 the share of the population using the Internet on a daily basis was 73% in Russia and 82% in Moscow. This fact can lead to biased estimates of the studied variables, since the reliability of the results is guaranteed not only by a large number of observations but primarily by the representativeness of the sample population.
Serious methodological work and high risks of using big data form obstacles to their successful integration into official statistics. In Russia, there are some pilot projects on the use of Internet resources to improve consumer price statistics, data from mobile operators for tourism statistics, satellite communications monitoring for the development of environmental statistics [17]. However, we note the relatively narrow scope of their application and the lack of practical experience (no official publications are presented).
At the same time, there are examples of successful implementation of big data in official statistics in some developed countries. In 2015, Statistics Netherlands expanded transport statistics with the publication of indicators that were calculated on the basis of information received from sensors on the country's highways. Real-time data made it possible to make timely decisions when ice formed in the northern part of the country. Another example of such integration is the experience of Canada: the yield of agricultural crops is forecast not only on the basis of surveys of farmers but also on information about the state of land and climate obtained via satellite communications [18]. Active work is also underway to use big data as an alternative source of information on consumer prices in Denmark, the Netherlands, Italy, Norway, Australia, Switzerland, Belgium, New Zealand, and Sweden.
Since the experience of introducing big data into the practice of state statistics bodies already exists, it is necessary to continue research in the field of the principles of integrating two information sources, and not be limited only to general projects to study the potential of big data.
In this regard, the aim of the study is to develop the theory of statistical observation in terms of scientific and methodological approaches to processing big data and to determine the possibilities of integrating information resources of various types in relation to measuring complex latent categories (using the example of social comfort).
The introduction of a new economic category, "social comfort", is necessary to "determine the dynamics of the real level of well-being of the population, assess the true quality of life of people" [19]. Although the study of this category is new in Russian practice, foreign studies analyze comfortable conditions for an individual in the following areas: geography, sociology, medicine, psychology, economics, and finance. In [19], the axiomatics and composition of the introduced category are considered in detail. The components of social comfort are: health and medical services, education and learning, social support and pensions, financial situation, employment, housing and living conditions, ethical norms and values, safety, political stability, rest and leisure, ecology and the environment, infrastructure.
In this work, based on the use of the resources of the Federal State Statistics Service (Rosstat) and big data, it is proposed to assess social comfort, to determine the degree of consistency of its components built on two data sources, and to identify possible directions of integration.

EMPIRICAL APPROACHES TO SEMANTIC SEARCH OF INFORMATION ON FINANCIAL AND ECONOMIC INDICATORS BASED ON WEB REQUESTS
Most of the studies conducted have demonstrated the promise of using big data. However, the problem of choosing keywords to determine user queries is still relevant. In many works, the formation of keywords is based on a small number of user queries based on the intuitive assumptions of the authors of the study about the importance of a particular query for a person. As noted earlier, the first work on introducing big data into statistical practice and calculations focused on financial aspects.
Thus, in [20], inflation forecasting is carried out on the basis of web requests. The author examines 75 search queries that are related to financial markets, as well as the interests of the population, economic and financial phenomena, and processes. These queries were selected among the most popular ones based on the analysis of correlation dependences with inflation.
The author of the study [21] analyzes inflation expectations using the keyword "inflation", on the basis of which all kinds of search queries are built. Further, the author makes a comparative assessment with the results of inflationary expectations of the population based on the data of sociological surveys.
In [2], for each index of subjective wellbeing, which is represented by an index of positive and negative emotions, a set of indirect variables from Google search queries (such as happiness, respect, stress, anxiety, etc.) is used reasonably. The choice is primarily determined by the direction of the evoked emotions: positive or negative.
The author of the study [22] uses the Google Trends service as a proxy to predict the volatility of energy prices. The authors begin their research with a set of 90 terms used in the energy sector. The most popular words on this topic from Google are added to terms gleaned from professional literature. Filtration of such a set of words occurs by building various models that best predict the volatility of prices for crude oil, fuel oil, gasoline, natural gas.
The author of the study [13] compiles a list of search queries that reveal links to economic conditions. To determine the search words as objectively as possible, the author starts working with a vocabulary of finance and text analytics [23] and selects words related to "economic" words that positively or negatively affect a person's mood. Studies [13,23,24] use Harvard IV-4 Dictionaries, which have several editions, as the rationale for the choice of words to create an aggregated index of investor sentiment. This dictionary was developed by Dexter Dunphy and colleagues [25][26][27].
The result is a list of 149 words such as inflation, recession, security, etc. Then each of the 149 words is entered into Google Trends, which selects the top ten (most popular) search queries for each word. For example, the word "deficit" taken from the dictionary leads to the following search queries: "budget deficit", "attention deficit", "trade deficit", etc. As a result, 149 words increased to 1490. The last stage excludes those words/phrases provided by Google that are not related to economic conditions or finance and have zero search volume.
After the procedure of statistical processing of the received time series of search queries, the financial and economic index is constructed as a new indicator for determining and predicting investor sentiment. The high predictive power of the index is noted and possible prospects for practical application are indicated.
It is worth noting several works that, at the initial stage, take an arbitrary set of search terms, and then select the most informative ones using the Bayesian model averaging [3,28].
In the study [29], the authors use three different types of information to identify the determinants of the saving behavior of the population in the EU countries: macroeconomic statistics (nominal effective exchange rate, nominal GDP, inflation indicators, etc.), Google search words (42 words), which reflect the mood of economic agents, behavioral, psychological factors influencing preferences, as well as the data of opinion polls reflecting expectations regarding the current and future financial and economic situation. The selection of all variables for analysis is based on the economic intuition of the authors. The Bayesian model averaging is used as a model tool. The authors attribute its use to the lack of Google's keyword selection strategy.

APPROACHES TO LARGE DATA PROCESSING
Big data is generated by the users themselves. Unlike official statistics, they are not collected according to a specially developed and approved methodology. In this regard, for their adequate use and integration into official statistics, it is necessary to develop a special methodology for collection and processing. We will analyze the existing experience in processing big data in foreign studies.
In a study [30], the authors analyze the frequency of search queries related to tourism in Germany based on Google Trends and suggest ways to cleanse the data to eliminate false predictions.
According to the source [31], Google Trends is formed as follows: the ratio of the number of Internet requests for a specific keyword at time t in region r to the total number of requests at time t in region r is determined, and the found ratio is multiplied by 100 to standardize.
The authors propose the following modification of the initial data: find the ratio of the number of web requests for a specific keyword at time t in region r to the average value of the volume of web requests for keywords at time t in region r. The resulting modified data are called averages divided by the analyzed categories. However, this transformation raises a number of problems:
• the time series has a stronger seasonality as a result of the peculiar seasonal variations of each time series separately;
• the time series has a larger number of outliers due to the increasing role of individual time series in the denominator.
To overcome the problems mentioned above, the authors propose to find the ratio of the number of web requests for a specific keyword at time t in region r to the average trend of web requests for keywords at time t in region r. The trend is determined according to the decomposition of the time series [32]. The modified time series is called division by the average trend.
Research [24] seeks to confirm that searches reflecting public sentiment can influence the Portuguese stock market. To process queries, the author proposes to take the logarithm of the weekly search volume for a specific word and then find the first differences between the volumes of queries for that word over two periods of time. To ensure comparability and consistency of the data, the author proposes to adjust for seasonality and heteroscedasticity. To remove outliers, the search sample is censored, that is, 5% of the sample with the smallest and largest search volume is cut off. To test for seasonality, analysis of variance is used, in which the hypothesis of equality of the 12 monthly averages is tested. Queries with a pronounced seasonality were cleaned by building regression models with 12 dummy variables (one for each month); the residuals of these models were used in subsequent iterations of the analysis. To eliminate heteroscedasticity, the series are standardized by the standard deviation.
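The log-difference and standardization steps described for [24] can be illustrated with a minimal sketch. The function names and the weekly values below are hypothetical, and the sketch assumes strictly positive search volumes; how zero-volume weeks are treated is not stated in the text.

```python
import math
import statistics


def log_first_differences(volumes):
    """Return ln(v_t) - ln(v_{t-1}) for consecutive observations.

    Assumes every volume is strictly positive (an assumption of this
    sketch, not a rule taken from the cited study).
    """
    logs = [math.log(v) for v in volumes]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


def standardize(values):
    """Divide each value by the sample standard deviation of the series."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


weekly_volume = [40, 50, 50, 80, 64]  # made-up weekly search index
changes = log_first_differences(weekly_volume)
scaled = standardize(changes)
```

After scaling, every query series has unit sample standard deviation, which is what makes series with very different volatility comparable.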
In [33], it is proposed to combine standardization of the search queries by the standard deviation with a deseasonalization procedure that removes seasonal fluctuations, performed with the seasonal package in the R language.
The author of the study [34] proposes to take logarithms and then first differences to bring the time series of search queries to a stationary form.
The following procedures are used to clean Google Trends search results [2]:
1. Standardization using Z-score, as a result of which the distribution of data for different search queries is reduced to one scale, which allows comparisons.
2. Elimination of sharp jumps in popularity with the help of a moving average.
3. Removal of search queries with a continuous zero search volume.
4. Clearing the time series from the trend by building regressions with a time trend and removing search queries with a coefficient of determination higher than 0.6.
5. Clearing the time series from seasonality by constructing regressions with monthly dummy variables and using further residuals of regression models.
This methodological approach to big data processing seems to be the most comprehensive and consistent. The main stages of the described approach will be used in the framework of the study.
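The seasonal adjustment by monthly dummy regressions (step 5) can be sketched without a regression library, because an ordinary least squares fit on one dummy per calendar month and no intercept has fitted values equal to the month means. The function name and the data are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def remove_monthly_seasonality(values, months):
    """Residuals of an OLS regression on 12 monthly dummies (no intercept).

    With one dummy per calendar month and no intercept, the fitted value
    of an observation is the mean of all observations from the same
    month, so the residual is the observation minus that month's mean.
    """
    groups = defaultdict(list)
    for value, month in zip(values, months):
        groups[month].append(value)
    month_mean = {m: sum(vs) / len(vs) for m, vs in groups.items()}
    return [value - month_mean[month] for value, month in zip(values, months)]


# Two years of made-up monthly data: a flat level with a December spike.
months = list(range(1, 13)) * 2
values = [10.0] * 11 + [30.0] + [12.0] * 11 + [32.0]
residuals = remove_monthly_seasonality(values, months)
```

In this toy series the December spike disappears from the residuals, and only the year-to-year shift in level remains.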

SELECTING KEYWORDS AND PROCESSING GOOGLE TRENDS SEARCH QUERIES
The main problem of most research on Google Trends is that keywords for the semantic disclosure of a particular socio-economic process or phenomenon are selected intuitively, based on the author's experience. A number of econometric methods (Bayesian model averaging, factor analysis, correlation analysis, etc.) are then used to select the most informative search queries that reflect the analyzed index or another indicator. A significant difference of this study is a well-grounded approach to semantic information retrieval (based on Google Trends search queries) about the components of social comfort, which consists in the use of the Harvard IV-4 Dictionaries. The peculiarity of this dictionary is that it helps to solve the problem of ambiguity in assigning words to certain categories. For example, the dictionary groups words into the following categories: words of positive worldview, negative worldview, words of joy, pain, virtue, vice; words characterizing social categories (education, finance, labor, etc.); motivational words, etc. In this study, each of the twelve components of social comfort is therefore filled with keywords from the Harvard dictionary, and the words are then checked for relevance to Russian realities (words such as canoe, cowboy, Thanksgiving Day, Independence Day, Constitutional Convention, Bill of Rights, and Jury have been removed).
In order to bring the set of keyword queries closer to the real conditions of everyday Russian life, the following words were added: homeowners association, minimum wage, unified national exam, amendments to the Constitution, single voting day, housing and public utilities, compulsory medical insurance, voluntary medical insurance. Of course, the share of the author's subjectivity is not excluded, but it is less than 26.5% of the total number of words. In addition, it should be noted that the set of keywords did not include verb queries (get sick, seek, play, pray, etc.) or words that have several lexical meanings. Table 1 presents only part of the generated set of keywords.
Big data processing starts with comparability.
At the first stage of big data analysis, as the experience of the conducted research shows, scaling is necessary to ensure the comparability and consistency of the initial data. One of the most common standardization methods is standard deviation correction [2,33], while other researchers [24,34] use logarithms. In this article, we rely on the variant of normalization proposed by S. A. Ayvazyan [35] for constructing complex synthetic latent categories of the population's quality of life:

$$\tilde{x} = N \cdot \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \quad (1)$$

$$\tilde{x} = N \cdot \frac{x_{\max} - x}{x_{\max} - x_{\min}}, \quad (2)$$

where $x_{\min}$ and $x_{\max}$ are the smallest and largest values of the indicator and $N = 10$ is the upper bound of the 10-point scale.

The first option (1) of normalization is used in the case of a positive perception of the search query by an individual; the second option (2) is used in the case of a negative perception. Information on whether a word belongs to a positive or negative worldview is available in the Harvard IV-4 Dictionaries. Words that are not represented in the Harvard IV-4 Dictionaries were categorized by the authors themselves.
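A minimal sketch of the two normalization options is given below. It assumes a min-max scaling onto a 10-point scale, which agrees with the later statement that normalized indicators lie between 0 and 10; the function name, the `positive` flag and the sample values are hypothetical.

```python
def normalize(values, positive=True, scale=10.0):
    """Min-max scaling of a series onto the interval [0, scale].

    positive=True  -> larger raw values map to larger scores (option 1).
    positive=False -> larger raw values map to smaller scores (option 2).
    The min-max form is an assumption of this sketch.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("a constant series cannot be min-max scaled")
    span = highest - lowest
    if positive:
        return [scale * (v - lowest) / span for v in values]
    return [scale * (highest - v) / span for v in values]


interest = [20, 50, 80]                        # made-up search interest
as_positive = normalize(interest)              # query perceived positively
as_negative = normalize(interest, positive=False)
```

Under option (2) the ordering is reversed, so a rise in interest in a negatively perceived query lowers the score.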
It is worth noting the following regularities between the growth of interest in a particular request and its positive/negative assessment (Table 2).
At the second stage, the risk of overfitting the model due to sharp jumps is reduced by using a moving average. For example: "Constitution" (sharp jumps caused by voting for amendments to the Constitution in 2020) or "football" (jumps caused by the 2018 World Cup), etc.
The moving average (MA) of order q is defined as follows:

$$y_t = \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \dots + \alpha_q \varepsilon_{t-q}. \quad (3)$$

In this case, the order q is calculated considering the number of previous values of random deviations $\varepsilon_{t-1}, \dots, \varepsilon_{t-q}$. In this study, the smoothing window is equal to three months; accordingly, the order q of the moving average is equal to three [MA(3)].
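A three-month smoothing window can be illustrated as follows. The sketch assumes a simple trailing arithmetic mean over the current and two preceding months; the alignment and weighting of the window are not specified in the text, and the function name and data are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing arithmetic mean over `window` consecutive observations.

    The result is shorter than the input by window - 1 points, because
    the first complete window ends at index window - 1.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up monthly interest with a one-month spike (for example, a vote).
interest = [10, 12, 11, 90, 13, 12]
smoothed = moving_average(interest)
```

The single spike of 90 is spread over three smoothed values, none of which exceeds about 38, which is the damping effect the second stage relies on.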
At the third stage, the time series is cleared from the trend. This operation is needed because a strong time trend can lead to an inadequate forecast of social comfort. In addition, when correlations are used to group words into a block, the block indicators of social comfort may turn out unreliable, since the words would share a common time trend rather than a semantic load. In this regard, a regression equation is constructed for the dependence of the search query on the trend of the form:

$$y_t = \theta_0 + \theta_1 t + \varepsilon_t, \quad (4)$$

where $\theta_0$ is the intercept; $\theta_1 t$ is the trend; $\varepsilon_t$ is a random variable characterizing the deviation of the level from the trend.
Based on the constructed equation, the adjusted coefficient of determination is calculated. Search queries whose models have a value of this coefficient above 0.6 are removed [2].
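The trend filter can be sketched as a simple linear regression of each query on time, followed by the 0.6 cut-off on the adjusted coefficient of determination. The function name, the query labels and the numbers are hypothetical; the sketch assumes a linear trend with an intercept and at least three observations per series.

```python
def adjusted_r2_of_trend(values):
    """Adjusted R^2 of the OLS fit y_t = a + b * t + e_t, t = 1..n."""
    n = len(values)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, values))
    ss_tot = sum((yi - y_mean) ** 2 for yi in values)
    r2 = 1 - ss_res / ss_tot
    # One regressor (time), so the residual degrees of freedom are n - 2.
    return 1 - (1 - r2) * (n - 1) / (n - 2)


queries = {
    "trending": [1, 2, 3, 4, 5, 6.5],  # made-up series dominated by a trend
    "flat": [5, 3, 6, 4, 5, 3],        # made-up series without a trend
}
THRESHOLD = 0.6  # cut-off stated in the text
kept = [name for name, ys in queries.items()
        if adjusted_r2_of_trend(ys) <= THRESHOLD]
```

Only the query whose variation is not explained by time survives the filter.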
The fourth stage is the deseasonalization of search queries. Extreme seasonality in queries can make them strongly correlated with one another simply because they follow the same seasonal pattern. For deseasonalization, the statsmodels Python module is used. This module includes many classes and functions for evaluating various statistical models, as well as for performing statistical tests and examining statistical data. In particular, this module includes the Seasonal-Trend Decomposition Procedure Based on Loess (STL), a method for decomposing a time series into seasonal and trend components, as well as residuals, using local regression. The STL method decomposes the time series into the components of the additive model:

$$Y_t = T_t + S_t + E_t, \quad (5)$$

where $T_t$ is the trend component; $S_t$ is the seasonal component; $E_t$ is the remainder. Seasonality is eliminated by subtracting the seasonal component $S_t$ from the time series.
As a result of sequential execution of the listed iterations of analysis and data processing, 475 words were selected (out of 574 words before processing) for the period January 2010-January 2021 (monthly frequency).
The proposed methodology for keyword search and Google Trends processing is implemented in Python code. It allows you to build statistical series in accordance with the basic principles that ensure the quality of statistics: comparability, consistency, accuracy and homogeneity of data.

BUILDING INTEGRATED SOCIAL COMFORT INDICATORS: OFFICIAL STATISTICS AND GOOGLE TRENDS
Large amounts of information require the use of special methods of aggregation and dimensionality reduction. The most popular are factor analysis and principal component analysis. In our study, we use a modified principal component analysis, presented in more detail in the work of S. A. Ayvazyan [35]. According to this methodology, indicators within each of the 12 blocks of social comfort (the substantiation of the block indicators is given in more detail in the study [19]) are combined into block indicators, which are subsequently combined into a consolidated integral indicator of social comfort. Since the study used two types of data, at the output we obtained block indicators and consolidated integral indicators for each of them. In order to harmonize and interconnect information resources of various types, namely GT data and the official statistics published by Rosstat, it is proposed to normalize all data according to formulas (1) and (2) and, for GT data, to apply the processing procedures described above (smoothing, trend removal, etc.).
Further, we will discuss the results of modeling social comfort, obtained using various types of data, and assess the prospects for using GT in relation to complex latent categories (using the example of social comfort).

Simulation results based on Rosstat data
The information base for filling in the blocks of social comfort was the indicators of the socio-economic situation of the regions of Russia for 2010-2019, taken from the Rosstat website. In selecting indicators, we were guided by the approach to the analysis of the contextual conditions of the Russian Federation and its regions (described in more detail in the study [19]), as well as the requirements for a set of particular criteria for the synthetic latent category [35]. Much attention was paid to how well the socio-economic content of an indicator matches the hidden category being measured ("Safety", "Health and medical services", etc.), and to its reliability and availability in the official source. In this regard, the blocks "Ethical norms and values" and "Political stability" remained empty due to the high share of subjectivity of the indicators included in them and the lack of information on the official resource of state statistics. The empirical base of the study includes a panel of 100 indicators for 2010-2019. The collected indicators are measured on a quantitative scale in accordance with a unified methodology and basic principles of statistical observation, which ensure the consistency and comparability of the objects of observation.
Since the indicators included in the panel were normalized on a 10-point scale, the values of the block indicators of social comfort obtained with the modified principal component analysis also belong to the interval from 0 to 10. Let us consider, for example, the results of modeling the social comfort of Moscow for 2010-2019 (Fig. 1).
The consolidated integral indicator for Moscow in 2019 amounted to 6.815, which is the maximum value in 2019 among the regions of Russia and is 13% higher than the level of 2010. With only a slight change in the weight coefficients, such dynamics are explained by the growth of the block integral indicators.
Over the analyzed period 2010-2019, Moscow improved its position in eight out of ten presented components of social comfort. The most significant changes occurred in the components "Financial Situation" (+1.3 points), "Infrastructure" (+2.1 points), "Rest and Leisure" (+4 points), "Social Support and Pensions" (+5.2 points). There are no dynamics in the blocks "Health" and "Safety".
The reasons for the changes are as follows:
• "Financial Situation" block: the gross regional product per capita in Moscow is one of the highest in absolute terms and approximately doubled during the analyzed period. In addition, the share of the population with monetary incomes below the subsistence level has significantly decreased (from 10 to 4%). At the same time, over 50% of Moscow's GRP is formed by the service sector;
• "Infrastructure" block: Moscow is the first region of Russia where mobile high-speed Internet (LTE and 4G) is widely used by the city's population, which in general contributed to the growth of digitalization of socio-economic processes. In particular, the share of the population using the Internet on a daily basis increased from 60% in 2014 to 82% in 2019;
• "Rest and Leisure" block: there are more than 18 thousand sports facilities in Moscow, which is 2 times higher than the level of 2010, while the all-Russian level increased by 11%;
• "Social Support and Pensions" block: 8.5% of the population of Russia lives in Moscow, and this figure practically did not change in 2010-2019, while Moscow's share in the all-Russian expenditure of the budgets of the Pension Fund and the Social Insurance Fund increased by 1.5 and 4.2%, respectively.
According to the approach used for the convolution of multidimensional categories [35], the value of the aggregate integral indicator of social comfort is determined by the formula:

$$y_t = \sum_{j} v_j \, y_{j,t}, \quad (6)$$

where $v_j$ are normalized non-negative weights determined by the fraction of the explained variance of the first principal component of each of the 12 blocks; $y_{j,t}$ is the block indicator of social comfort for block $j$ in year $t$.
The higher the weight of the block indicator, the more influence it has on the composite indicator of social comfort. Based on this method, it is possible to determine the priorities of socio-economic policy in order to improve social comfort.
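The weighting logic can be illustrated with a small sketch in which the shares of explained variance are rescaled to sum to one and then applied to the block indicators. The block names are taken from the text, but all numbers, and the reduction to three blocks, are made up for illustration.

```python
def normalized_weights(explained_variance_share):
    """Rescale non-negative shares of explained variance to sum to 1."""
    total = sum(explained_variance_share.values())
    return {block: share / total
            for block, share in explained_variance_share.items()}


def composite_indicator(block_values, weights):
    """Weighted sum of block indicators for one year."""
    return sum(weights[block] * block_values[block] for block in weights)


# Hypothetical shares of variance explained by the first principal
# component of each block.
shares = {"Infrastructure": 0.8, "Safety": 0.6, "Financial situation": 0.6}
weights = normalized_weights(shares)

# Hypothetical block indicators on the 0-10 scale for one year.
blocks = {"Infrastructure": 6.0, "Safety": 8.0, "Financial situation": 5.0}
score = composite_indicator(blocks, weights)
```

Because the weights sum to one and every block lies between 0 and 10, the composite score stays on the same 10-point scale as its components.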
We will analyze the performance of this method based on comparing the values of the block indicators and their weight in the composite indicator for two regions (the Republic of Buryatia and the Tula region).
Let us consider the growth rates of the values of block indicators of social comfort for 2010-2019 (Fig. 2), as well as the weight of each component in the composite indicator (the weight of each block indicator is its average for 2010-2019 and is constant for each object of observation). It can be noted that the priority areas for increasing social comfort are infrastructure (weight in the aggregate indicator: 22%); health and medical services (21.2%); ecology and the environment (13%); housing and living conditions (12%). At the same time, the aggregate indicator of social comfort in the Tula region increased by 8.4%, and in the Republic of Buryatia by only 0.5%, due to the outstripping growth in the Tula region of the priority block indicators of social comfort (infrastructure, housing). The data obtained can become the basis for monitoring social comfort and subsequent adjustments to the ongoing socio-economic policy of the region within the framework of priority factors. Note that 10 blocks were built according to official statistics, due to the lack of information on the blocks "Ethical norms and values" and "Political stability", and 12 blocks according to Google Trends.

Comparison of simulation results based on GT data and official statistics
According to the results of modeling based on Google Trends data, the composite integral indicator for Russia from 2010 to 2019 increased by 54% and amounted to 4.631 points. The subjective estimate of the population, aggregated based on search queries, is lower than the estimate based on Rosstat data (5.371).
Using the developed methodology for processing Google Trends data (1)-(5), the values of block indicators of social comfort were calculated and a comparative analysis was carried out with similar indicators according to official statistics. Table 3 shows correlations for block indicators of social comfort.
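The comparison of the two sources can be sketched as a Pearson correlation between the annual values of the same block indicator built from each source. The series below are made up, and the use of the Pearson coefficient is an assumption of this sketch; the text refers only to a linear relationship.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical annual block indicators (0-10 scale) from the two sources.
official = [5.0, 5.2, 5.5, 5.9, 6.1]
gt_similar = [3.0, 3.3, 3.4, 4.0, 4.2]   # moves with the official series
gt_opposite = [6.0, 5.8, 5.5, 5.1, 4.9]  # mirrors the official series

r_similar = pearson(official, gt_similar)
r_opposite = pearson(official, gt_opposite)
```

A coefficient close to +1 corresponds to the consistent blocks, while a value close to -1 corresponds to blocks in which the two sources move in opposite directions.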
According to the analyzed table, a significant correlation of all block indicators of social comfort is obvious. At the same time, a stable positive linear relationship is observed for seven out of ten compared blocks of social comfort. There is a strong positive correlation for the blocks "Social support and pensions" and "Ecology and environment", and a moderate positive correlation for "Health and medical services", "Employment and working conditions", "Infrastructure", "Housing and living conditions", and "Financial situation".

Fig. 2. Composite indicator of social comfort in regions (panels: Republic of Buryatia, Tula region)
Source: compiled by the authors.

According to calculations, "Financial situation" is formed mainly by such words as "credit", "deposit", "profit", "inflation", "accumulation", "cash", etc. Thus, the block "Financial situation" is determined mainly by words characterizing the processes of receipt of funds by the population.
At the same time, a similar component of social comfort in official statistics is assessed by the following indicators: "consumer spending", "fund ratio", CPI, the share of expenditures on food, etc. In general, official statistics primarily considers the population from the standpoint of forming the expenditures of the range of goods and services.
It is possible to increase the correlation by supplementing the list of indicators of official statistics that characterize the financial situation of the population in terms of the formation of its income, for example, the average profitability on bank deposits, income received from transactions in financial markets, the structure of the formation of disposable money income of the population, etc.
The debatable issue is the strong negative correlation of the blocks "Education and Learning", "Safety", "Rest and Leisure". Further, we will discuss the possible causes of existing dependencies.
The indicators of the "Education" block, according to official statistics, are formed by quantitative indicators of the coverage of secondary, secondary professional, and higher education. Most of the indicators of this block (for example, "The share of university students in the working-age population", "The number of graduate students", etc.) demonstrate negative dynamics. A similar trend, which began in 2010, is typical for most regions of Russia [36] and is associated with demographic problems in the country. In this regard, there is a negative dynamic of the block indicator "Education", which contradicts the dynamics of the indicator "Education" according to GT data. It should be noted that a significant drawback of the data on this block published by Rosstat is that they do not fully reflect the level of the intellectual development of the region/ country, which ensures competitiveness, changes the standard of living and the level of social comfort. In this regard, the indicators of enrollment in education should be supplemented with indicators c h a r a c t e r i z i n g t h e p e r f o r m a n c e o f schoolchildren, students, the quality of final/entrance exams, international tests (GMAT, IELTS, etc.); the number of academic competition winners; indicators of the attractiveness/openness/prestige of the university; availability of education (preschool, secondary, higher).
The work [37] details the advantages and objectivity of a new approach to assessing the level of the country's intellectual capital based on new indicators of the level of education in the country. Other sources of information, such as polls and GT data, make it possible to extend the indicators of enrollment in education in Russia with qualitative characteristics.
The example of a strong inverse correlation indicates the need to expand education statistics with qualitative indicators, in particular, to consider the possibilities of using alternative sources such as GT data, which will increase the objectivity and quality of the information provided. Thus, the analysis showed that the most popular and significant search queries in the "education" block are the words "remote learning", "English", "mathematics", "academic performance". The growing interest of the population in these topics indicates the growth of the intellectual capital of the population.
In [38], it is noted that citizens' perception of their own safety is influenced by their personal attitudes and everyday practices (whether they have to carry weapons, gas canisters, etc.), and not by the number of crimes or their clearance rate. In addition, in [39] it is shown that crimes that fall under the article "Murders" have a latency coefficient in Russia of 2.3. This means that the real number of such crimes is 2.3 times higher than the indicators of official statistics. This fact is confirmed by the results of the study. The numerical safety score according to GT data is much lower than the estimate according to official statistics: 6 points against 8 points (Fig. 3).
If we analyze the dynamics of the two indicators of the block in Fig. 3, it can be noted that during the unfavorable economic situation in 2014-2016, caused by the imposition of sanctions, the escalation of the geopolitical conflict, and, as a consequence, the depreciation of the national currency, there is a deviation of the indicator of the "Safety" block according to GT data. This is due to the concerns of citizens, a decrease in the sense of security and comfort. But the graph, built according to official statistics, in the period 2014-2016 demonstrates strong growth, which is contradictory. There is reason to believe that for an adequate assessment of the level of safety of the population, it is also advisable to integrate safety indicators based on GT data into crime statistics.
Block indicators "Rest and Leisure" also show an inverse linear relationship. The statistics of this block are represented by only three indicators: the number of sports institutions; the number of vouchers sold through travel agencies, and the number of Russian tourists served by travel agencies. At the same time, according to GT data, the analyzed block included about 54 indicators reflecting various aspects of an individual's rest and leisure (Fig. 4).
The dynamics of the analyzed GT component are quite adequate to the realities of economic life: the consequences of the COVID-19 pandemic negatively affected the recreation of Russians, since they had to change their travel preferences. As a consequence, the values of this component decreased by 31% within one year. Since the "rest" component makes a significant contribution (10%) to the composite indicator of social comfort, more attention should be paid to the development of new tourist destinations and active recreation of the population. The significant role of the indicator of the "Rest and Leisure" block in the formation of social comfort, as well as its weak representation in official statistics, justifies the need to use Google Trends in the development of a methodology for accounting for tourism and recreation statistics.
Thus, the components of social comfort, built on the basis of GT data and having a significant positive correlation with the components of official statistics, can be used to conduct operational monitoring of the living conditions of the population. For the components "Education and Learning", "Safety", "Rest and Leisure", a serious methodological study of options for integrating alternative sources of information into official statistics is required due to inconsistency of results and a poor reflection of the level of social comfort of the population.
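The split of components into those suitable for operational monitoring and those requiring methodological study can be illustrated with a minimal sketch. It assumes that each block indicator is available as a yearly series from both sources and that the linear relationship is measured with the Pearson coefficient. The function names, the threshold of 0.5, and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)

def classify_components(gt, official, threshold=0.5):
    """Label each component by the sign and strength of its correlation.

    `threshold` is an assumed cut-off, not a value from the paper.
    """
    result = {}
    for name in gt:
        r = pearson(gt[name], official[name])
        if r >= threshold:
            label = "monitor"       # sources agree: usable for monitoring
        elif r <= -threshold:
            label = "review"        # sources contradict: needs further study
        else:
            label = "inconclusive"
        result[name] = (round(r, 3), label)
    return result

# Illustrative, made-up yearly block values (not the paper's data).
gt = {
    "Financial situation": [4.0, 4.5, 5.1, 5.0, 5.8],
    "Safety": [7.0, 6.2, 5.9, 5.5, 6.0],
}
official = {
    "Financial situation": [3.9, 4.4, 4.9, 5.2, 5.6],
    "Safety": [6.5, 7.1, 7.6, 8.0, 7.9],
}
labels = classify_components(gt, official)
print(labels)
```

With short yearly series such as these, the coefficient is sensitive to single observations, so the labels should be read as a screening step rather than a test of significance.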

PROSPECTS FOR USING BIG DATA TO FORECAST FINANCIAL PERFORMANCE
The calculations presented earlier demonstrate the good potential of using big data to predict 7 out of 10 components of social comfort, including the component "Financial situation". In connection with the deterioration of the financial situation of the population in Russia, the prospects for using search queries reflecting the financial mood of economic agents for forecasting economic indicators remain extremely relevant.
The financial sentiments of economic agents, aggregated in one indicator using Google Trends search queries, are becoming an important source of information about their preferences and behavior. To determine the prospects for using the calculated component as a proxy for indicators of financial condition, let us calculate the correlation with the indicator "Market capitalization of national companies whose shares are traded on the stock exchange". According to the published information of the World Bank, the banking system and the stock market are directly related to economic growth, which is the main factor affecting poverty reduction, which, in particular, is considered in the financial component of social comfort.
The high correlation with market cap indicates that search queries can be used to extrapolate financial health indices without using financial statistics resources, making the calculation easier and faster.
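One simple way to use such a proxy for extrapolation is a linear regression of the financial indicator on the GT-based component. The sketch below is only an illustration under that assumption: the text does not specify the forecasting model, and the function names and sample numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    return a, b

def predict(a, b, x_new):
    """Value of the fitted line at a new proxy value."""
    return a + b * x_new

# Made-up example: GT "Financial situation" component (x) against
# a market capitalization index (y) in arbitrary units.
gt_component = [4.0, 4.5, 5.1, 5.0, 5.8]
market_cap = [38.0, 43.5, 49.0, 48.5, 56.0]

a, b = fit_line(gt_component, market_cap)
# Extrapolate the index for a newly observed component value.
estimate = predict(a, b, 6.0)
print(round(a, 3), round(b, 3), round(estimate, 3))
```

Because Google Trends values are available with a much shorter delay than official statistics, a fitted relationship of this kind could give a preliminary estimate before the official figure is published, provided the correlation remains stable.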
The reaction of Internet users' interest in relation to financial market indicators in response to changes in the economic indicator (GDP, MICEX capitalization index, inflation, deposit rates, etc.) social comfort depending on changes in the indicators of the financial situation of the population and the expected trends in economic growth.

CONCLUSIONS
Big data provide more detailed statistical assessments of various phenomena and processes in society, which is an important argument in developing the provisions of the concept of the quality of life of the population as one of the most important categories of social and economic science. The introduction of the latent category of "social comfort" into scientific use deepens the theory of the quality of life of the population in terms of studying a person from the point of view of their inclusion in society, expanding the subjective aspect of measurement, which explains the need to use Google Trends. In the proposed study, an integral assessment of social comfort is carried out using two sources of information: official statistics and Google Trends. The integral assessment of social comfort, in turn, gives a bigger picture of the development of the phenomenon over time and makes it possible to assess the ongoing socio-economic policy.
To minimize the author's subjectivity in assessing social comfort, according to Google Trends, a new approach to semantic search for information about the components of social comfort is used, based on the use of a specialized dictionary, which contains classifications of various processes and phenomena, as well as an analysis of the validity of each search query in terms of disclosure of social comfort and correlation with the realities of Russia.
In the process of modeling, the problem of harmonization and interconnection of different types of resources is solved: a set of econometric methods is used to diagnose data for the presence of a time trend, sharp jumps in the popularity of a query, the presence of a zero search volume, and extreme seasonality, and to bring time series to a comparable form not only among themselves within the same information resource but also across information resources. The method used to standardize search queries allows for reliable estimates and modeling of composite categories based on different types of data.
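The kind of primary diagnostics listed above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the checks, the thresholds (`max_zero_share`, `spike_factor`), and the min-max scaling are assumptions chosen for simplicity, and the seasonality check is omitted.

```python
from statistics import median

def zero_share(series):
    """Fraction of observations with zero search volume."""
    return sum(1 for v in series if v == 0) / len(series)

def spike_indices(series, factor=3.0):
    """Indices where the value exceeds `factor` times the series median.

    If the median is zero, every positive value is flagged.
    """
    med = median(series)
    return [i for i, v in enumerate(series) if v > factor * med]

def trend_slope(series):
    """Least-squares slope of the series against its time index."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mt = (n - 1) / 2
    my = sum(series) / n
    stt = sum((t - mt) ** 2 for t in range(n))
    return sum((t - mt) * (v - my) for t, v in enumerate(series)) / stt

def min_max_scale(series):
    """Rescale a series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]

def diagnose(series, max_zero_share=0.2, spike_factor=3.0):
    """Collect the diagnostics for one search query series."""
    share = zero_share(series)
    return {
        "zero_share": share,
        "spikes": spike_indices(series, spike_factor),
        "trend_slope": trend_slope(series),
        "keep": share <= max_zero_share,  # assumed selection rule
    }

# Made-up monthly popularity values on the 0-100 Google Trends scale.
query = [10, 12, 11, 100, 13, 0, 12, 14]
report = diagnose(query)
print(report)
print(min_max_scale([0, 50, 100]))
```

In this example, the fourth observation is flagged as a sharp jump and one month has zero search volume, so the query would be kept but marked for adjustment before aggregation into a block indicator.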
On the basis of the applied methodology, in the analysis of social comfort, 475 Google Trends search queries and 100 indicators taken from official statistics were used, which were aggregated into block indicators of social comfort. Correlation analysis of block indicators showed a stable positive correlation between the components of social comfort built on the basis of GT and official statistics (7 out of 10 components), which indicates good prospects for using alternative sources of information (for example, GT) to assess social comfort for real-time monitoring without resorting to official statistics. There is a strong negative linear relationship for the three components "Education and Learning", "Safety", "Rest and Leisure", which is mainly explained by the weak reliability of statistical indicators for assessing social comfort and determines the primary need to integrate big data in these areas for sharing various sources of information in order to obtain more reliable estimates.
Thus, we can conclude that the use of big data in assessing latent categories gives good results, comparable to the data of official statistics, which opens up opportunities for their use in monitoring and forecasting the financial situation. However, the integration of the two data sources should be carried out step by step, with verification against other sources where possible, for example, opinion poll data.