Residential Property Price Index in Nigeria: A Data Mining Approach

We employ web scraping and the IMF residential property price index methodology outlined in the latest RPPI practical compilation guide to compute Nigeria's residential real estate property price index (RPPI). The data were scraped from one of the largest real estate websites in Nigeria, which hosts one of the largest collections of real estate ads online. A total of 35,957 residential property sales ads, comprising 30,693 house listings and 5,264 flat/apartment listings from October 2021 to October 2022, were used for the study. The web scraping code was implemented in R. The asking price and other related information obtained from the website were used to compute the overall RPPI and its sub-indices (for houses and flats/apartments). The findings present the national (total) RPP index and sub-indices for residential buildings (houses) and residential flats/apartments. While the various data sources used to generate data for RPPI computation have their advantages and disadvantages, web scraping provides a very timely approach, as data can be scraped almost immediately. This supports timely policy decisions and implementation and also reduces the cost of surveys tremendously, if not totally. The study recommends the use of web scraping in the generation of RPPI data to ensure timely policy decisions and an internationally acceptable standard of RPPI compilation. With the web scraping approach to data collection, a high-frequency RPPI, such as a monthly or weekly index, may be computed for the country.


Introduction
Property prices are essential indicators of the economy. They co-move with consumption, the Gross Domestic Product (GDP), inflation, the current account balance, investment, and the output gap. Residential property is the largest single asset for most households around the world. Variations in residential property prices affect households' long-term investment strategies and influence their spending and borrowing patterns. Changes in property prices influence the banking and financial sectors of the economy through the bank lending and mortgage channels [1]. Residential property price (RPP) indices are used by both monetary and fiscal authorities: as an indicator for macroeconomic policy, monetary policy and inflation targeting; in estimating the value of housing as a component of wealth; as a financial stability or soundness indicator to measure risk exposure; as a deflator in the national accounts; as an input to an individual citizen's decision on whether to buy (or sell) a residential property; and as a consumer price indicator, among others. In addition, the G-20 Data Gaps Initiative and the guidance on Financial Soundness Indicators identified real estate statistics, particularly price changes in residential property, as a major input into financial stability policy analysis and macro-prudential measures [2,3].
Unlike the traditional method of using surveys to compile the residential property price index, an index based on web scraping can be compiled almost immediately at the end of each period (monthly, quarterly or annually). House prices compiled from web-scraped data have been found to correlate significantly with residential house prices based on survey methods. Although some divergences in price levels were observed, there is a bidirectional relationship between the indices based on registration data and those based on web data [1].
Web scraping is the process of collecting large amounts of data from the web. It offers the potential to greatly improve the quality and efficiency of price indices. The use of web-scraped data for compiling high-frequency price indexes has been explored in many studies [6][7][8]. Data collected through digital means, especially web scraping, are often more comprehensive, with larger sample sizes and a reduced response burden; hence web scraping as a tool for capturing large amounts of price data has proven to be a useful technique for official price statistics [6]. Web-scraped data are gaining increasing importance in official price statistics and are commonly used for individual products [2].
The surge in e-commerce activities has provided customers with access to a significant variety of products from the convenience and safety of their homes [10]. A survey showed that, following the pandemic, more than 50% of the respondents shop online more frequently than before [9]. The COVID-19 pandemic highlighted the need for indicators that allow the economic situation to be monitored at a much higher frequency than traditional monthly or quarterly indicators [10]. It is therefore necessary that price statistics cover online purchases and their price movements, and studies on prices need to accommodate the growing share of e-commerce in the overall household consumption budget. Since the movement of online prices is also representative of offline retail price dynamics, using online prices to construct official consumer price indices (CPIs) may be of great benefit: it not only gives timely and detailed information on prices, but also offers a cheap or almost zero-cost way of obtaining data for price statistics [12,13].
This study therefore employs the web scraping approach to source residential real estate price data and computes the RPPI using the IMF methodology [5]. Since this method has been found to give results that are highly correlated with those of the traditional survey, incorporating it into the RPPI process will help reduce survey costs as well as the burden on respondents. With the web scraping approach to data collection, a high-frequency RPPI, such as a monthly or weekly index, may be computed for the country.

Property Prices Characteristics
The most commonly available characteristics that may influence the price of a property are the number of rooms, dwelling type, district, region, property type, floor area, age, vintage (new or existing), completion state of the dwelling, existence of an elevator, etc. Location, such as province (state), city and sub-city area, is seen as the most important characteristic of a property [5].
A property can have many different prices at different points in a transaction: an asking price, a transaction price, a declared price and other prices. While the transaction price is the target of the RPPI, other prices may still be used, as the transaction price can be difficult to obtain where there is no register in a formal or institutional database. For instance, the asking price can be higher than the target price. However, because the purpose of the RPPI is to capture price changes rather than levels, the asking price may still be a suitable proxy even in these circumstances [11].
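The argument that an asking price can proxy for price changes can be illustrated with a short derivation. This is a sketch under an assumption that is not stated in the source: the asking price $A_t$ exceeds the transaction price $P_t$ by a markup $m_t$ (the symbols are introduced here for illustration only).

```latex
A_t = (1 + m_t)\,P_t
\quad\Longrightarrow\quad
I_t^{A} = 100 \cdot \frac{A_t}{A_0}
        = 100 \cdot \frac{1 + m_t}{1 + m_0} \cdot \frac{P_t}{P_0}
        = \frac{1 + m_t}{1 + m_0}\, I_t^{P}
```

If the markup is constant over time ($m_t = m_0$), the factor equals one and the asking-price index coincides with the transaction-price index, even though the price levels differ. If the markup drifts, the asking-price index is biased by the ratio $(1 + m_t)/(1 + m_0)$.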

RPPI Data Sources and Pricing
Prices of properties used for RPPI computation can come from various sources, depending on the type of price used. The asking price can be easily sourced from websites using web scraping or provided by the site owner. Data can also be retrieved from websites using application programming interfaces (APIs). Generally, web sources offer detailed descriptions and wide coverage of properties and their asking prices. Properties sold through agents, developers, or owners are also included in web sources. One major advantage is timeliness, since the price is obtained as soon as the property enters the market.
Agents and developers of properties may also be surveyed to obtain residential real estate prices and related information. The advantage of a survey is that compilers can obtain the precise information they require according to the concept they are measuring. While a survey provides the greatest degree of control over the information used to construct the index, it has several drawbacks, including high costs and response burden. In addition, it is very difficult to obtain representative samples of real estate properties in the market. When sampling respondents, adequate coverage of dwelling type, turnover and geography is required. The sample should be stratified, with the larger developers and agents included in the sample on a regular basis and the smaller agents and developers rotating into the sample every year [2].
Other key data sources for residential real estate statistics are national land registries or property assessment files maintained by government or tax authorities. These data may cover all transactions in all geographic regions. While land registries and property assessment files are often an excellent source of information, transactions may not be recorded in the proper period. In certain countries, the registration of a sale may take place three months or more after the transaction has taken place. In this case the price would be recorded in the wrong period, and the compiler may introduce a time bias into the calculation of the index.

Summary Statistics
The economic reasonability of the RPP data is assessed by creating summary statistics for all variables in the data set. The summary statistics may include the mean, median, minimum and maximum price in the given period. They are useful for tracking outliers as well as for checking for missing prices. The summary statistics are computed for each of the variables and categories included in the RPP computation. A frequency count for each variable is also of interest. When a variable has only scanty observations, it may be excluded when constructing the pricing model, since it is not statistically representative. Frequency counts can also be useful when grouping or stratifying the data. When there is a small number of observations, variables may need to be aggregated into a higher level or category.
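The checks described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study (which was written in R); the listings, prices and the `summarize` helper are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical listings: (property_type, asking_price_in_naira).
# None marks a listing whose price is missing.
listings = [
    ("House", 95_000_000),
    ("House", 120_000_000),
    ("House", 2_500_000_000),  # unusually expensive listing, a candidate outlier
    ("House", None),
    ("Flat/Apartment", 60_000_000),
    ("Flat/Apartment", 70_000_000),
]


def summarize(prices):
    """Return basic summary statistics for one category, ignoring missing prices."""
    observed = [p for p in prices if p is not None]
    return {
        "count": len(observed),
        "missing": len(prices) - len(observed),
        "mean": mean(observed),
        "median": median(observed),
        "min": min(observed),
        "max": max(observed),
    }


# Group prices by property type, then summarize each group.
by_type = {}
for property_type, price in listings:
    by_type.setdefault(property_type, []).append(price)

summary = {ptype: summarize(prices) for ptype, prices in by_type.items()}

# Frequency count per category, useful when deciding how to stratify.
frequency = Counter(ptype for ptype, _ in listings)
```

In this made-up sample, the gap between the mean and the median for houses flags the very expensive listing, and the `missing` field shows how many prices could not be used.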

RPPI Weightings
In residential property statistics, weights are used to aggregate prices from the primary level into more meaningful aggregates and groups. The primary level is usually equated to a stratum. Since some items in the basket are of higher importance than others, weights should be differentiated and correspond to the importance of each item in the basket. There are two types of weights in RPP statistics: flow weights and stock weights. Flows are changes from one period to the next, while stocks are positions at a point in time. A stock weight could be the number of properties in a geographic region on a specific date. A flow weight could be the number of properties that were sold during a certain period. Flow weights are therefore associated with the transactions that took place in a certain period, while stock weights are associated with the accumulated flow of transactions that have taken place up to a point in time. Flow weights are suited to monitoring financial stability, while stock weights are more appropriate if the index is being used to track the price change in the stock of housing.
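The difference between the two kinds of weights can be shown with a small Python sketch. All counts below are hypothetical, and the `shares` helper is introduced here only for illustration.

```python
def shares(counts):
    """Convert counts per stratum into weights that sum to one."""
    total = sum(counts.values())
    return {stratum: count / total for stratum, count in counts.items()}


# Flow: number of properties sold during one period (hypothetical).
sold_in_period = {"House": 300, "Flat/Apartment": 100}

# Stock: number of properties existing on a specific date (hypothetical).
housing_stock = {"House": 6_000, "Flat/Apartment": 4_000}

flow_weights = shares(sold_in_period)
stock_weights = shares(housing_stock)
```

With these numbers, houses carry a weight of 0.75 under flow weighting but 0.6 under stock weighting, so the same sub-indices would aggregate to different totals depending on the purpose of the index.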

Computing RPP Index and Stratification
The median price is likely the best estimate of the price of properties, as the house price distribution is often positively skewed. The use of median prices assumes that prices for all dwellings follow the price trend of the typical dwelling. The median is more widely used in property price statistics than the mean, as it is less subject to price fluctuations caused by a small number of high-priced properties. The index can be stratified by property location (urban or rural), by the vintage of the dwelling (new or existing), by size, etc. [4,5].
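A short Python sketch shows why the median is preferred for a positively skewed price distribution. The prices are hypothetical and expressed in millions of naira.

```python
from statistics import mean, median

# Typical listings, in millions of naira (hypothetical values).
typical = [80, 90, 100, 110, 120]

# The same market after two very expensive properties are listed.
with_luxury = typical + [2_000, 3_000]

# Ratio of the new statistic to the old one: 1.0 would mean "no change".
mean_shift = mean(with_luxury) / mean(typical)
median_shift = median(with_luxury) / median(typical)
```

Here the mean rises to more than seven times its original value, while the median moves only from 100 to 110, which is why a mean-based index could report a price surge that the typical dwelling did not experience.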

Data and Method
In this study, we employ the IMF residential property price index methodology outlined in the latest RPPI practical compilation guide [5]. The methodology targets a national monthly or quarterly index covering monetary transactions (purchases and mortgages) of all types of residential properties. The data needed for a high-quality residential property price index rely heavily on detailed information about each property, given the wide range of characteristics that can influence the price of a dwelling [5]. The data were scraped from one of the real estate websites in Nigeria hosting one of the largest collections of real estate ads online. The website, Nigeria Property Centre, available at https://nigeriapropertycentre.com/, had over 96,000 property ad listings from over 18,000 agents and developers across the 36 states and the Federal Capital Territory at the time of this study. In total, 35,957 residential property sales ads were used, comprising 30,693 house sales listings and 5,264 flat/apartment sales listings from October 2021 to October 2022. The web scraping code was implemented in R. The asking price was obtained from the website and used to compute the overall RPPI and its sub-indices (for houses and flats/apartments).
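The extraction step can be illustrated with a minimal Python sketch that parses listing records out of HTML. It is not the study's R code, and the markup, class names and attribute names in `SAMPLE_HTML` are entirely hypothetical; they do not describe the actual structure of any website. The sketch works on an in-memory string and makes no network requests.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup for two listings (invented for this example).
SAMPLE_HTML = """
<div class="listing" data-type="House">
  <span class="price">₦95,000,000</span>
  <span class="beds">4</span>
</div>
<div class="listing" data-type="Flat/Apartment">
  <span class="price">₦65,000,000</span>
  <span class="beds">2</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collect one dict per listing with its type, price and bedroom count."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # name of the field whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "listing":
            self.listings.append({"type": attrs.get("data-type")})
        elif tag == "span" and attrs.get("class") in ("price", "beds"):
            self._field = attrs["class"]

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.listings:
            # Keep digits only, so "₦95,000,000" becomes 95000000.
            digits = re.sub(r"[^\d]", "", data)
            if digits:
                self.listings[-1][self._field] = int(digits)


parser = ListingParser()
parser.feed(SAMPLE_HTML)
listings = parser.listings
```

Each record in `listings` then holds a property type, a numeric asking price and a bedroom count, which is the kind of table the index computation needs as input.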

Calculation of a Quarterly RPPI Using a Median Price with Stratification
The index compilation for the first year prior to publication is outlined below. We define the strata for this project by property type (flat/apartment and house). The median price within each stratum is computed for the two categories and for each period. The first quarter of 2021 is the reference period and provides the base price for the year. The sub-indices for the categories are compiled by dividing the median price in the current period by the median price in the reference period for each category. The sub-indices are then aggregated by a weighted average using the base year weights. According to the RPPI guide, the base year weights may be derived as the total value of the properties classified in the stratum divided by the total value of all properties in all strata. The reference period of the sub-indices initially corresponds to the first period (2021Q1). However, the reference period should be a full year and not a quarter. To re-reference the sub-indices to the full year (2021=100), the index value for each quarter is divided by the annual average index value (calculated as the mean of the quarterly indices, for each category and for the total respectively).
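The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's own implementation: the prices (in millions of naira) are invented, the variable names are hypothetical, and the value weights are taken as the sum of listed asking prices in the base year.

```python
from statistics import mean, median

# Hypothetical asking prices (millions of naira) by stratum and quarter.
prices = {
    "House": {
        "2021Q1": [90, 100, 110], "2021Q2": [100, 110, 120],
        "2021Q3": [100, 110, 120], "2021Q4": [110, 120, 130],
        "2022Q1": [110, 121, 140],
    },
    "Flat/Apartment": {
        "2021Q1": [40, 50, 60], "2021Q2": [40, 50, 60],
        "2021Q3": [45, 55, 65], "2021Q4": [45, 55, 65],
        "2022Q1": [50, 60, 70],
    },
}
BASE_YEAR = "2021"
REFERENCE_QUARTER = "2021Q1"
quarters = sorted(prices["House"])

# Step 1: median price per stratum and quarter.
medians = {s: {q: median(p) for q, p in by_q.items()} for s, by_q in prices.items()}

# Step 2: sub-indices with the reference quarter set to 100.
sub_index = {
    s: {q: 100 * m / by_q[REFERENCE_QUARTER] for q, m in by_q.items()}
    for s, by_q in medians.items()
}

# Step 3: base-year weights as each stratum's share of total listed value.
base_value = {
    s: sum(sum(p) for q, p in by_q.items() if q.startswith(BASE_YEAR))
    for s, by_q in prices.items()
}
total_value = sum(base_value.values())
weights = {s: v / total_value for s, v in base_value.items()}

# Step 4: total index as the weighted average of the sub-indices.
sub_index["Total"] = {
    q: sum(weights[s] * sub_index[s][q] for s in weights) for q in quarters
}


def rereference(series):
    """Step 5: rescale a series so that its base-year average equals 100."""
    base_avg = mean(v for q, v in series.items() if q.startswith(BASE_YEAR))
    return {q: 100 * v / base_avg for q, v in series.items()}


rppi = {name: rereference(series) for name, series in sub_index.items()}
```

After re-referencing, the four 2021 quarters of each series average 100, so a value such as `rppi["House"]["2022Q1"]` reads directly as the price level relative to the 2021 average.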

Result and Discussion of Findings
The use of alternative means of data collection, other than the traditional survey, has recently been receiving attention from both individual researchers and government statistical agencies. For instance, the US Bureau of Labor Statistics has carried out several web scraping projects to supplement its traditional field survey of price data [14]. The US Census Bureau is programming an automatic scraper to collect tax revenue data from the websites of sub-national governments, as opposed to the traditional field survey [15]. Similarly, Statistics Canada has looked into ways of incorporating web scraping to reduce the survey burden on respondents [18].
Bricongne [16] scraped daily data from the UK housing market to build timelier, more granular and higher-frequency indicators from the sellers' perspective [19]. Souza [17] verified the spatial autocorrelation between the mean housing prices obtained by web scraping online platforms.
This study employs the web scraping approach to source residential real estate price data and computes the quarterly RPPI for Nigeria using the IMF methodology [5]. A national index for total property prices and sub-indices for the two classifications, flat/apartment and house, were computed. Figure 1 shows that the number of properties listed for sale grew from below 2,000 in 2021Q1 to above 14,500 in 2022Q3 (Table 3 of the appendix). The estimated value of the properties listed fluctuated but grew from below 1 trillion naira in 2021Q1 to above 9 trillion naira in 2022Q3. The first-quartile statistics show that 25% of the residential properties listed were priced below 49 million naira (see Table 4 of the appendix). The ratio of rooms to toilets was stable at around 0.8 bathrooms to a room. Also, the room-to-bathroom ratio was stable at around one bathroom to a room (Table 2 of the appendix). The residential property price index was computed on a quarterly basis from 2021Q1 to 2022Q3. The year 2021 was used as the base year to compute the chained index from 2022Q1 to 2022Q3. While the overall median price hovered between 150 million and 200 million naira, the median house price was between 90 million and 120 million naira (Figure 4). The RPPI for 2022Q3 shows that property prices have depreciated by 5.56% since 2021. The price of apartments was stable at around 65 million naira from 2021Q1 to 2022Q3 (Table 3 of the appendix).

Conclusion
This study constructed the residential real estate property price index (RPPI) using the most recent IMF methodology [5]. Though similar studies have used web scraping to generate RPPI data (see [6] and [8]), to the best of our knowledge this is the first attempt to replicate such studies in Nigeria. The study provides summary statistics that may be relevant to the residential real estate property market in Nigeria. The findings present the national (total) RPP index and sub-indices for residential buildings (houses) and residential flats/apartments. The median was used in computing the RPP indices as it is a more stable descriptive statistic than the mean and is not usually affected by extreme values.
While the various data sources used to generate data for RPPI computation have their advantages and disadvantages (see Table 1), web scraping provides a very timely approach. Unlike a survey, which may take weeks or even months to complete, web scraping can be done almost immediately, even a day after the end of the period. This supports timely policy decisions and implementation. It will also reduce the cost of surveys tremendously, if not totally. The study recommends the use of web scraping in the generation of RPPI data, as well as the recent IMF methodology, to ensure timely policy decisions and an internationally acceptable standard of RPPI compilation. The unchained index reflects year-on-year changes, while the chained index is the year-on-year change corrected for the base year period. The RPPI is computed with 2021 as the base year. The median price was used in all computations.

Figure 1. Number of properties listed per quarter.

Table 1. Comparison of the Features of the Different Data Sources.

Table 2. Characteristics of Property.

Table 3. Summary Statistics of Property Prices in Nigeria.

Table 4. Summary Statistics of Property Prices in Nigeria (cont'd).

Table 5. Real Estate Price Indices.