Ammar Sidhu
This study investigated spatial and non-spatial regression techniques for predicting house prices in Toronto, Ontario, using detached, semi-detached, and townhouse listings from 2024. Guided by the literature, geographic features and 2021 census tract demographic features were collected and aggregated, alongside the housing characteristics provided by listings scraped from Zillow.com, to serve as predictors of listing prices. The geographic features were secondary data collected from the City of Toronto’s open data portal and the Toronto Police Service. Census tract level demographic data were collected from the CHASS Data Centre hosted by the University of Toronto. The listings were scraped from February to July 31, 2024, yielding 5,533 listings. Results from the spatial and non-spatial regression models showed that house prices in Toronto are spatially autocorrelated, suggesting that high-priced houses cluster together, as do low-priced houses. As a result, the spatial models achieved lower RMSE values and higher R² values than the non-spatial models. Because the relationships between the housing features and house prices were non-linear, the non-linear spatial model, spatial random forest, provided the highest accuracy. This suggests that for cities like Toronto, where house prices are spatially autocorrelated, non-linear spatial algorithms perform best for house price prediction. Because temporal changes in house prices were not accounted for, these findings should be interpreted with caution.
Keywords: Property Listings, Listing Features, House Prices, Correlation Analysis, Linear Regression, Random Forest, Spatial Autocorrelation, Spatial Regression, Spatial Dependence, Spatial Random Forest
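Spatial autocorrelation of the kind described in the abstract is commonly quantified with global Moran's I. The following is a minimal sketch of that statistic, not the study's own procedure: the function name `morans_i`, the four example prices, and the binary adjacency weights matrix `W` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I of `values` under a spatial weights matrix `weights`.

    weights[i, j] > 0 means observations i and j are treated as neighbours.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()      # deviations from the mean price
    n = x.size            # number of observations
    s0 = w.sum()          # sum of all spatial weights
    return (n / s0) * (z @ w @ z) / (z @ z)


# Hypothetical data: four listings in a row, each a neighbour of the adjacent ones.
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

# Similar prices sit next to each other (clustered pattern).
i_clustered = morans_i([900_000, 950_000, 400_000, 420_000], W)

# High and low prices alternate (dispersed pattern).
i_alternating = morans_i([900_000, 400_000, 950_000, 420_000], W)

print(f"clustered:   {i_clustered:.3f}")
print(f"alternating: {i_alternating:.3f}")
```

A positive value indicates that similar prices tend to be neighbours, which is the clustering pattern the abstract reports; a negative value indicates that dissimilar prices tend to be neighbours. In practice the weights would be derived from listing coordinates (for example, nearest neighbours or a distance band), and significance would be assessed with a permutation test.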