Finding the Best Techniques to Predict Real Estate Prices
A property price study with real-world data
Introduction
The goal of our research is to find the most effective way to predict real estate prices from actual sales data. The dataset describes house sales in a small American town over four years and contains almost 3,000 records. This information lets us find out what really affects a property's price.
This guide consists of three parts. The first covers exploring the data, the second covers data preparation, and the third covers modeling. Below you’ll find a step-by-step guide to completing a real-data machine learning project and choosing the most effective model.
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is one of the most important phases of any ML project. During this phase, the data scientist investigates the dataset to find patterns and outliers and to recognize relationships and possible correlations among the variables. This process shows what really affects real estate prices, and it quite often helps to reduce the number of fields to analyze by dropping correlated variables.
The main goal of Exploratory Data Analysis is to investigate the structure of the raw data in order to understand it better, detect hints of trends and anomalies, and form the hypotheses and ideas for our research.
What you have to do:
Import libraries and load the dataset
- Here are some recommended sources of open datasets:
https://www.kaggle.com/code/skirmer/fun-with-real-estate-data/report
https://serokell.io/blog/best-machine-learning-datasets
- Check a summary of the training and test data
- Compare the summaries of the training and test data
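The loading and summary steps above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`price`, `area_sqft`, `year_built`, `district`) and all values are made up, and the inline CSV text stands in for real files that you would normally read with `pd.read_csv` from disk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real train/test files (values are invented).
TRAIN_CSV = """price,area_sqft,year_built,district
150000,1000,1990,North
180000,1200,2001,North
210000,1400,1975,South
320000,2000,2010,South
"""
TEST_CSV = """area_sqft,year_built,district
900,1960,North
1500,1985,South
1800,2005,South
"""

train = pd.read_csv(io.StringIO(TRAIN_CSV))
test = pd.read_csv(io.StringIO(TEST_CSV))


def compare_summaries(train_df, test_df):
    """Put descriptive statistics of the shared numeric columns side by side."""
    shared = [c for c in train_df.columns if c in test_df.columns]
    numeric = train_df[shared].select_dtypes("number").columns
    return pd.concat(
        {
            "train": train_df[numeric].describe().T,
            "test": test_df[numeric].describe().T,
        },
        axis=1,
    )


summary = compare_summaries(train, test)
print(train.shape, test.shape)
print(summary[[("train", "mean"), ("test", "mean")]])
```

Large gaps between the train and test columns of this table (for example in the mean or the maximum) are a hint that the two sets were not drawn from the same distribution.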
Distribution
- Examine the target variable
- Check the distribution of all numeric independent features
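One way to look at a distribution without plotting is to compare the mean with the median and to measure skewness. The sketch below does this for a hypothetical `price` series with invented values. It also checks whether a log transform reduces the skew, which is a common option for price data but an assumption here, not a step the dataset is known to require.

```python
import numpy as np
import pandas as pd

# Invented prices with one expensive outlier, to mimic a right-skewed target.
prices = pd.Series(
    [95_000, 110_000, 120_000, 135_000, 150_000, 180_000, 450_000],
    name="price",
)


def distribution_report(values):
    """Summarise location and asymmetry of one numeric feature."""
    return {
        "mean": values.mean(),
        "median": values.median(),
        "skew": values.skew(),                      # > 0 means a long right tail
        "skew_after_log": np.log1p(values).skew(),  # skew of log(1 + x)
    }


report = distribution_report(prices)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

A mean well above the median together with a positive skew points to a long right tail. The same function can be applied to every numeric column in a loop.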
Relationships
- Check the correlation between the numeric features
- Check a heatmap of the numeric features
- Run ANOVA for all categorical features
- Analyze box plots for all categorical features
- Analyze pair plots between the dependent and independent variables
- Analyze scatter plots
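The numbers behind two of the checks above, the correlation matrix and one-way ANOVA, can be sketched like this. The data frame and its column names are invented for illustration. The F statistic is computed by hand so that the calculation is visible; `scipy.stats.f_oneway` performs the same test and also returns a p-value.

```python
import pandas as pd

# Invented example: price in thousands, three districts with three sales each.
df = pd.DataFrame({
    "price": [100, 120, 140, 200, 220, 240, 300, 320, 340],
    "area_sqft": [900, 1000, 1100, 1500, 1600, 1650, 2100, 2200, 2300],
    "age_years": [60, 55, 50, 30, 28, 25, 10, 8, 5],
    "district": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# Pearson correlation between all numeric features (the input to a heatmap).
corr = df.select_dtypes("number").corr()


def one_way_anova_f(data, group_col, target_col):
    """F statistic: between-group variance divided by within-group variance."""
    groups = data.groupby(group_col)[target_col]
    grand_mean = data[target_col].mean()
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    ss_within = ((data[target_col] - groups.transform("mean")) ** 2).sum()
    df_between = groups.ngroups - 1
    df_within = len(data) - groups.ngroups
    return (ss_between / df_between) / (ss_within / df_within)


f_stat = one_way_anova_f(df, "district", "price")
print(corr.round(2))
print("F statistic for district:", f_stat)
```

A large F statistic suggests that the categorical feature separates the target into groups with clearly different means. The `corr` table is what a heatmap visualizes, so features that are strongly correlated with each other are candidates for removal.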
Check for differences in missing-value rates in the data, for example between the training and test sets.
If you find differences, move on to the data cleaning stage; read the guide to data cleaning here.
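A missing-rate comparison could look like the sketch below. It assumes the comparison is between a training and a test set, and the two small data frames and their column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Invented data: None/NaN mark missing values.
train = pd.DataFrame({
    "area_sqft": [1000, np.nan, 1400, 2000],
    "garage_type": ["attached", None, None, "detached"],
})
test = pd.DataFrame({
    "area_sqft": [900, 1500, np.nan, 1100],
    "garage_type": ["attached", None, "detached", "attached"],
})


def missing_rate_gap(train_df, test_df):
    """Share of missing values per column in each set, plus the absolute gap."""
    rates = pd.DataFrame({
        "train": train_df.isna().mean(),
        "test": test_df.isna().mean(),
    })
    rates["gap"] = (rates["train"] - rates["test"]).abs()
    return rates.sort_values("gap", ascending=False)


rates = missing_rate_gap(train, test)
print(rates)
```

Columns at the top of this table are the ones to look at first during data cleaning.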
ToDo:
Once you have finished the data preparation, the next stage is to train models using different approaches and compare the results. The winner, the model with the best accuracy, will be the solution to our task and can be used for future real estate price forecasting.
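The comparison loop itself can be sketched with NumPy alone. Everything in this example is an assumption made for illustration: the data is synthetic, the two candidates (a mean baseline and an ordinary least-squares fit) are only placeholders for whatever models you try, and accuracy is measured as RMSE on held-out rows. In a real project a library such as scikit-learn would usually take the place of the hand-written fitting functions.

```python
import numpy as np

# Synthetic data: price depends on area and age plus noise (all invented).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(800, 3000, n)
age = rng.uniform(0, 80, n)
price = 50_000 + 120 * area - 600 * age + rng.normal(0, 15_000, n)

X = np.column_stack([area, age])
split = int(0.8 * n)  # first 80% for training, the rest held out
X_train, X_test = X[:split], X[split:]
y_train, y_test = price[:split], price[split:]


def rmse(y_true, y_pred):
    """Root mean squared error: lower is better."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_mean_baseline(X, y):
    """Always predict the average training price."""
    mean = y.mean()
    return lambda X_new: np.full(len(X_new), mean)


def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


models = {"mean baseline": fit_mean_baseline, "linear regression": fit_linear}
scores = {
    name: rmse(y_test, fit(X_train, y_train)(X_test))
    for name, fit in models.items()
}
best = min(scores, key=scores.get)
print(scores)
print("best model:", best)
```

Keeping a trivial baseline in the comparison shows whether a more complex model actually learns anything from the features.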