This site hosts the final project for CS109, Fall 2014.
Contributors To This Website:
Airbnb is a web-based marketplace for people to list, discover, and book unique accommodations around the world. It has over 800,000 listings in more than 34,000 cities and 190 countries. Every listed property has an online profile describing the property (amenities, space, and reviews by previous guests) as well as the host. Airbnb gives hosts a way to monetize their extra space and gives travelers an alternative to hotels.
As Airbnb hosts, we wanted to optimize our listing by investigating the following questions:
- How do other hosts near me price their listings, relative to dimensions such as amenities, reviews, and location?
- Can I learn something from properties that are "successful" on Airbnb, where success means attracting many guests (potentially measured by number of reviews) and being able to price above the market?
- How can we optimize the price of our listing by studying data on similar properties?
- What is the distribution of price in a given area?
- Are there certain neighborhoods that are substantially more expensive than others?
- What is the relationship between price and how frequently a listing is rented?
We wanted to study this data, visualize it, and see whether we could glean insights beyond what Airbnb itself makes available.
Methodology / Table Of Contents
1) Scraping / Data Collection: visit the GitHub repository for the code used to scrape Airbnb.
Description: The scraping code (ScrapeAirbnb.py) and the instructions for running it (readme file) are located in this project's GitHub repository. We built a scraper to collect data on over 2,000 listings in the Boston Metro area. The scraped data includes information on each listing's space (property type, number of bedrooms, number of bathrooms, etc.), amenities (kitchen, TV, internet, etc.), prices (cleaning fee, etc.), reviews, location (longitude, latitude, and location review), host, and description. The code is generalized and can be used to scrape listings for any location.
2) Data Cleaning: The GitHub repository also contains the functions we used to "clean" the data after scraping, in a file named DataCleanAirbnb.py.
3) Data Analysis: We first explored the data with plots in both matplotlib and Tableau. We then attempted to cluster properties using PCA and K-means clustering, and tried to identify the variables most strongly related to price using the variable importance feature of Random Forests. You can view the notebook here; it is also available in the GitHub repository as AirbnbWrapUp.ipynb.
4) Visualization: We created visualizations in Tableau to further explore the data and glean interesting insights from it. One of the dashboards is dedicated to helping one of our team members, Hamel Husain, decide how best to price his listing. The dashboard is embedded at the top of this page, but may be better viewed here. Note that the dashboard has four tabs at the top, each displaying a distinct set of information.
5) Video: We made a short video describing our motivation and problem statement for this project. That can be viewed here.
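Steps 2 and 3 can be sketched in a few lines of standard-library Python. This is not the code in DataCleanAirbnb.py or AirbnbWrapUp.ipynb: the field names, the sample values, all function names (parse_money, parse_count, clean, standardize, kmeans), and the choice to read "Studio" as zero bedrooms are hypothetical. The sketch converts scraped strings such as "$1,050" into numbers, standardizes the features, and runs plain K-means (Lloyd's algorithm). The PCA step is omitted.

```python
import math
import random
import re

# Hypothetical raw records shaped like scraper output (everything is a string).
# Field names and values are illustrative assumptions, not the project's schema.
RAW_LISTINGS = [
    {"price": "$85", "bedrooms": "1", "accommodates": "2"},
    {"price": "$90", "bedrooms": "1", "accommodates": "2"},
    {"price": "$95", "bedrooms": "1", "accommodates": "3"},
    {"price": "$250", "bedrooms": "3", "accommodates": "6"},
    {"price": "$270", "bedrooms": "3", "accommodates": "7"},
    {"price": "$300", "bedrooms": "4", "accommodates": "8"},
]


def parse_money(text):
    """'$1,050' -> 1050.0; blank or missing -> None."""
    digits = re.sub(r"[^\d.]", "", text or "")
    return float(digits) if digits else None


def parse_count(text):
    """'3' -> 3; 'Studio' -> 0 (an assumption); anything else non-numeric -> None."""
    text = (text or "").strip()
    if text.lower() == "studio":
        return 0
    return int(text) if text.isdigit() else None


def clean(raw):
    """Return [bedrooms, accommodates, price] as numbers for one listing."""
    return [parse_count(raw["bedrooms"]), parse_count(raw["accommodates"]),
            parse_money(raw["price"])]


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until the labels stop changing."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


features = [clean(r) for r in RAW_LISTINGS]
labels, _ = kmeans(standardize(features), k=2)
```

Standardizing before clustering matters here: price is measured in dollars while bedrooms are single digits, so without rescaling the Euclidean distances (and therefore the clusters) would be driven almost entirely by price.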
- One goal of this project was to find the best price for the Airbnb listing of Hamel Husain, a member of our team. The visualizations we produced let us see subtleties that are not easily seen in the raw data or captured by machine learning techniques. We made a visualization specific to Hamel's neighborhood and amenities, located on the second tab of the dashboard above. It was extremely useful to look at listings in Hamel's neighborhood that were priced well above average while also collecting many reviews, and then to visit the pages for those specific listings. We discovered that these listings had very high-quality, professional photos and were decorated in interesting and unique ways. It surprised us that these listings performed so well simply by employing superior marketing. Although these attributes are not represented directly in the data, we were able to find them by exploring interactive visualizations. Hamel feels confident pricing his Cambridge apartment at $140 per night, $50 above the median price of $90, as long as he decorates and markets his unit in a similar way to the outliers we observed. We highly recommend exploring all four tabs of the Tableau dashboard; viewing this data is both interesting and fun!
- Other features associated with listing price are space, location, and luxury amenities. For space, the type of property (entire home/apartment, private room, shared room), the number of guests it can accommodate, and the number of beds/bedrooms available are the features most strongly related to price. After normalizing for space by dividing price by the number of bedrooms, the importance of location becomes evident: areas such as MIT/Kendall Square and Back Bay are among the most expensive in the Boston Metro area. We also saw that properties with luxury amenities such as a gym or an elevator are, on average, more expensive than others.
- Price does not appear to be strongly related to the number of reviews. However, price does appear to be related to review scores: listings with more positive reviews tend to have higher prices.
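The comparisons behind these findings reduce to a few simple computations, sketched below with the Python standard library. Everything here is an assumption for illustration: the field names, the hypothetical helper functions, the 20-review threshold for a "successful" listing, dividing studios by one bedroom, and using Pearson correlation as the measure of "related". The listings are invented so that the toy data mimic the qualitative pattern described above. They are not results from the scraped Boston data.

```python
import math
from collections import defaultdict
from statistics import mean, median

# Invented comparable listings: area names echo the text, every number is made up.
LISTINGS = [
    {"area": "Kendall Square", "price": 240, "bedrooms": 2, "reviews": 12, "score": 95},
    {"area": "Kendall Square", "price": 150, "bedrooms": 1, "reviews": 41, "score": 97},
    {"area": "Back Bay", "price": 210, "bedrooms": 2, "reviews": 27, "score": 94},
    {"area": "Other", "price": 90, "bedrooms": 1, "reviews": 30, "score": 91},
    {"area": "Other", "price": 70, "bedrooms": 1, "reviews": 5, "score": 86},
    {"area": "Other", "price": 80, "bedrooms": 0, "reviews": 18, "score": 88},
    {"area": "Other", "price": 85, "bedrooms": 1, "reviews": 22, "score": 89},
]


def premium_over_median(target_price, prices):
    """Return (median price, dollars above the median) for a proposed price."""
    med = median(prices)
    return med, target_price - med


def high_performers(listings, min_reviews):
    """Listings priced above the median that still collect many reviews."""
    med = median(l["price"] for l in listings)
    return [l for l in listings if l["price"] > med and l["reviews"] >= min_reviews]


def price_per_bedroom_by_area(listings):
    """Mean of price / bedrooms per area, highest first (studios divide by 1)."""
    groups = defaultdict(list)
    for l in listings:
        groups[l["area"]].append(l["price"] / max(l["bedrooms"], 1))
    averages = {area: mean(vals) for area, vals in groups.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


prices = [l["price"] for l in LISTINGS]
r_score = pearson(prices, [l["score"] for l in LISTINGS])
r_reviews = pearson(prices, [l["reviews"] for l in LISTINGS])
```

With these invented values, premium_over_median(140, prices) returns (90, 50), r_score comes out at about 0.8 and r_reviews at about 0.12. Correlation only describes association, so a high r_score does not show that good reviews cause higher prices.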