The data was accessed from Inside Airbnb on 3rd November, 2018.
According to Inside Airbnb, they are “an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world”. You can learn more on the Inside Airbnb website.
The code below was used to create the final dataset for analysis:
library(tidyverse)
# First, uncompress and load the data. Then select id and review_scores_location variables
uncompress <- gzfile(description = "data/listings.csv.gz", open = "rt")
df1 <- read.csv(uncompress, header = TRUE) %>%
  select(id, review_scores_location)
# Read the summary for NYC
df2 <- read.csv("data/listings.csv", stringsAsFactors = FALSE) %>%
  mutate(last_review = as.Date(last_review, format = "%Y-%m-%d"))
# Combine the two datasets
df <- inner_join(df1, df2, by = "id")
# Save the data
write.csv(df, file = "data/nyc_airbnb.csv")
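Because `inner_join()` keeps only rows whose `id` appears in both inputs, a listing missing from either file is dropped without warning. The sketch below shows one way to check what a join discards. It runs on made-up data: the `detailed` and `summary_tbl` data frames and all their values are hypothetical stand-ins, not taken from the Inside Airbnb files.

```r
library(dplyr)

# Hypothetical stand-ins for the detailed and summary listings files
detailed <- data.frame(id = c(1, 2, 3), review_scores_location = c(10, 9, NA))
summary_tbl <- data.frame(id = c(2, 3, 4), price = c(120, 85, 60))

# inner_join() keeps only ids present in both data frames
joined <- inner_join(detailed, summary_tbl, by = "id")

# anti_join() returns the rows of the first table with no match in the second
only_detailed <- anti_join(detailed, summary_tbl, by = "id")
only_summary <- anti_join(summary_tbl, detailed, by = "id")

nrow(joined)       # number of listings that survive the join
only_detailed$id   # ids dropped because they lack a summary row
only_summary$id    # ids dropped because they lack a detailed row
```

Running the same two `anti_join()` calls on `df1` and `df2` would show whether any listings were lost when the real files were combined.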
The final dataset has 50,041 observations and 17 variables:
id: listing id
review_scores_location: 0-5 stars converted into a 0-10 scale
name: listing name
host_id: host id
host_name: host name
neighbourhood_group: NYC borough
neighbourhood: NYC neighborhood
latitude: listing latitude
longitude: listing longitude
room_type: type of listing (Entire home/apt, Private room, Shared room)
price: listing price
minimum_nights: required minimum nights stay
number_of_reviews: total number of reviews
last_review: date of last review
reviews_per_month: average number of reviews per month
calculated_host_listings_count: total number of listings for this host
availability_365: number of days listing is available out of 365
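One thing to watch when reading a saved CSV back in: `write.csv()` writes row names by default, so the file on disk can carry one more column than the data frame that was saved. The sketch below demonstrates this with a hypothetical two-row data frame (`toy`) and temporary files, not the real dataset.

```r
# Hypothetical two-row data frame standing in for the combined dataset
toy <- data.frame(id = c(1, 2), price = c(120, 85))

path_default <- tempfile(fileext = ".csv")
path_clean <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(toy, file = path_default)
# With row.names = FALSE only the data frame's own columns are written
write.csv(toy, file = path_clean, row.names = FALSE)

# read.csv() names the unnamed row-name column "X"
cols_default <- names(read.csv(path_default))
cols_clean <- names(read.csv(path_clean))

cols_default
cols_clean
```

If the extra column is unwanted, passing `row.names = FALSE` when saving, or dropping the `X` column after loading, keeps the variable count in line with the list above.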