The data was accessed from Inside Airbnb on 3 November 2018.

Inside Airbnb describes itself as “an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world”. You can learn more on the Inside Airbnb website.

The code below was used to create the final dataset for analysis:

library(tidyverse)

# First, decompress and load the detailed listings. Then keep only the id and review_scores_location variables

uncompress <- gzfile(description = "data/listings.csv.gz", open = "rt")
df1 <- read.csv(uncompress, header = TRUE) %>%
    select(id, review_scores_location)
# The connection was opened explicitly, so close it explicitly as well
close(uncompress)

# Read the summary listings file for NYC and parse last_review as a date
df2 <- read.csv("data/listings.csv", stringsAsFactors = FALSE) %>%
  mutate(last_review = as.Date(last_review, format = "%Y-%m-%d"))

# Combine the two datasets
df <- inner_join(df1, df2, by = "id")

# Save the data (row.names = FALSE avoids writing an extra index column)
write.csv(df, file = "data/nyc_airbnb.csv", row.names = FALSE)
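Because the two files are combined with an inner join, any listing whose id appears in only one of them is silently dropped. The sketch below illustrates this on two small made-up tables; the names `scores` and `summary_listings` and all of their values are hypothetical stand-ins for `df1` and `df2`, not the real data. It uses `anti_join()` as one possible way to see which rows a join would lose.

```r
library(dplyr)

# Hypothetical toy tables standing in for df1 and df2 (values are made up)
scores <- data.frame(id = c(1, 2, 3), review_scores_location = c(10, 9, NA))
summary_listings <- data.frame(id = c(2, 3, 4), price = c(120, 85, 60))

# inner_join() keeps only the ids that are present in both tables
joined <- inner_join(scores, summary_listings, by = "id")

# anti_join() returns the rows of the first table with no match in the second
dropped_from_scores <- anti_join(scores, summary_listings, by = "id")
dropped_from_summary <- anti_join(summary_listings, scores, by = "id")

nrow(joined)                # 2 (ids 2 and 3)
nrow(dropped_from_scores)   # 1 (id 1)
nrow(dropped_from_summary)  # 1 (id 4)
```

Running the same two `anti_join()` calls on `df1` and `df2` would show whether any listings were excluded from the combined dataset.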

The final dataset has 50,041 observations and 17 variables: