<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:media="http://search.yahoo.com/mrss/"
	
	>

<channel>
	<title>Yinuo Yin</title>
	<link>https://yinuoyin.com</link>
	<description>Yinuo Yin</description>
	<pubDate>Wed, 12 Sep 2018 20:44:01 +0000</pubDate>
	<generator>https://yinuoyin.com</generator>
	<language>en</language>
	
		
	<item>
		<title>Rebalancing the Bikes</title>
				
		<link>https://yinuoyin.com/Rebalancing-the-Bikes</link>

		<pubDate>Fri, 16 Feb 2018 03:58:15 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Rebalancing-the-Bikes</guid>

		<description>


Predictive Modeling
Rebalancing the Bikes

The BIG problem: In every bike share system, some stations do not have enough bikes for riders to check out, while other stations lack empty docks for riders to park their bikes.




&#60;img width="2466" height="1402" width_o="2466" height_o="1402" data-src="https://freight.cargo.site/t/original/i/1d754cb076ccc111503403beb6e0af6cd48528e6ad8630cc8bb1769b97ced6e7/Top-Image.png" data-mid="12066586" border="0"  src="https://freight.cargo.site/w/1000/i/1d754cb076ccc111503403beb6e0af6cd48528e6ad8630cc8bb1769b97ced6e7/Top-Image.png" /&#62;


	The DivvyX App
User interface mockup
	The Solution: Predictive Model and App Design
Chicago's Divvy bike share system faces the same problem, and we propose a data-driven approach to solve it: building a predictive model that forecasts the future number of bikes at each Divvy station across Chicago.&#38;nbsp;




The model can be applied to our proposed App, Divvy X, which allows bikeshare operators to check how many bikes are likely to be at each Divvy station in the future. 

Our goal is to help bikeshare rebalancers reallocate bikes more proactively and plan their daily work more efficiently. To see how the app works and how the model supports it, please watch our YouTube video!

Because of limited data availability, in this project we are only able to predict the hourly bike departures from 150 Divvy bike stations in downtown Chicago. However, the model building process can serve as a starting point for future exploration of this topic. The R markdown document explains in detail how we conduct this project.&#38;nbsp;

The Model: Poisson Regression
For this project, we initially considered an OLS regression model. However, the count of bike trip departures at each station is not normally distributed and contains many zero values, so using it as the dependent variable in an OLS regression would violate the assumption of normally distributed residuals.
Moreover, departures are non-negative counts whose rate depends on many factors, such as the weather, the day of the week, and the time of day (e.g., there are more bike trip departures during rush hours). Hence, we model bike trip departures as following a Poisson distribution.&#38;nbsp;
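
To illustrate the reasoning above, here is a minimal Python sketch (not the project's R code) of two properties that make a Poisson model a natural fit for hourly counts: the predicted mean is the exponential of a linear predictor, so it can never be negative, and a low mean rate implies a large share of zero counts. The function names, the coefficient values, and the single predictor are assumptions made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_rate(intercept, coefs, predictors):
    """Poisson regression mean: exp of the linear predictor, so never negative."""
    linear = intercept + sum(b * x for b, x in zip(coefs, predictors))
    return math.exp(linear)

# Hypothetical station-hour with a low departure rate (all values are made up).
rate = expected_rate(-0.7, [0.03], [10.0])
# Chance of an hour with no departures at that rate.
share_of_zeros = poisson_pmf(0, rate)
```

With a mean rate below one departure per hour, more than half of the hours are expected to record zero departures, which is the kind of pattern a normal-residual OLS model handles poorly.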







	&#60;img width="1344" height="960" width_o="1344" height_o="960" data-src="https://freight.cargo.site/t/original/i/e4041fda5fef5b8b8de8b647ca5f9a09cda6f1292f2e666d0fedd9774cb4c91c/sc1.png" data-mid="12067100" border="0"  src="https://freight.cargo.site/w/1000/i/e4041fda5fef5b8b8de8b647ca5f9a09cda6f1292f2e666d0fedd9774cb4c91c/sc1.png" /&#62;
	&#60;img width="1344" height="960" width_o="1344" height_o="960" data-src="https://freight.cargo.site/t/original/i/8335d26160290766c770e24acabbe98b673b8769494e646058a9b2a3fa258d25/sc2.png" data-mid="12067101" border="0"  src="https://freight.cargo.site/w/1000/i/8335d26160290766c770e24acabbe98b673b8769494e646058a9b2a3fa258d25/sc2.png" /&#62;



	

Impact of Predictors
Standardized Coefficient Graphs






	


	The Data: Spatial, Time, Weather and More






Dependent variable: Bike trip counts per hour for 150 Divvy stations closest to the center of downtown Chicago 


Training Dataset:&#38;nbsp; Hourly bike trip counts from June 11 to June 17, 2017

Test Dataset:&#38;nbsp; Hourly bike departures on June 21 (weekday) and June 24 (weekend day), 2017


Independent Variables:&#38;nbsp;
- Distance to Bus Stop (d_bus_stop)
- Distance to Public School (d_school)
- Distance to Grocery Store (d_grocery)
- Distance to Park (d_park)
- Distance to Railway Station (d_rail_station)
- Distance to Nearby Bike Station (d_bike_station)
- Bike Lane Density (BikeLaneD)
- The Day of the Week (Weekday1…Weekday7)
- Bike Trip Departures in Last Hour (lag_CNT)
- Bike Trip Departures in Last Week (LW_CNT)
- Bike Trip Departures in Last 2 Weeks (L2W_CNT)
- Total Population (TOTPOP_CY)
- Total Housing Units (TOTHU_CY)
- Employed Population (EMP_CY)
- Temperature (temperature)
- Precipitation (precipitation)
- Taxi Trips (Taxi_CNT)


Our regression results show that all of our selected predictors are statistically significant. The standardized coefficient charts help visualize the relative impact of each predictor. 






	
&#60;img width="1344" height="960" width_o="1344" height_o="960" data-src="https://freight.cargo.site/t/original/i/103c79a6053ec14c630164061b42115bdc24c97ba3cd64d87a8081012c32e2df/plot.png" data-mid="12069504" border="0"  src="https://freight.cargo.site/w/1000/i/103c79a6053ec14c630164061b42115bdc24c97ba3cd64d87a8081012c32e2df/plot.png" /&#62;
&#60;img width="1344" height="960" width_o="1344" height_o="960" data-src="https://freight.cargo.site/t/original/i/ebda5d66f84046d9c4d215773bfcd9ef43a5404fd1d9b1411095a5f07fb569f3/plotOLS.png" data-mid="12069584" border="0"  src="https://freight.cargo.site/w/1000/i/ebda5d66f84046d9c4d215773bfcd9ef43a5404fd1d9b1411095a5f07fb569f3/plotOLS.png" /&#62;





	

General Prediction Results
Poisson vs. OLS
	Model Comparison: Poisson Regression vs. OLS
To confirm our initial judgment that OLS regression is not suitable for this project, we fit both a Poisson regression and an OLS regression on our training dataset and use each model to predict bike trip departures for our test dataset. 
We plot predicted bike trip counts as a function of observed bike trip counts. The plot on the left shows the results of the Poisson regression model, while the plot on the right shows the results of the OLS regression model. 
Overall, many of the hourly bike trips predicted by the Poisson model closely match their observed values, which reaffirms that the Poisson regression model performs better than the OLS regression model.&#38;nbsp;

&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/759a554239602507c3890b691f9b7a836c9bc94fd99197fbb3e7b6deaf81f20e/Rplot1.png" data-mid="12070189" border="0" data-scale="100" src="https://freight.cargo.site/w/630/i/759a554239602507c3890b691f9b7a836c9bc94fd99197fbb3e7b6deaf81f20e/Rplot1.png" /&#62;
&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/2f988a9e1da89eba644d8e491a27d515a4523cee31f5e8e170ded203e265b4e5/Rplot2.png" data-mid="12070194" border="0"  src="https://freight.cargo.site/w/630/i/2f988a9e1da89eba644d8e491a27d515a4523cee31f5e8e170ded203e265b4e5/Rplot2.png" /&#62;
&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/a64d254dea2188e86d9b6bcca71523add9788f95e2dd3d3c9050ea722dd3cf83/Rplot3.png" data-mid="12070197" border="0"  src="https://freight.cargo.site/w/630/i/a64d254dea2188e86d9b6bcca71523add9788f95e2dd3d3c9050ea722dd3cf83/Rplot3.png" /&#62;
&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/f9458726391f7f44c0c3c6fcf1bf18171eb0afe7a1c640e0d75d66b36427a9f1/Rplot4.png" data-mid="12070200" border="0"  src="https://freight.cargo.site/w/630/i/f9458726391f7f44c0c3c6fcf1bf18171eb0afe7a1c640e0d75d66b36427a9f1/Rplot4.png" /&#62;
&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/07150736029046d0a504eeb5c8acecd06c630263afb7e8ef80c903c2304a63ec/Rplot5.png" data-mid="12070203" border="0"  src="https://freight.cargo.site/w/630/i/07150736029046d0a504eeb5c8acecd06c630263afb7e8ef80c903c2304a63ec/Rplot5.png" /&#62;
&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/d534b52f5f131b2263ed34dfcca68e1af27f7a678c913623be2d6289291f1069/Rplot6.png" data-mid="12070207" border="0"  src="https://freight.cargo.site/w/630/i/d534b52f5f131b2263ed34dfcca68e1af27f7a678c913623be2d6289291f1069/Rplot6.png" /&#62;
&#60;img width="630" height="322" width_o="630" height_o="322" data-src="https://freight.cargo.site/t/original/i/c7ddfca8d8db9e99c05fc8eb5571070add6ebb3bf70ed0578d45211d06239b04/Rplot7.png" data-mid="12070210" border="0"  src="https://freight.cargo.site/w/630/i/c7ddfca8d8db9e99c05fc8eb5571070add6ebb3bf70ed0578d45211d06239b04/Rplot7.png" /&#62;


	

Temporal Prediction Results
Monday to Sunday




	The Results: Successfully Captures General Temporal-Spatial Pattern
1. Predicted vs. Actual Bike Trips for All Stations by Hour Over a Week

We further investigate the quality of our Poisson regression model by generating 7 plots comparing predicted vs. actual bike trips for all stations by hour over the selected week. The 7 graphs show that our model captures the general trend in how bike trips change over each day of the week. On weekdays in particular, our predictions match the actual bike trip counts very closely.






	
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/0e0ac5132d8ac7595cb0de6d92bde8df3e2590778ff059515829d0a1ce0074c6/1map_mon.png" data-mid="12070442" border="0" data-scale="76" src="https://freight.cargo.site/w/720/i/0e0ac5132d8ac7595cb0de6d92bde8df3e2590778ff059515829d0a1ce0074c6/1map_mon.png" /&#62;
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/410a3073fae71d5a3e515e06089a460ee2b790ba2ef0e8c869d924fddfaabedd/2map_tues.png" data-mid="12070443" border="0"  src="https://freight.cargo.site/w/720/i/410a3073fae71d5a3e515e06089a460ee2b790ba2ef0e8c869d924fddfaabedd/2map_tues.png" /&#62;
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/1beb7507b234bf8e96f7e5a73534f21a52347c327338c5c587e6e9286f7485e7/3map_wed.png" data-mid="12070444" border="0"  src="https://freight.cargo.site/w/720/i/1beb7507b234bf8e96f7e5a73534f21a52347c327338c5c587e6e9286f7485e7/3map_wed.png" /&#62;
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/74bac8aa961483465a603b3ac0a8fe9cca29f1cd66489f833dc10accdc053bc3/4map_thurs.png" data-mid="12070445" border="0"  src="https://freight.cargo.site/w/720/i/74bac8aa961483465a603b3ac0a8fe9cca29f1cd66489f833dc10accdc053bc3/4map_thurs.png" /&#62;
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/c0b53e3d27f24a49a1bf7d882384ebac35c97670b0611626cf925cc8ef9060b4/5map_Fri.png" data-mid="12070446" border="0"  src="https://freight.cargo.site/w/720/i/c0b53e3d27f24a49a1bf7d882384ebac35c97670b0611626cf925cc8ef9060b4/5map_Fri.png" /&#62;
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/4d35ebcb6c58667d062963bfa787a18e04f6d5fc3522033f68f2823436afa1e1/6map_sat.png" data-mid="12070447" border="0"  src="https://freight.cargo.site/w/720/i/4d35ebcb6c58667d062963bfa787a18e04f6d5fc3522033f68f2823436afa1e1/6map_sat.png" /&#62;
&#60;img width="720" height="467" width_o="720" height_o="467" data-src="https://freight.cargo.site/t/original/i/c138d0bf491b13b882850e69e365af9031609ed745327befcd274c7b821f72f7/7map_sun.png" data-mid="12070448" border="0"  src="https://freight.cargo.site/w/720/i/c138d0bf491b13b882850e69e365af9031609ed745327befcd274c7b821f72f7/7map_sun.png" /&#62;


	

Spatial Prediction Results Monday to Sunday&#38;nbsp;



	












2. Percent Error per Station over a Week during Rush Hours
We create another 7 maps to show the model's predictive power by station over the week. Because rebalancing bikes is especially challenging during rush hours, we map the percent error for rush hours to visualize how well the model predicts across space. Overall, these maps show that the model predicts better for stations near the CBD.&#38;nbsp;
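
As a rough illustration of the per-station percent error that such maps display, the Python sketch below computes an absolute percent error from observed and predicted rush-hour totals. The exact formula used in the project is not stated, so this definition, the station names, and the counts are assumptions.

```python
def percent_error(observed, predicted):
    """Absolute percent error for one station; None when nothing was observed."""
    if observed == 0:
        return None  # undefined: cannot divide by zero observed trips
    return abs(predicted - observed) / observed * 100.0

# Hypothetical rush-hour totals per station: (observed, predicted).
stations = {"A": (40, 36), "B": (10, 15), "C": (0, 2)}
errors = {name: percent_error(obs, pred) for name, (obs, pred) in stations.items()}
```

Stations with very few observed trips produce large or undefined percent errors, which may be one reason quieter stations away from the center look worse on a percent-error map.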

	
&#60;img width="960" height="1329" width_o="960" height_o="1329" data-src="https://freight.cargo.site/t/original/i/ec34e3cf3635f0fe646848fdf905664d98de21393932ff6801556d7762170688/validation.png" data-mid="12070706" border="0"  src="https://freight.cargo.site/w/960/i/ec34e3cf3635f0fe646848fdf905664d98de21393932ff6801556d7762170688/validation.png" /&#62;
&#60;img width="923" height="1334" width_o="923" height_o="1334" data-src="https://freight.cargo.site/t/original/i/09195eeb3556f259bfb442df67ca24324ba558cb4fc90590c73fb52ee8045ed7/validation1.PNG" data-mid="12070707" border="0"  src="https://freight.cargo.site/w/923/i/09195eeb3556f259bfb442df67ca24324ba558cb4fc90590c73fb52ee8045ed7/validation1.PNG" /&#62;


	
The Limitation: Possible Improvements


We also run cross-validation on our training dataset to see how well the model's goodness of fit generalizes. We plot two histograms showing the cross-validation Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for our model.
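
For reference, MAE and RMSE can be computed as in the Python sketch below, which also shows one simple way to assign rows to folds. The interleaved fold assignment is an assumption for illustration (the project's own cross-validation setup is not described here), and the sample counts are made up.

```python
import math

def mae(observed, predicted):
    """Mean Absolute Error: the average size of the errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root Mean Square Error: penalizes large errors more than MAE does."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k interleaved folds of nearly equal size."""
    return [list(range(i, n, k)) for i in range(k)]

# Made-up hourly counts for one held-out fold.
observed = [3, 0, 5, 2]
predicted = [2, 1, 5, 4]
fold_mae = mae(observed, predicted)
fold_rmse = rmse(observed, predicted)
```

Repeating this for every fold yields one MAE and one RMSE per fold, which is the kind of data a histogram of cross-validation errors summarizes.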



The RMSE and MAE are not ideal.


The model results point to several possible ways to improve the model, such as adding more predictor variables. In addition to hourly bike trip departures, we could use the hourly change in the number of bikes at each station as our dependent variable. Lastly, we could consider a non-linear regression model to capture the distinct trends of each individual station.

These potential improvements would better support the function of our App and benefit the Divvy bike share system.&#38;nbsp;


</description>
		
	</item>
		
		
	<item>
		<title>Hedonic Home Price Prediction</title>
				
		<link>https://yinuoyin.com/Hedonic-Home-Price-Prediction</link>

		<pubDate>Fri, 16 Feb 2018 03:58:16 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Hedonic-Home-Price-Prediction</guid>

		<description>




Predictive Modeling


Hedonic Home Price Prediction



	
		
		
	
	
		
			
				
					



In Boston, the housing market is thriving. 

Housing price prediction is important to prospective homeowners, developers, investors and other real estate market participants. 


				
			
		
	




	&#60;img width="1536" height="960" width_o="1536" height_o="960" data-src="https://freight.cargo.site/t/original/i/c3e7fe29d5b9fdc69af74134970136699b19ef0f674db92de4e6faf4373b4d87/distAssault.png" data-mid="12100779" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/c3e7fe29d5b9fdc69af74134970136699b19ef0f674db92de4e6faf4373b4d87/distAssault.png" /&#62;
	&#60;img width="1536" height="960" width_o="1536" height_o="960" data-src="https://freight.cargo.site/t/original/i/685965ee3fe8db64daf742545a96d4dd8b5ad321e0073c0861447f00dfd3e65c/DistTransit.png" data-mid="12100780" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/685965ee3fe8db64daf742545a96d4dd8b5ad321e0073c0861447f00dfd3e65c/DistTransit.png" /&#62;
&#60;img width="1536" height="960" width_o="1536" height_o="960" data-src="https://freight.cargo.site/t/original/i/e6e867fbf798e8cb7694601ba8792ad28897fd9572f2852db92350f8f60f2082/laggedprice.png" data-mid="12100781" border="0" data-scale="71" src="https://freight.cargo.site/w/1000/i/e6e867fbf798e8cb7694601ba8792ad28897fd9572f2852db92350f8f60f2082/laggedprice.png" /&#62;



	

3 Most Interesting Predictor Variables


- Distance to aggravated assaults 
- Distance to transit stops 
- Spatially lagged home prices


	 

Zillow Group, an online real estate database company, has realized that its housing market predictions are not as accurate as they could be because of insufficient local intelligence, so it has asked us to build a better OLS regression model to predict home sale prices in Boston.


In our model, the dependent variable is the home sale price, and we consider three main categories of factors: internal characteristics, amenities and public services, and the underlying spatial structure of prices. See below for all selected predictors.&#38;nbsp;

 Note: in all the regressions we conduct, we use the logarithm of the home sale price rather than the raw sale price, because the raw home sale prices are not normally distributed.&#38;nbsp;




	List of Predictors

- “SalePrice”: the sale price of each home sale price point in our Boston dataset. (Note: in the OLS regression model, we take a logarithmic transformation of the home sale price, since the raw values are not normally distributed.)
- “d_transit”: the average distance between each home sale price point and its 5 nearest MBTA transit stops.
- “d_crime”: the average distance between each home sale price point and its 5 nearest aggravated assault points.
- “d_schools”: the average distance between each home sale price point and its 5 nearest colleges or universities.
- “d_hwy”: the distance from each home sale price point to its nearest major highway in Boston.
- “WT_SP”: the spatial lag of each home sale price, that is, the average sale price of each home sale price point’s 5 nearest sales. (Note: in the OLS regression model, we take a logarithmic transformation of the spatial lag variable, since its raw values are not normally distributed.)
- “NAME10”: the census tract in which each home sale price point is located.
- “Style”: the house style of each home sale price point (e.g. decker, row end, and conventional).
- “LU”: the land use and zoning type of each home sale price point (e.g. R1: single family residence).
- “GROSS_AREA”: the gross living area of each home sale price point.
- “NUM_FLOORS”: the number of floors for each home sale price point.
- “R_ROOF_TYP”: the roof type of each home sale price point (e.g. F: flat roof, M: Mansard roof).
- “R_EXT_FIN”: the exterior finishing material of each home sale price point (e.g. B: brick, W: wood).
- “R_TOTAL_RM”: the total number of rooms for each home sale price point.
- “R_BDRMS”: the number of bedrooms for each home sale price point.
- “R_FULL_BTH”: the number of full size bathrooms for each home sale price point.
- “R_HALF_BTH”: the number of half size bathrooms for each home sale price point.
- “R_KITCH”: the number of kitchens for each home sale price point.
- “R_HEAT_TYP”: the type of heating system for each home sale price point (e.g. E: Electric Space Heaters, W: Wood-Burning and Pellet Stoves).
- “R_AC”: the type of air conditioning system for each home sale price point.
- “YR_BUILT_RC”: the year in which each home was built.
- “YR_REMOD_RC”: the year in which each home was remodeled.
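
The spatial lag predictor described in the list above (the average price of each sale's 5 nearest sales) could be computed along the lines of this Python sketch. It assumes planar coordinates and straight-line distance, uses made-up points and prices, and sets k to 2 only to keep the example small; the function name is hypothetical and this is not the project's R code.

```python
import math

def spatial_lag(points, prices, k=5):
    """For each sale, average the prices of its k nearest other sales."""
    lags = []
    for i, (xi, yi) in enumerate(points):
        # Distance from sale i to every other sale, paired with that sale's price.
        neighbors = sorted(
            (math.hypot(xi - xj, yi - yj), prices[j])
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest = [price for _, price in neighbors[:k]]
        lags.append(sum(nearest) / len(nearest))
    return lags

# Made-up coordinates and prices, with k=2 to keep the example small.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
prices = [100.0, 200.0, 300.0, 900.0]
lags = spatial_lag(points, prices, k=2)
```

Each sale is excluded from its own neighbor set, so the lag reflects only surrounding prices and not the price being predicted.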





	&#60;img width="1728" height="1344" width_o="1728" height_o="1344" data-src="https://freight.cargo.site/t/original/i/ddd57233ff9be2905a98fdf185b28bbbdfa29a6bb8c2984d019749a0a0e56e73/correlation.png" data-mid="12100915" border="0" data-scale="72" src="https://freight.cargo.site/w/1000/i/ddd57233ff9be2905a98fdf185b28bbbdfa29a6bb8c2984d019749a0a0e56e73/correlation.png" /&#62;


	

Correlation Matrix


	

 We analyze the pairwise correlations between these predictor variables. This helps us evaluate how inter-correlated the predictors are. The higher the absolute correlation value, the more strongly the two predictors are correlated.
In-Sample Regression
Our in-sample regression results show that our model explains approximately 90% of the variance in the dependent variable, home sale price (R-squared: 0.895). The Mean Absolute Percentage Error (MAPE), a measure of the prediction accuracy of a forecasting method, is around 11% for the in-sample prediction. 



	
&#60;img width="1536" height="960" width_o="1536" height_o="960" data-src="https://freight.cargo.site/t/original/i/6317e9b4d5c27ffb38cc0e93029de0ae22a8e2bcd9bf0e887925e8aae3459705/dependent-variable.png" data-mid="12101336" border="0"  src="https://freight.cargo.site/w/1000/i/6317e9b4d5c27ffb38cc0e93029de0ae22a8e2bcd9bf0e887925e8aae3459705/dependent-variable.png" /&#62;
&#60;img width="1536" height="960" width_o="1536" height_o="960" data-src="https://freight.cargo.site/t/original/i/41074b49154188175fded7304af7ebb478103c743a6cc57bacd6c8239ef47ba2/training.png" data-mid="12101322" border="0"  src="https://freight.cargo.site/w/1000/i/41074b49154188175fded7304af7ebb478103c743a6cc57bacd6c8239ef47ba2/training.png" /&#62;


	

Distribution of Home Prices 
- Original 
- Predicted (training) 
- Predicted (test)


	Out-of-Sample Regression


Since we do not know future home sale prices in Boston, we simulate the out-of-sample usefulness of the model by separating the home sale price data into a training set that we use to train the model and a test set that we use to validate it. We randomly select 75% of the home sale price observations as our training set and use the remaining 25% as our test set to see how generalizable the model is. 
We first fit an OLS regression model on the training set with the same predictors as our in-sample regression model. 

Then we use the training set regression model to predict the home sale prices of the test set. 
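
A random 75/25 split like the one described above can be sketched in a few lines of Python. The fixed seed, the function name, and the stand-in records are assumptions for illustration; the project itself was carried out with different tooling.

```python
import random

def train_test_split(rows, train_share=0.75, seed=42):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

sales = list(range(100))  # stand-in for 100 home sale records
train, test = train_test_split(sales)
```

Fixing the seed makes the split reproducible, so the same observations land in the test set each time the analysis is rerun.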


The two maps above compare the distribution of the original home prices with the home prices predicted for the training dataset. Visually, they display similar spatial patterns.&#38;nbsp;


	&#60;img width="2341" height="251" width_o="2341" height_o="251" data-src="https://freight.cargo.site/t/original/i/f0dce5d413a9ad8a5b77d755e6d2de1d027d615ed3536541c00d3c77cf503cff/rmse-table.PNG" data-mid="12102904" border="0"  src="https://freight.cargo.site/w/1000/i/f0dce5d413a9ad8a5b77d755e6d2de1d027d615ed3536541c00d3c77cf503cff/rmse-table.PNG" /&#62;

	

Summary of Out-of-Sample Regression Result 
- R-squared
- Root mean square error (RMSE)
- Mean absolute error (MAE)
- Mean absolute percent error (MAPE)


	

Moreover, we use the RMSE, MAE and MAPE of the randomly selected test set to measure how far our predicted values are from the observed home sale prices and to understand how well our model predicts.
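
Because the model is fit on log-transformed prices, predictions need to be converted back to dollars before they are compared with observed sale prices. The Python sketch below shows that step followed by a MAPE calculation; it assumes a natural-log transformation and uses made-up prices, neither of which is confirmed by the project description.

```python
import math

def mape(observed, predicted):
    """Mean Absolute Percentage Error, in percent."""
    ratios = [abs(o - p) / o for o, p in zip(observed, predicted)]
    return sum(ratios) / len(ratios) * 100.0

# Made-up test-set prices and model outputs on the log scale.
observed_prices = [400000.0, 500000.0]
predicted_log_prices = [math.log(440000.0), math.log(450000.0)]

# Back-transform to dollars before measuring error against observed prices.
predicted_prices = [math.exp(v) for v in predicted_log_prices]
test_mape = mape(observed_prices, predicted_prices)
```

Here both predictions are off by 10% of the observed price, so the MAPE is about 10%.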



	
&#60;img width="1536" height="960" width_o="1536" height_o="960" data-src="https://freight.cargo.site/t/original/i/6bca3f9b7fa8870e89d716a3d60e4c9bc0bd9c7447a4cd501f3cc475c03506ce/test.png" data-mid="12102655" border="0"  src="https://freight.cargo.site/w/1000/i/6bca3f9b7fa8870e89d716a3d60e4c9bc0bd9c7447a4cd501f3cc475c03506ce/test.png" /&#62;
&#60;img width="1152" height="1152" width_o="1152" height_o="1152" data-src="https://freight.cargo.site/t/original/i/721a840dc182aca4161559ea56b125b5f9bb04fb3ab8f5a865cac8148a4f8645/test2.png" data-mid="12103510" border="0"  src="https://freight.cargo.site/w/1000/i/721a840dc182aca4161559ea56b125b5f9bb04fb3ab8f5a865cac8148a4f8645/test2.png" /&#62;
&#60;img width="1152" height="1152" width_o="1152" height_o="1152" data-src="https://freight.cargo.site/t/original/i/b52e470f71c77f73a7987e42711740ddbf791d4f553e760b7c21b26f4398c5e5/test1.png" data-mid="12103163" border="0"  src="https://freight.cargo.site/w/1000/i/b52e470f71c77f73a7987e42711740ddbf791d4f553e760b7c21b26f4398c5e5/test1.png" /&#62;


	

Test Dataset Regression Residuals (Errors)
- Spatial distribution
- As a function of observed home price
- As a function of predicted home price


	

 

The map on the left above, which shows the distribution of the test dataset regression residuals, allows us to further examine whether our model successfully accounts for spatial factors. The residuals show no clear clustering across space, which suggests that there is little spatial autocorrelation in the test dataset regression residuals.


The Moran’s I test results confirm our interpretation of the map. The close-to-0, slightly negative Moran’s I value (-0.11) indicates little spatial autocorrelation.
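
For readers unfamiliar with the statistic, the Python sketch below computes global Moran's I from a list of values and a spatial weights matrix. Values near 0 indicate little spatial autocorrelation, positive values indicate clustering of similar values, and negative values indicate that neighbors tend to differ. The toy weights matrix (four locations on a line) and the residual values are assumptions for illustration, not the weights used in the project.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between every pair.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in dev)

# Four residuals on a line; each location neighbors the ones next to it.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], weights)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], weights)
```

The clustered pattern gives a positive value and the alternating pattern gives a strongly negative one, so a value such as -0.11 sits much closer to the no-autocorrelation case than to either extreme.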



We also plot the test dataset regression residuals as a function of the observed and predicted values of home sale price. These plots allow us to examine visually how well the model generalizes.&#38;nbsp;

	
&#60;img width="1200" height="731" width_o="1200" height_o="731" data-src="https://freight.cargo.site/t/original/i/415ad205a4bd458b2462b414a3c6189a0a1fa83a3545e460b3b50eb1ae7862bf/classpred.jpg" data-mid="12103816" border="0"  src="https://freight.cargo.site/w/1000/i/415ad205a4bd458b2462b414a3c6189a0a1fa83a3545e460b3b50eb1ae7862bf/classpred.jpg" /&#62;
&#60;img width="1200" height="731" width_o="1200" height_o="731" data-src="https://freight.cargo.site/t/original/i/7301eb2a646b16a4a1f41cfc9b96be5a3c946c4cefc2b0591b7cdb9836ccce64/classpred2.jpg" data-mid="12103817" border="0"  src="https://freight.cargo.site/w/1000/i/7301eb2a646b16a4a1f41cfc9b96be5a3c946c4cefc2b0591b7cdb9836ccce64/classpred2.jpg" /&#62;


	

Class Contest Result
- 1st prize among 12 groups 
- Credit to Ken Steif (MUSA 507 Instructor + MUSA Director)



	This class project is also a class contest. We are “TeamReclassify”. On the MAPE plot we rank in 2nd place, but after our instructor checked with the first-place team, they found that it had used a predictor that was not appropriate, so we won the contest in the end. Below is the contest description.&#38;nbsp;
“Winning the contest is all about predictive accuracy. The winning team will be the one that is able to do two things. First, to find the best predictive ‘features’ or variables. Second, pour enough predictive power into the model to predict well without overfitting. You want to create a model that is ‘generalizable’ to the many different neighborhoods in Boston.”
Limitations

Our model has some limitations. It may predict better for rich neighborhoods than for poor neighborhoods because of the predictors we selected. We could therefore add predictors with more explanatory power in poor neighborhoods, or across Boston as a whole, such as median household income, employment rate and ethnicity factors.




</description>
		
	</item>
		
		
	<item>
		<title>Visualizing Green &#38; Yellow Taxi Trips to Airports in NYC</title>
				
		<link>https://yinuoyin.com/Visualizing-Green-Yellow-Taxi-Trips-to-Airports-in-NYC</link>

		<pubDate>Wed, 12 Sep 2018 20:03:24 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Visualizing-Green-Yellow-Taxi-Trips-to-Airports-in-NYC</guid>

		<description>




Web Application
Visualizing Green &#38;amp; Yellow Taxi Trips to Airports in NYC


	
		
		
	
	
		
			
				
					





Airport traffic has been booming in NYC. How far in advance should we hail a taxi in order to get to the airport on time?



				
			
		
	



&#60;img width="2500" height="1214" width_o="2500" height_o="1214" data-src="https://freight.cargo.site/t/original/i/9d0388ec645006543148cb786608afeb695f297c17d54c0175ac909d3bdba9b9/1.png" data-mid="24100652" border="0" data-scale="87" src="https://freight.cargo.site/w/1000/i/9d0388ec645006543148cb786608afeb695f297c17d54c0175ac909d3bdba9b9/1.png" /&#62;&#60;img width="2500" height="1044" width_o="2500" height_o="1044" data-src="https://freight.cargo.site/t/original/i/fb09e2d09b757e1a74821abecdd03a118dcdba402053f6df344604faa8a6fbd8/2.png" data-mid="24100653" border="0" data-scale="87" src="https://freight.cargo.site/w/1000/i/fb09e2d09b757e1a74821abecdd03a118dcdba402053f6df344604faa8a6fbd8/2.png" /&#62;



	

Default Page&#38;nbsp;
- Map&#38;nbsp;
- Plot


	Introduction
For air travelers, airport traffic is the second most annoying part of a trip after flight delays, whether they are traveling for business or for vacation. In New York City, airport traffic has been booming in recent years thanks to growing demand for air transport and the development of the aviation system. Local New Yorkers and visitors alike may ask how far in advance they should hail a taxi in order to get to the airport on time. We therefore explore publicly available NYC taxi trip data to help answer this question.



	
&#60;img width="2500" height="1221" width_o="2500" height_o="1221" data-src="https://freight.cargo.site/t/original/i/8e642e3e257a7a88d87f1109af3df6c41a24256694d0074f75bf338d888f3d74/10.png" data-mid="24101320" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/8e642e3e257a7a88d87f1109af3df6c41a24256694d0074f75bf338d888f3d74/10.png" /&#62;
&#60;img width="2500" height="1226" width_o="2500" height_o="1226" data-src="https://freight.cargo.site/t/original/i/f31afd1b6bc261f8d377f5eceb31023b6c9641a3d19ebc514e765c04c1d85e49/4.png" data-mid="24100931" border="0"  src="https://freight.cargo.site/w/1000/i/f31afd1b6bc261f8d377f5eceb31023b6c9641a3d19ebc514e765c04c1d85e49/4.png" /&#62;
&#60;img width="2500" height="1232" width_o="2500" height_o="1232" data-src="https://freight.cargo.site/t/original/i/16e238a8f3691978574432a7ccc40d7de1eabbdd03f189dcdbf42b6779077811/11.png" data-mid="24101321" border="0"  src="https://freight.cargo.site/w/1000/i/16e238a8f3691978574432a7ccc40d7de1eabbdd03f189dcdbf42b6779077811/11.png" /&#62;




	

















User can update the map by selecting from JFK and LGA, green taxi and yellow cab, and Monday through Sunday.

- Map will update automatically



	

The goal of this project is to build a web-based user interface that shows spatial and temporal patterns in average travel time to JFK and LGA by yellow and green cab, using taxi trip data from January to June 2016.
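
The aggregation behind such an interface can be sketched as a grouped average: trips are grouped by pickup census tract, destination airport, cab type, and day of the week, and the mean duration is taken for each group. The Python below is only an illustration with made-up trips and hypothetical field names; the actual application was built in R with Shiny.

```python
from collections import defaultdict

def average_travel_time(trips):
    """Average trip duration in minutes per (tract, airport, cab, weekday) group."""
    totals = defaultdict(lambda: [0.0, 0])  # group key maps to [sum of minutes, count]
    for trip in trips:
        key = (trip["tract"], trip["airport"], trip["cab"], trip["weekday"])
        totals[key][0] += trip["minutes"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up trips; the field names and tract IDs are hypothetical.
trips = [
    {"tract": "T1", "airport": "JFK", "cab": "yellow", "weekday": "Mon", "minutes": 50.0},
    {"tract": "T1", "airport": "JFK", "cab": "yellow", "weekday": "Mon", "minutes": 70.0},
    {"tract": "T2", "airport": "LGA", "cab": "green", "weekday": "Sat", "minutes": 25.0},
]
averages = average_travel_time(trips)
```

Precomputing one average per group keeps the interactive map responsive, since each user selection only needs a lookup and not a pass over the raw trips.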





	


&#60;img width="2500" height="1227" width_o="2500" height_o="1227" data-src="https://freight.cargo.site/t/original/i/9f0be37894a3bf81dc8c2e57d8e57d0ac152322ce6963f01c9eff8ca75ee3cd6/5.png" data-mid="24100932" border="0"  src="https://freight.cargo.site/w/1000/i/9f0be37894a3bf81dc8c2e57d8e57d0ac152322ce6963f01c9eff8ca75ee3cd6/5.png" /&#62;
&#60;img width="2500" height="1213" width_o="2500" height_o="1213" data-src="https://freight.cargo.site/t/original/i/4e76d0d7498ca3491adedac3931016101c043cdc1e73b8eb7f84b9a12da21080/9.png" data-mid="24101286" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/4e76d0d7498ca3491adedac3931016101c043cdc1e73b8eb7f84b9a12da21080/9.png" /&#62;


	

More User Interaction
- User can zoom in to a borough by clicking on the “Select a Borough” button
- User can go back to full map extent by clicking on “Back to Full Extent”


	



Data Source
- NYC Taxi &#38;amp; Limousine Commission - NYC Taxi Trips
- NYU Spatial Data Repository - 2010 New York Large Public Facilities
- NYC Open Data Portal - 2010 Census Tracts





	&#60;img width="2500" height="1245" width_o="2500" height_o="1245" data-src="https://freight.cargo.site/t/original/i/8c11beecd1d214082442f11b336036ea2ea585e3c8e7527531abc72b179ecd8b/3.png" data-mid="24100732" border="0" data-scale="91" src="https://freight.cargo.site/w/1000/i/8c11beecd1d214082442f11b336036ea2ea585e3c8e7527531abc72b179ecd8b/3.png" /&#62;



When Mouse is Over
- The map will show the average travel time in minutes for each census tract









Method
The data are cleaned and processed in R, and the web application is created using the Shiny package in R.
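As a rough illustration of the aggregation behind the map (the project itself was built in R and Shiny), the Python sketch below averages trip durations by airport, cab type, day of week and census tract. The field names and the sample trips are hypothetical and are not taken from the project data.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def average_travel_minutes(trips):
    """Average trip duration in minutes per (airport, cab, weekday, tract)."""
    totals = defaultdict(lambda: [0.0, 0])  # key: [sum of minutes, trip count]
    for trip in trips:
        pickup = datetime.fromisoformat(trip["pickup"])
        dropoff = datetime.fromisoformat(trip["dropoff"])
        minutes = (dropoff - pickup).total_seconds() / 60
        key = (trip["airport"], trip["cab"], DAYS[pickup.weekday()], trip["tract"])
        totals[key][0] += minutes
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical sample trips (illustrative values only)
trips = [
    {"airport": "JFK", "cab": "yellow", "tract": "T1",
     "pickup": "2016-01-04T08:00:00", "dropoff": "2016-01-04T08:50:00"},
    {"airport": "JFK", "cab": "yellow", "tract": "T1",
     "pickup": "2016-01-11T08:10:00", "dropoff": "2016-01-11T09:20:00"},
    {"airport": "LGA", "cab": "green", "tract": "T2",
     "pickup": "2016-01-05T17:00:00", "dropoff": "2016-01-05T17:30:00"},
]
averages = average_travel_minutes(trips)
```

Each key in `averages` corresponds to one combination of the user selections (airport, cab type, day) and one census tract, which is the value the map would display for that tract.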



	
&#60;img width="2500" height="1079" width_o="2500" height_o="1079" data-src="https://freight.cargo.site/t/original/i/bceb2190393e411cb7f4d9c711c6b89bea76fc13fb00d9c95d716f8a2d62c63a/6.png" data-mid="24100933" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/bceb2190393e411cb7f4d9c711c6b89bea76fc13fb00d9c95d716f8a2d62c63a/6.png" /&#62;
&#60;img width="1682" height="881" width_o="1682" height_o="881" data-src="https://freight.cargo.site/t/original/i/a3adf9939766aa0eaab9b814edbd03e1ca8ec4c31abf4b0fa98c46d08d338554/7.png" data-mid="24100934" border="0"  src="https://freight.cargo.site/w/1000/i/a3adf9939766aa0eaab9b814edbd03e1ca8ec4c31abf4b0fa98c46d08d338554/7.png" /&#62;
&#60;img width="1704" height="883" width_o="1704" height_o="883" data-src="https://freight.cargo.site/t/original/i/167e5d7ac1e4e8e873d560aceb1256708969387671e70d13d0862499250e6c20/8.png" data-mid="24101270" border="0"  src="https://freight.cargo.site/w/1000/i/167e5d7ac1e4e8e873d560aceb1256708969387671e70d13d0862499250e6c20/8.png" /&#62;




Plot Tab
- The plot also responds to user selections
- The plotly plot is more interactive than a regular ggplot





Results
Please click here to visit the website and explore its features!</description>
		
	</item>
		
		
	<item>
		<title>Dunkin’ Donuts Business Profile in Philadelphia</title>
				
		<link>https://yinuoyin.com/Dunkin-Donuts-Business-Profile-in-Philadelphia</link>

		<pubDate>Wed, 12 Sep 2018 20:44:01 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Dunkin-Donuts-Business-Profile-in-Philadelphia</guid>

		<description>




Web Application


Dunkin’ Donuts Business Profile in Philadelphia




	
		
		
	
	
		
			
				
					





This web app aims to help users visualize and analyze current store locations and business settings and then determine where to locate the next Dunkin' Donuts branch store in Philly.



				
			
		
	





&#60;img width="3829" height="1887" width_o="3829" height_o="1887" data-src="https://freight.cargo.site/t/original/i/28b2b29b9503f78adbddd580b9f68aa8d0eb91add5eade532471e2774861c094/a.PNG" data-mid="24101596" border="0" data-scale="86" src="https://freight.cargo.site/w/1000/i/28b2b29b9503f78adbddd580b9f68aa8d0eb91add5eade532471e2774861c094/a.PNG" /&#62;


	Start Page
- Sidebar
- Map

	Introduction

This web app displays business, demographic and proximity statistics for all Dunkin' Donuts stores in Philadelphia, PA, and lets users interact with the data through maps, filters and routing. It aims to help users visualize and analyze current store locations and business settings and then determine where to locate the next Dunkin' Donuts branch store in Philly.


Method


This web application is written in JavaScript, HTML and CSS, using the Bootstrap HTML, CSS and JS library. For the full code, please see my GitHub.




	
&#60;img width="3829" height="1883" width_o="3829" height_o="1883" data-src="https://freight.cargo.site/t/original/i/1c8b090f049acbf45b47e79618efe3bf38e247ea5eb6ac5b843e741953175f60/b.PNG" data-mid="24101701" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/1c8b090f049acbf45b47e79618efe3bf38e247ea5eb6ac5b843e741953175f60/b.PNG" /&#62;
&#60;img width="3830" height="1887" width_o="3830" height_o="1887" data-src="https://freight.cargo.site/t/original/i/f4ae60fa5723b9ea5717c738bbe31db6f0fd018e22df268349fffc48a75748b6/c.PNG" data-mid="24101702" border="0"  src="https://freight.cargo.site/w/1000/i/f4ae60fa5723b9ea5717c738bbe31db6f0fd018e22df268349fffc48a75748b6/c.PNG" /&#62;
&#60;img width="3818" height="1892" width_o="3818" height_o="1892" data-src="https://freight.cargo.site/t/original/i/f433e0ddaafb5d8c8a339b442bac823bbae9953a032f535d9c626d56c9711e77/d.PNG" data-mid="24101703" border="0"  src="https://freight.cargo.site/w/1000/i/f433e0ddaafb5d8c8a339b442bac823bbae9953a032f535d9c626d56c9711e77/d.PNG" /&#62;
&#60;img width="3821" height="1889" width_o="3821" height_o="1889" data-src="https://freight.cargo.site/t/original/i/7c647174faa4d06f37b27bd5fab93907e7d448629f41f81ec24cb74ee1fda3a2/e.PNG" data-mid="24101704" border="0"  src="https://freight.cargo.site/w/1000/i/7c647174faa4d06f37b27bd5fab93907e7d448629f41f81ec24cb74ee1fda3a2/e.PNG" /&#62;


	

Web App Functions
- See description on the right


	

Web Features


1. Maps
&#38;nbsp; &#38;nbsp; -&#38;nbsp; Display 2 maps showing sales volume and number of employees in each branch store

2. Information
&#38;nbsp; &#38;nbsp; -&#38;nbsp; When the user clicks on a store, the sidebar shows full info for that store

3. Filter
&#38;nbsp; &#38;nbsp; -&#38;nbsp; User can choose to filter the stores by their sales volume and number of employees
&#38;nbsp; &#38;nbsp; -&#38;nbsp; Options include “Above Average”, “Below Average” and “Custom Range”

4. Route
&#38;nbsp; &#38;nbsp; -&#38;nbsp; Map the route from user location to the selected store
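
The filter options above can be sketched in a few lines. This is a Python illustration of the logic only, not the app's actual code; the store records, the field names and the `filter_stores` helper are all hypothetical.

```python
from operator import gt, lt
from statistics import mean

# "Above Average" and "Below Average" compare each store with the mean
COMPARATORS = {"Above Average": gt, "Below Average": lt}

def filter_stores(stores, field, option, low=None, high=None):
    """Return the stores whose `field` value matches the chosen filter option."""
    if option == "Custom Range":
        # Clamping a value to [low, high] leaves it unchanged only if it is inside the range
        return [s for s in stores if min(max(s[field], low), high) == s[field]]
    average = mean(s[field] for s in stores)
    compare = COMPARATORS[option]
    return [s for s in stores if compare(s[field], average)]

# Hypothetical store records (made-up figures)
stores = [
    {"name": "A", "sales": 100, "employees": 5},
    {"name": "B", "sales": 300, "employees": 12},
    {"name": "C", "sales": 200, "employees": 8},
]
above = [s["name"] for s in filter_stores(stores, "sales", "Above Average")]
below = [s["name"] for s in filter_stores(stores, "sales", "Below Average")]
custom = [s["name"] for s in filter_stores(stores, "sales", "Custom Range", low=150, high=250)]
```

Keeping the two average-based options in a small lookup table means a further option could be added without touching the filtering loop.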



</description>
		
	</item>
		
		
	<item>
		<title>Chicago Crime Risk Terrain Model</title>
				
		<link>https://yinuoyin.com/Chicago-Crime-Risk-Terrain-Model</link>

		<pubDate>Fri, 16 Feb 2018 03:58:17 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Chicago-Crime-Risk-Terrain-Model</guid>

		<description>




Predictive Modeling


Chicago Crime Risk Terrain Model

This risk terrain model (RTM) aims to predict where assaults will occur in Chicago, IL, using Poisson regression.


	&#60;img width="2550" height="3300" width_o="2550" height_o="3300" data-src="https://freight.cargo.site/t/original/i/7c0119f056cf5b65617ea289935bcfd48529fcfbc1663083e239a6246ea096bb/kernelMap.jpg" data-mid="12173821" border="0" data-scale="50" src="https://freight.cargo.site/w/1000/i/7c0119f056cf5b65617ea289935bcfd48529fcfbc1663083e239a6246ea096bb/kernelMap.jpg" /&#62;


Kernel Density Maps
Showing the density of assault incidents in Chicago in 2014



Risk Terrain Modeling (RTM) identifies risky locations for crime, provides real-time risk assessments about the context of crime incidents, and enhances the situational awareness of law enforcement.

In this project, I trained a Poisson regression model on 2014 assault incidents in Chicago, created choropleth maps to visualize the risk terrain model and individual predictor variables, ranked the predictor variables from most to least influential based on standardized coefficients, and finally examined the goodness of fit of the RTM by comparing the number of assaults correctly predicted by the RTM and by a kernel density map.
	
&#60;img width="838" height="1253" width_o="838" height_o="1253" data-src="https://freight.cargo.site/t/original/i/a6af385c6d1876e98cbc65917b10b007962c5253a1b60d9d8e948d9dbdcb18ee/factor-map.jpg" data-mid="12174098" border="0"  src="https://freight.cargo.site/w/838/i/a6af385c6d1876e98cbc65917b10b007962c5253a1b60d9d8e948d9dbdcb18ee/factor-map.jpg" /&#62;
&#60;img width="1238" height="1320" width_o="1238" height_o="1320" data-src="https://freight.cargo.site/t/original/i/961abc77c756f4729017e450e014850a68c5e9dedb9d874ea22492108bd8dd8f/regression-result.PNG" data-mid="12174155" border="0"  src="https://freight.cargo.site/w/1000/i/961abc77c756f4729017e450e014850a68c5e9dedb9d874ea22492108bd8dd8f/regression-result.PNG" /&#62;


	

Risk Factors &#38;amp; Regression Result
13 Significant Risk Factors


	For the Poisson regression, the dependent variable is&#38;nbsp;assault crime incidents in Chicago in 2014.&#38;nbsp;Predictors in the model (risk factors) include several proximity factors and density measures. 

The final model was selected after several attempts, comparing candidate models by their AIC values.

The final regression model has 13 significant predictors, which are presented above on the left. 
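
For readers unfamiliar with AIC-based comparison, here is a minimal sketch in Python. It applies the standard formulas (the Poisson log-likelihood and AIC = 2k - 2 ln L) and is not the code used in this project; the observed counts and the fitted means of the two candidate models are made-up values.

```python
import math

def poisson_log_likelihood(counts, means):
    """Sum of log P(y | mu) for Poisson-distributed counts."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(counts, means))

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical assault counts per grid cell and fitted means from two candidate models
observed = [0, 2, 1, 4, 3]
model_a = [0.5, 1.8, 1.2, 3.6, 2.9]  # assumed to use 3 parameters
model_b = [2.0, 2.0, 2.0, 2.0, 2.0]  # intercept-only model, 1 parameter

aic_a = aic(poisson_log_likelihood(observed, model_a), n_params=3)
aic_b = aic(poisson_log_likelihood(observed, model_b), n_params=1)
best = min(("model_a", aic_a), ("model_b", aic_b), key=lambda pair: pair[1])[0]
```

The penalty term `2 * n_params` is what keeps a model with more risk factors from winning automatically: extra predictors must improve the likelihood enough to pay for themselves.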

	&#60;img width="967" height="773" width_o="967" height_o="773" data-src="https://freight.cargo.site/t/original/i/4d8bfd21bf7fa3cab42dcb1112a8c514faca024e4568f4e9871a2d791177fed7/PredMap.png" data-mid="12106270" border="0" data-scale="72" src="https://freight.cargo.site/w/967/i/4d8bfd21bf7fa3cab42dcb1112a8c514faca024e4568f4e9871a2d791177fed7/PredMap.png" /&#62;


RTM Prediction Map


The map above shows the spatial distribution of assault counts predicted by the RTM.
	
&#60;img width="967" height="773" width_o="967" height_o="773" data-src="https://freight.cargo.site/t/original/i/26ef6b7a173bb6d8ef1037e27a8c97190888ea16d92d61fd9d2bb897c3e54b3e/StandardizedBar.png" data-mid="12106272" border="0"  src="https://freight.cargo.site/w/967/i/26ef6b7a173bb6d8ef1037e27a8c97190888ea16d92d61fd9d2bb897c3e54b3e/StandardizedBar.png" /&#62;
&#60;img width="967" height="773" width_o="967" height_o="773" data-src="https://freight.cargo.site/t/original/i/9ab5b6ab15c9660c9906c023611bf47404667f2c3261a47609d9339f7145192d/GoodnessFit.png" data-mid="12106269" border="0"  src="https://freight.cargo.site/w/967/i/9ab5b6ab15c9660c9906c023611bf47404667f2c3261a47609d9339f7145192d/GoodnessFit.png" /&#62;


	

Model Analysis
- Influence of risk factors 
- Goodness of fit


	

The plot on the left shows which variables have the most influence on the prediction model by plotting the absolute value of the coefficient of each variable. We can see that the top 5 most influential risk factors are DISTBUS, DISTSCHL, DISTGRCRY, DISTABANB and DISTSTLITE. LdryDens and DISTBARS do not seem to be very influential.


The plot on the right shows the goodness of fit of the model by comparing the number of assaults correctly predicted by the RTM and by a kernel density map. It shows that the RTM does a better job of predicting assaults at the top 70% and 90% risk levels than at the top 30% and 50% risk levels. This is what we want because, when it comes to policy-making, we care more about high-risk areas where assault incidents are most likely.

</description>
		
	</item>
		
		
	<item>
		<title>Urban Growth Boundary (UGB)</title>
				
		<link>https://yinuoyin.com/Urban-Growth-Boundary-UGB</link>

		<pubDate>Thu, 01 Mar 2018 17:32:55 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Urban-Growth-Boundary-UGB</guid>

		<description>


Cartography
Urban Growth Boundary (UGB)



A UGB is a regional boundary that helps control urban sprawl. The area inside the boundary will be used for urban development and the area outside will be preserved in its natural state.




	&#60;img width="3300" height="2550" width_o="3300" height_o="2550" data-src="https://freight.cargo.site/t/original/i/7b6cbb38b855adf3b2090bdbded78fb6da907b25ef975362c12ba3981c6ea023/HW2_Map1_Yin.jpg" data-mid="12176239" border="0" data-scale="75" src="https://freight.cargo.site/w/1000/i/7b6cbb38b855adf3b2090bdbded78fb6da907b25ef975362c12ba3981c6ea023/HW2_Map1_Yin.jpg" /&#62;


UGB Map
Lancaster County, PA



UGB maps can help us identify areas suitable for future urban development. 

This project aims to first identify a township suitable for receiving new development permits, then visualize areas suitable for development in the selected township considering landcover type. The project is based on Lancaster County, PA.&#38;nbsp;

The map above was created in ArcGIS and shows which municipalities fall within the 0.25-mile buffers inside or outside the UGB in Lancaster County, PA.
	&#60;img width="3300" height="2550" width_o="3300" height_o="2550" data-src="https://freight.cargo.site/t/original/i/219e6c2bd7c3aa376a050c4b5691197c9c09569c70b1e7adbb9e693b7b240df5/HW2_Map2_Yin.jpg" data-mid="12176236" border="0" data-scale="100" src="https://freight.cargo.site/w/1000/i/219e6c2bd7c3aa376a050c4b5691197c9c09569c70b1e7adbb9e693b7b240df5/HW2_Map2_Yin.jpg" /&#62;
	&#60;img width="796" height="448" width_o="796" height_o="448" data-src="https://freight.cargo.site/t/original/i/02abdc621e583b4550100ee052964b250a6cde076305fdf9f08ad0afb6d5fe8b/chart1.png" data-mid="12176932" border="0"  src="https://freight.cargo.site/w/796/i/02abdc621e583b4550100ee052964b250a6cde076305fdf9f08ad0afb6d5fe8b/chart1.png" /&#62;&#60;img width="822" height="449" width_o="822" height_o="449" data-src="https://freight.cargo.site/t/original/i/259ac635862c85fae5d28874eb2f8058ddd62b797f67e61de84405c98bdc0e07/chart2.png" data-mid="12176933" border="0" data-scale="100" src="https://freight.cargo.site/w/822/i/259ac635862c85fae5d28874eb2f8058ddd62b797f67e61de84405c98bdc0e07/chart2.png" /&#62;

	

Landcover and UGB of West Hempfield Township
- With 2 bar graphs showing the difference in building density and sum of daily vehicle miles traveled between areas inside and outside the UGB


	Based on maps I created in ArcGIS and charts I created in Excel, I chose West Hempfield Township as the best township to receive the new development permits based on two variables: building density and sum of daily vehicle miles traveled. As shown in the graphs above, the area of West Hempfield Township inside the UGB has a significantly higher building density than the area outside the UGB. Compared with other townships, West Hempfield not only has the highest building density inside the UGB, it also has the greatest difference in building density between inside and outside the UGB (1.28). In addition, West Hempfield has the largest sums of daily vehicle miles traveled both inside and outside the UGB, which means that there are high volumes of traffic in the township, indicating high potential for future development.

I chose these two variables as indicators because they show greater differences between inside and outside the UGB. Also, for most townships, the values of both variables are greater inside the UGB than outside, which seems to better reflect the characteristics of areas inside and outside the UGB.
</description>
		
	</item>
		
		
	<item>
		<title>Animated Choropleth Map of Philadelphia Housing Tenure</title>
				
		<link>https://yinuoyin.com/Animated-Choropleth-Map-of-Philadelphia-Housing-Tenure</link>

		<pubDate>Thu, 01 Mar 2018 17:53:10 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Animated-Choropleth-Map-of-Philadelphia-Housing-Tenure</guid>

		<description>


Data Visualization
Animated Choropleth Map of Philadelphia Housing Tenure




















Does the housing tenure pattern in Philadelphia change significantly from 2010 to 2016?


	&#60;img width="814" height="1200" width_o="814" height_o="1200" data-src="https://freight.cargo.site/t/original/i/2828f22e2bc31ef67183e81dc49309119fbf2b527bbef788bb6031499773a6ed/Yin_GIF.gif" data-mid="12178289" border="0" data-scale="70" src="https://freight.cargo.site/w/814/i/2828f22e2bc31ef67183e81dc49309119fbf2b527bbef788bb6031499773a6ed/Yin_GIF.gif" /&#62;


Housing Tenure Pattern Change 2010 - 2016
Philadelphia, PA

METHOD
1. Data Collection
- Download 7 years of Philadelphia tenure data (2010 - 2016) from Census ACS Data Portal.
- Download 2010 Philadelphia census tract shapefile from Open Data Philly.

2. Data Cleaning &#38;amp; Processing
- In Excel, for every CSV file downloaded, delete empty columns and margins of error values of total, renter-occupied and owner-occupied housing units.
- In R, import cleaned CSV files and:
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Join CSV files with Philadelphia 2010 census tract shapefile
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Calculate percentage of renter-occupied housing units
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Combine 7 data frames into one tall-format data frame
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Turn all NA values to 0
- Additional modification:
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Make sure that the percentage values are represented in percent format (50 instead of 0.5)
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Make sure that there is a “YEAR” column and all year values are in numeric class
- I also want to create bar charts comparing the percentage of owner-occupied and renter-occupied housing units in each year:
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Create data frames with the percentage values I need

3. Data Visualization
- Ready to create maps and charts!
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; First determine the layout and style of map and bar chart
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Run a for loop to create 7 maps and 7 charts, one for each year, in one click
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Plot map and chart for each year on one page
&#38;nbsp; &#38;nbsp; &#38;nbsp; &#38;nbsp; o&#38;nbsp; Finally, create the final animated GIF image!
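
The data processing in step 2 was done in R; as a language-neutral illustration, the Python sketch below shows the same idea: compute the percentage of renter-occupied units, replace missing values with 0, and stack the yearly tables into one tall table with a numeric “YEAR” column. The input structure, the `to_tall_format` helper and the sample counts are hypothetical.

```python
def to_tall_format(yearly_tables):
    """Combine per-year tract tables into one tall list with percent renter."""
    rows = []
    for year, table in sorted(yearly_tables.items()):
        for tract, units in table.items():
            total = units.get("total")
            renter = units.get("renter")
            if not total or renter is None:
                # Missing values (NA) become 0; a total of 0 is treated the same way
                pct_renter = 0.0
            else:
                pct_renter = 100 * renter / total  # percent format: 50, not 0.5
            rows.append({"tract": tract, "YEAR": int(year), "pct_renter": pct_renter})
    return rows

# Hypothetical counts for two tracts in two years
yearly_tables = {
    "2010": {"101": {"total": 200, "renter": 100}, "102": {"total": None, "renter": None}},
    "2011": {"101": {"total": 200, "renter": 110}, "102": {"total": 50, "renter": 20}},
}
tall = to_tall_format(yearly_tables)
```

One row per tract per year is the shape that makes it easy to draw one map per year in a loop, as in step 3.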



RESULT
Now we can answer the question we raised before: does the housing tenure pattern in Philadelphia change significantly from 2010 to 2016?

From what I see in the final GIF image, there is no significant change in the percentage of renter-occupied housing units. In other words, the housing tenure pattern in Philadelphia did not change much from 2010 to 2016. However, the percentage of renter-occupied housing units did keep increasing over that period.
</description>
		
	</item>
		
		
	<item>
		<title>Seasonal and Spatial Variation in Burglary Incidents in Philadelphia</title>
				
		<link>https://yinuoyin.com/Seasonal-and-Spatial-Variation-in-Burglary-Incidents-in-Philadelphia</link>

		<pubDate>Thu, 01 Mar 2018 20:17:17 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Seasonal-and-Spatial-Variation-in-Burglary-Incidents-in-Philadelphia</guid>

		<description>


Data Visualization

















Seasonal and Spatial Variation in Burglary Incidents in Philadelphia









































According to the Bureau of Justice Statistics (BJS), seasonal patterns are a popular topic in crime research, showing how environmental factors may be related to crime throughout the year.






INTRO &#38;amp; MOTIVATION
Based on BJS’s findings from crime data for 1993 to 2010, seasonal patterns exist for household property victimization, including burglary and household larceny, with higher crime rates in the summer and lower rates during other seasons of the year.

After reading BJS’s report, I wondered whether the same seasonal patterns also exist for burglary incidents in Philadelphia after 2010. I also wanted to visualize any spatial patterns in burglary incidents at the street segment level in Philly. In this project, I compare the spatial pattern of burglary incidents at the street segment level in Philadelphia in summer and winter from 2014 to 2016 in the form of maps, and present 3 bar plots comparing the count of burglary incidents in all four seasons from 2014 to 2016. I only map and compare burglary incidents in summer and winter because I expect these two seasons to exhibit the most obvious differences.



&#60;img width="1796" height="1525" width_o="1796" height_o="1525" data-src="https://freight.cargo.site/t/original/i/9128f97567738487dd0bdaeedf0b006b1211a4ba557a9eaca6070f4ff17b2e5d/final_map.jpg" data-mid="12189911" border="0" data-scale="76" src="https://freight.cargo.site/w/1000/i/9128f97567738487dd0bdaeedf0b006b1211a4ba557a9eaca6070f4ff17b2e5d/final_map.jpg" /&#62;

	
	

Seasonal and Spatial Pattern in Burglary Incidents
Philadelphia, PA, 2014 - 2016


	


	METHOD




















This project was done using R and SQL.

1. Data Collection and Cleaning
- Download multiple years of Philadelphia crime data (2006 - 2017), which contains 2,000,000 rows of data. Download the Philadelphia street segment shapefile (street centerline).
- For the Philly street segment shapefile, I read it, project it for future distance calculation, and write it to the database. For the CSV file containing Philadelphia crime data, I transform it to an sf object based on longitude and latitude.
- With my goal of examining 3 years of the most recent crime data, I selected 2014-2016 crime data points from the original dataset. Since I want to examine only burglary incidents, I filtered the dataset by selecting only “Burglary Non-Residential” and “Burglary Residential”. At this point I obtained a filtered dataset of all burglary incidents from 2014 to 2016 in Philadelphia (24,610 rows of data).
- I also added a season column to the filtered burglary dataset, based on the time of each burglary incident. The cleaned burglary dataset was then written to the database.

2. Spatial Queries
- To speed up the spatial queries, I first create a spatial index for the street segment dataset and the burglary dataset in the database.
- I use a spatial query to associate burglary incidents with street segments based on this criterion: the incident is within 100 m of the street segment (took about 10 minutes). To expedite the process, I in fact ran two spatial queries, one associating summer incidents and one associating winter incidents with street segments (took less than 5 minutes in total). The two generated datasets were written to the database again.
- I then use another two spatial queries to count the number of burglaries at each street segment in winter and in summer.
- Finally, I joined the aggregated datasets containing burglary counts with the original street segment dataset for map plotting. All segments without a burglary count were assigned a value of 0.
- Click here for spatial queries code. Note that adding an index is an essential step for reducing the processing time of spatial queries.

3. Data Visualization
- Put together maps and bar plots in R
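
The season column and the per-segment counts were produced with R and SQL; the Python sketch below only illustrates the logic. It assumes meteorological seasons (December to February as winter, and so on), which the write-up does not specify, and it starts from hypothetical incident records that have already been matched to a street segment by the spatial query.

```python
from collections import Counter
from datetime import date

# Assumed meteorological seasons; the project does not state its exact cutoffs
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}

def count_by_segment(incidents, all_segment_ids):
    """Count incidents per (segment, season); segments with none get 0."""
    counts = Counter((seg, SEASON_BY_MONTH[day.month]) for seg, day in incidents)
    return {seg: {season: counts[(seg, season)] for season in ("summer", "winter")}
            for seg in all_segment_ids}

# Hypothetical (segment id, incident date) pairs already matched by the spatial query
incidents = [("S1", date(2015, 7, 4)), ("S1", date(2015, 8, 19)),
             ("S1", date(2016, 1, 2)), ("S2", date(2014, 12, 25))]
result = count_by_segment(incidents, ["S1", "S2", "S3"])
```

Iterating over the full list of segment ids, rather than only the segments that appear in the incident data, is what gives segments without any burglary a count of 0 for mapping.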
	


&#60;img width="1166" height="1053" width_o="1166" height_o="1053" data-src="https://freight.cargo.site/t/original/i/8daffa8228cfa3bfc1398ed53cb4e8baecaa9dbc07f28b62d1659847e74cda21/final_gif.gif" data-mid="12188601" border="0" data-scale="68" src="https://freight.cargo.site/w/1000/i/8daffa8228cfa3bfc1398ed53cb4e8baecaa9dbc07f28b62d1659847e74cda21/final_gif.gif" /&#62;

	
	Animated GIF Visualizing the Difference in Spatial Distribution of Burglary Incidents between Summer and Winter
Philadelphia, PA, 2014 - 2016




	
	RESULTS

From the final map we can see that, as indicated by the area covered by lighter colors and thicker lines, there are more burglary incidents in summer than in winter in general. This is clearly shown in the GIF image. For the category “1 or 2”, the spatial distribution of burglary counts is very different in summer and in winter. Many street segments do not have any burglary incidents in winter but have some in summer. There are also some streets where burglary incidents occur in winter but not in summer. For the category “3 or 4”, it is interesting that the burglary incident count is fairly consistent between summer and winter. In other words, the street segments with 3 or 4 burglary incidents in summer tend to have a similar number of burglary incidents in winter. For the category “&#38;gt;5”, the spatial distribution of burglary counts is again very different in summer and in winter.



From the bar plots, we can see that in two of the three plots the “summer” bar has the highest number, which indicates that more street segments have burglary incidents in summer than in winter. In the category “3 or 4”, though, the “fall” bar has the highest number. The “winter” bar usually has the smallest number.



All in all, we can conclude from the map that seasonal patterns of burglary incidents did exist from 2014 to 2016 in Philadelphia at the street segment level. Overall, there are more burglary incidents in summer than in other seasons, and there tend to be fewer burglary incidents in winter. However, for an individual street segment, the seasonal pattern may not be obvious or may not exist. The results of this study could also help alert property owners living on street segments with high burglary counts to be more careful about their property, and tell them when they should be most careful. For policy makers, studying seasonal and spatial patterns in burglary incidents could help them make appropriate policy in response to the seasonal factors that may lead people to commit burglary in a particular season or time.







	
</description>
		
	</item>
		
		
	<item>
		<title>A Guide for You - Philadelphia Farmers Market</title>
				
		<link>https://yinuoyin.com/A-Guide-for-You-Philadelphia-Farmers-Market</link>

		<pubDate>Wed, 28 Mar 2018 23:52:20 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/A-Guide-for-You-Philadelphia-Farmers-Market</guid>

		<description>


Story Map
A Guide for You - Philadelphia Farmers Markets




















A simple story map with detailed information about farmers markets in Philadelphia!


	&#60;img width="961" height="474" width_o="961" height_o="474" data-src="https://freight.cargo.site/t/original/i/b6481664b44b97dae0adf12755edce1a236b138de7af7f037374ebed4d503f5a/Picture1.png" data-mid="13832875" border="0" data-scale="92" src="https://freight.cargo.site/w/961/i/b6481664b44b97dae0adf12755edce1a236b138de7af7f037374ebed4d503f5a/Picture1.png" /&#62;


Farmers Market Locations
Philadelphia, PA

For this project, I first downloaded farmers market GeoJSON data online and then applied my skills in JavaScript, HTML and CSS to create a simple web application showing information about farmers markets in Philadelphia.
You can access the web story map here.&#38;nbsp;
	&#60;img width="1280" height="630" width_o="1280" height_o="630" data-src="https://freight.cargo.site/t/original/i/40a807535d723df70dfe84368a80add3f9cca3edb0065839e548e24e83f2cbd9/Picture4.png" data-mid="13832878" border="0" data-scale="89" src="https://freight.cargo.site/w/1000/i/40a807535d723df70dfe84368a80add3f9cca3edb0065839e548e24e83f2cbd9/Picture4.png" /&#62;


	

Farmers Market by Neighborhood
Philadelphia, PA


	If you are interested in viewing the raw data I used and the code I wrote to create the story map, please visit my GitHub repository!
</description>
		
	</item>
		
		
	<item>
		<title>Condo Price Per Square Foot Near Rittenhouse Square</title>
				
		<link>https://yinuoyin.com/Condo-Price-Per-Square-Foot-Near-Rittenhouse-Square</link>

		<pubDate>Thu, 29 Mar 2018 01:25:18 +0000</pubDate>

		<dc:creator>Yinuo Yin</dc:creator>

		<guid isPermaLink="true">https://yinuoyin.com/Condo-Price-Per-Square-Foot-Near-Rittenhouse-Square</guid>

		<description>




Web Scraping



















Condo Price Per Square Foot Near Rittenhouse Square


























Wondering about the price per square foot of condominiums overlooking Rittenhouse Square in the center of Philly?





	



&#60;img width="1280" height="627" width_o="1280" height_o="627" data-src="https://freight.cargo.site/t/original/i/1136a725ae64f97e52108b21ff989a6dbaa5234db349de82bf406f31a438c6b0/Picture2.png" data-mid="13833500" border="0" data-scale="71" src="https://freight.cargo.site/w/1000/i/1136a725ae64f97e52108b21ff989a6dbaa5234db349de82bf406f31a438c6b0/Picture2.png" /&#62;


	

Philadelphia Property Database
Web scraping source


	

In this project, I calculate the price per square foot of condominiums overlooking Rittenhouse Square by scraping the Philadelphia Property Database.

STEPS
1. Obtain the full list of condos and units for web scraping (621 rows of condos and units in total)

2. Connect to Selenium Standalone

3. Write code to scrape the needed information from the Philadelphia Property Database
&#38;nbsp; &#38;nbsp; -&#38;nbsp; Information I need: the most recent market value and improvement area
&#38;nbsp; &#38;nbsp; -&#38;nbsp; Instead of jumping into scraping all needed info for all units, I started by scraping the needed info for the first unit
&#38;nbsp; &#38;nbsp; -&#38;nbsp; I first navigated to the website and sent the address and unit number to the search bar, then found the correct CSS selector to retrieve the needed info, and finally stored the market value and area info in a data frame







5. Write the loop to scrape the needed values for all condo units
&#38;nbsp; &#38;nbsp; -&#38;nbsp; After successfully getting what I need for the first condo unit, I started writing a loop to scrape info for all condo units
&#38;nbsp; &#38;nbsp; -&#38;nbsp; The loop starts by going back to the original search page, gets the address and unit for each new search, then goes through the steps to scrape the needed info, and finally stores every result in a data frame
&#38;nbsp; &#38;nbsp; -&#38;nbsp; The trick is to include a long enough pause after each search to prevent the loop from failing



&#38;nbsp; &#38;nbsp; Below are screenshots of the final results I obtained from web scraping.&#38;nbsp;

	&#60;img width="1474" height="1389" width_o="1474" height_o="1389" data-src="https://freight.cargo.site/t/original/i/5b441decc96839f3207dedb953cbcf4e7842743f53fda4ded416a9a9c6874d76/Capture.PNG" data-mid="13835836" border="0" data-scale="35" src="https://freight.cargo.site/w/1000/i/5b441decc96839f3207dedb953cbcf4e7842743f53fda4ded416a9a9c6874d76/Capture.PNG" /&#62;&#38;nbsp; &#38;nbsp; &#38;nbsp;&#60;img width="1526" height="1395" width_o="1526" height_o="1395" data-src="https://freight.cargo.site/t/original/i/1272bf30d875507f39e6a4cfc071359f6c430a88517a39c2c9427a0977b3db1f/Capture2.PNG" data-mid="13835834" border="0" data-scale="36" src="https://freight.cargo.site/w/1000/i/1272bf30d875507f39e6a4cfc071359f6c430a88517a39c2c9427a0977b3db1f/Capture2.PNG" /&#62;


Web Scraping Results




6. Calculate the average price per sqft for each condo property
&#38;nbsp; &#38;nbsp; -&#38;nbsp; Now I have 605 (out of 621) condo units with market value and improvement area info, as some of the condo units are missing either market value or area info
&#38;nbsp; &#38;nbsp; -&#38;nbsp; I calculate the total price and the total area in sqft for each condo, and then divide the total price by the total area to get the average price per sqft for each condo.






7. Geocoding
&#38;nbsp; &#38;nbsp; -&#38;nbsp; I use a loop again to geocode each condo using its address






8. Mapping using ggplot2
&#38;nbsp; &#38;nbsp; -&#38;nbsp; I also create a bar plot showing the calculated price per sqft for each condo from high to low
&#38;nbsp; &#38;nbsp;The final map created is shown below!
	
	&#60;img width="927" height="1296" width_o="927" height_o="1296" data-src="https://freight.cargo.site/t/original/i/77c2c345827f6a5f6d7b35458538b6f36e4807618b36e930c42d1320d3d3ed16/Yin_map_final.jpg" data-mid="13835888" border="0" data-scale="81" src="https://freight.cargo.site/w/927/i/77c2c345827f6a5f6d7b35458538b6f36e4807618b36e930c42d1320d3d3ed16/Yin_map_final.jpg" /&#62;
	

	

Condo Price per SQFT near Rittenhouse Square
Philadelphia, PA
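
The per-condo calculation in step 6 can be sketched as follows. This is an illustrative Python version, not the code used in the project; the record fields, the `price_per_sqft` helper and all figures are hypothetical.

```python
from collections import defaultdict

def price_per_sqft(units):
    """Total market value divided by total area for each condo address."""
    totals = defaultdict(lambda: [0.0, 0.0])  # address: [value sum, area sum]
    for unit in units:
        value, area = unit.get("market_value"), unit.get("area_sqft")
        if not value or not area:
            continue  # skip units missing either market value or area
        totals[unit["address"]][0] += value
        totals[unit["address"]][1] += area
    return {address: value / area for address, (value, area) in totals.items()}

# Hypothetical scraped records (addresses and figures are made up)
units = [
    {"address": "Condo A", "market_value": 500000, "area_sqft": 1000},
    {"address": "Condo A", "market_value": 700000, "area_sqft": 2000},
    {"address": "Condo A", "market_value": None, "area_sqft": 900},
    {"address": "Condo B", "market_value": 300000, "area_sqft": 1500},
]
prices = price_per_sqft(units)
```

Dividing summed value by summed area weights each unit by its size, which differs from averaging the per-unit prices per sqft; the former matches the description in step 6.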


	


















RESULT &#38;amp; ANALYSIS


From the final map we can see that average prices per sqft tend to be similar when the condos are close to each other. For example, for 224-30 W Rittenhouse SQ and 220 W Rittenhouse SQ, the difference between their average prices per sqft is only about $13. For 1900, 2830 and 1820 Rittenhouse SQ, the average prices per sqft are very close as well. There is an exception, though. Although 1806-18 Rittenhouse SQ is close to 1820 Rittenhouse SQ, its price per sqft is almost $80. Another interesting pattern is that the condo with the lowest price per sqft is located the farthest from Rittenhouse SQ.







</description>
		
	</item>
		
	</channel>
</rss>