import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

My first participation in TidyTuesday… without the tidyverse 🙃
df = pd.read_csv("data/haunted_places.csv")
df.head()

| | city | country | description | location | state | state_abbrev | longitude | latitude | city_longitude | city_latitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ada | United States | Ada witch - Sometimes you can see a misty blue... | Ada Cemetery | Michigan | MI | -85.504893 | 42.962106 | -85.495480 | 42.960727 |
| 1 | Addison | United States | A little girl was killed suddenly while waitin... | North Adams Rd. | Michigan | MI | -84.381843 | 41.971425 | -84.347168 | 41.986434 |
| 2 | Adrian | United States | If you take Gorman Rd. west towards Sand Creek... | Ghost Trestle | Michigan | MI | -84.035656 | 41.904538 | -84.037166 | 41.897547 |
| 3 | Adrian | United States | In the 1970's, one room, room 211, in the old ... | Siena Heights University | Michigan | MI | -84.017565 | 41.905712 | -84.037166 | 41.897547 |
| 4 | Albion | United States | Kappa Delta Sorority - The Kappa Delta Sororit... | Albion College | Michigan | MI | -84.745177 | 42.244006 | -84.753030 | 42.243097 |
df.city.value_counts()

city
Los Angeles 61
San Antonio 55
Honolulu 43
Pittsburgh 42
Columbus 41
..
Nyssa 1
Oregon city 1
In California Page 1 1
Redmond 1
Roseburg 1
Name: count, Length: 4385, dtype: int64
Some cities seem to have a higher number of haunted places. Is it because they are more populated?
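Before comparing these counts with population, it may be worth checking that each count refers to a single city. `value_counts()` on `city` alone merges same-named cities from different states (Columbus exists in both Ohio and Georgia, for instance). Here is a minimal sketch of counting per city and state instead. The `toy` frame is invented for illustration; only the column names `city` and `state` come from the dataset.

```python
import pandas as pd

# Toy rows, invented for illustration; only the column names follow the dataset
toy = pd.DataFrame({
    "city": ["Columbus", "Columbus", "Columbus", "Ada"],
    "state": ["Ohio", "Ohio", "Georgia", "Michigan"],
})

# Counting on city alone lumps both Columbuses together
by_city = toy["city"].value_counts()

# Counting on (city, state) pairs keeps them apart
by_city_state = toy.value_counts(subset=["city", "state"])
print(by_city_state)
```

On the real data the same call would be `df.value_counts(subset=["city", "state"])`. Whether it changes the ranking noticeably is something I have not checked.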
# Derive a place category from the description
location_categories = ["university", "school", "bridge", "hotel", "cemetery", "house", "park"]
# str.match anchors at the start of the string, so each mask flags
# the descriptions that begin with the category keyword
location_masks = [df.description.str.match(category, case=False) for category in location_categories]
df["place"] = df.apply(
    lambda item: next(
        (category for category, mask in zip(location_categories, location_masks) if mask[item.name]),
        "unknown",
    ),
    axis=1,
)
# Assign the result back rather than calling replace(..., inplace=True) on df.place,
# which triggers a chained-assignment FutureWarning and will stop working in pandas 3.0
df["place"] = df["place"].replace({"university": "school"})
df.place.unique()

array(['unknown', 'hotel', 'house', 'cemetery', 'bridge', 'park',
       'school'], dtype=object)
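A note on the matching: `str.match` only tests the start of each string, so a description is labelled only when it begins with one of the keywords. To find a keyword anywhere in the text, `str.contains` is the usual alternative. The sketch below compares the two on a few invented descriptions (not taken from the dataset) and uses `np.select` in place of the row-wise `apply`. It is one possible variant, not the code that produced the output above.

```python
import numpy as np
import pandas as pd

# Toy descriptions, invented for illustration (not taken from the dataset)
toy = pd.DataFrame({
    "description": [
        "Hotel lobby lights flicker at night",
        "A figure is seen near the old bridge",
        "Footsteps echo in the university library",
        "Cold spots reported by visitors",
    ]
})

categories = ["university", "school", "bridge", "hotel", "cemetery", "house", "park"]

# str.match anchors the pattern at the start of each string
starts_with = [toy["description"].str.match(c, case=False, na=False) for c in categories]
# str.contains searches anywhere in the string
anywhere = [toy["description"].str.contains(c, case=False, na=False) for c in categories]

# np.select keeps the first category whose mask is True, else the default
toy["place_match"] = np.select(starts_with, categories, default="unknown")
toy["place_contains"] = np.select(anywhere, categories, default="unknown")
toy["place_contains"] = toy["place_contains"].replace({"university": "school"})

print(toy[["place_match", "place_contains"]])
```

With `str.contains`, fewer rows should end up as `unknown`, at the cost of some false positives (a description that merely mentions a house, for example).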
Let’s plot the positions of the haunted places on a map of the US.
fig = px.scatter_mapbox(
    df,
    lat="latitude",
    lon="longitude",
    hover_name="location",
    hover_data=["country", "city"],
    color="place",
    zoom=3,
    height=300,
)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_traces(marker=dict(size=4))
fig.show()

Haunted places in the US
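One caveat for the map: a row with a missing `latitude` or `longitude` cannot be drawn as a point. A quick way to see how many rows that affects is to drop them explicitly before plotting. This is a minimal sketch on invented rows; I am assuming some rows lack coordinates and have not verified how many do in the real file.

```python
import pandas as pd

# Toy rows, invented for illustration; only the column names follow the dataset
toy = pd.DataFrame({
    "location": ["A", "B", "C"],
    "latitude": [42.96, None, 41.90],
    "longitude": [-85.50, -84.38, None],
})

# Keep only the rows that have both coordinates
plottable = toy.dropna(subset=["latitude", "longitude"])

n_dropped = len(toy) - len(plottable)
print(f"{n_dropped} of {len(toy)} rows have no usable coordinates")
```

On the real data, `df.dropna(subset=["latitude", "longitude"])` could be passed to the plotting call in place of `df`.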