Haunted Places

tidytuesday
Published

Tuesday, October 10, 2023

My first participation in TidyTuesday… without the tidyverse 🙃

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
df = pd.read_csv("data/haunted_places.csv")
df.head()
   city     country        description                                        location                  state     state_abbrev  longitude   latitude   city_longitude  city_latitude
0  Ada      United States  Ada witch - Sometimes you can see a misty blue...  Ada Cemetery              Michigan  MI            -85.504893  42.962106  -85.495480      42.960727
1  Addison  United States  A little girl was killed suddenly while waitin...  North Adams Rd.           Michigan  MI            -84.381843  41.971425  -84.347168      41.986434
2  Adrian   United States  If you take Gorman Rd. west towards Sand Creek...  Ghost Trestle             Michigan  MI            -84.035656  41.904538  -84.037166      41.897547
3  Adrian   United States  In the 1970's, one room, room 211, in the old ...  Siena Heights University  Michigan  MI            -84.017565  41.905712  -84.037166      41.897547
4  Albion   United States  Kappa Delta Sorority - The Kappa Delta Sororit...  Albion College            Michigan  MI            -84.745177  42.244006  -84.753030      42.243097
df.city.value_counts()
city
Los Angeles             61
San Antonio             55
Honolulu                43
Pittsburgh              42
Columbus                41
                        ..
Nyssa                    1
Oregon city              1
In California Page 1     1
Redmond                  1
Roseburg                 1
Name: count, Length: 4385, dtype: int64
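
The tail of this output hints that the city column is not perfectly clean: "Oregon city" has an odd capitalization and "In California Page 1" looks like scraping residue rather than a city. Here is a minimal sketch of how the names could be tidied before counting. It runs on a toy series modeled on the entries above, and the "Page N" pattern is only a heuristic I am assuming, not a rule taken from the dataset.

```python
import pandas as pd

# Toy stand-in for df.city, modeled on the entries shown above (illustrative only).
cities = pd.Series(
    ["Oregon City", "Oregon city", " Oregon City ", "Redmond", "In California Page 1"]
)

# Trim, collapse repeated whitespace, and unify capitalization so that
# spelling variants of the same city are counted together.
normalized = (
    cities.str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)

# Heuristic (assumption): values containing "Page <number>" are scraping residue.
suspicious = normalized.str.contains(r"\bPage \d+\b", regex=True)

counts = normalized[~suspicious].value_counts()
print(counts)
```

Title-casing is a blunt tool (it would turn a name like "McAllen" into "Mcallen"), so on the real data a case-insensitive grouping key might be the safer choice.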

Some cities seem to have a higher number of haunted places. Is it because they are more populated?
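
One way to check this would be to normalize the counts by population. The dataset has no population column, so the sketch below assumes a hypothetical `population` series coming from some external source. The city names and numbers are placeholders, not real figures.

```python
import pandas as pd

# Placeholder counts, shaped like the result of df.city.value_counts().
counts = pd.Series({"City A": 60, "City B": 40}, name="haunted_places")

# Hypothetical population figures (made up for illustration).
population = pd.Series({"City A": 4_000_000, "City B": 300_000}, name="population")

# Haunted places per 100,000 inhabitants; the division aligns on the city index.
per_100k = (counts / population * 100_000).sort_values(ascending=False)
print(per_100k)
```

With these made-up numbers the smaller city ends up ahead once population is taken into account, which is exactly the kind of reversal the question is about.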

# Derive a place category from the description
location_categories = ["university", "school", "bridge", "hotel", "cemetery", "house", "park"]

# str.match anchors at the start of the string, so a row is flagged only when
# its description begins with the category (case-insensitive)
location_masks = [df.description.str.match(category, case=False) for category in location_categories]
# For each row, keep the first category whose mask is true, else "unknown"
df["place"] = df.apply(
    lambda item: next(
        (category for category, mask in zip(location_categories, location_masks) if mask[item.name]), "unknown"
    ), axis=1
)
# Assign the result back: calling replace(..., inplace=True) on df.place
# raises a chained-assignment FutureWarning and will stop working in pandas 3.0
df["place"] = df["place"].replace({"university": "school"})
df.place.unique()
array(['unknown', 'hotel', 'house', 'cemetery', 'bridge', 'park',
       'school'], dtype=object)
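
Since `str.match` only looks at the beginning of each description, many rows probably end up as "unknown". Below is a possible vectorized variant, sketched on a toy frame with made-up descriptions. It assumes the intent is to find the keyword anywhere in the text, so it swaps in `str.contains`, and it uses `np.select` instead of a row-wise `apply`. Treat it as an alternative to try, not as what produced the output above.

```python
import numpy as np
import pandas as pd

# Toy descriptions (made up), including a missing value.
toy = pd.DataFrame({"description": [
    "An old hotel where guests hear footsteps",
    "The university library is said to be haunted",
    "A figure walks across the bridge at night",
    None,
    "Nothing specific here",
]})

categories = ["university", "school", "bridge", "hotel", "cemetery", "house", "park"]

# One boolean mask per category; na=False treats missing descriptions as no match.
masks = [
    toy["description"].str.contains(category, case=False, na=False).to_numpy()
    for category in categories
]

# np.select picks the first category whose mask is true, else the default.
toy["place"] = np.select(masks, categories, default="unknown")
toy["place"] = toy["place"].replace({"university": "school"})
print(toy)
```

The order of `categories` matters here just as in the original loop: a description mentioning both a school and a park is labeled with whichever keyword comes first in the list.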

Let’s plot the positions of the haunted places on a map of the US.

fig = px.scatter_mapbox(
    df,
    lat="latitude",
    lon="longitude",
    hover_name="location",
    hover_data=["country", "city"],
    color="place",
    zoom=3,
    height=300,
)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_traces(marker=dict(size=4))
fig.show()

Haunted places in the US
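
Before mapping, it can be worth checking that every row actually has usable coordinates. I have not verified whether the dataset contains missing or out-of-range values, so this is only a sketch of such a check on a toy frame with invented rows.

```python
import pandas as pd

# Toy rows (invented): one valid, one with a missing latitude, one out of range.
toy = pd.DataFrame({
    "location": ["A", "B", "C"],
    "latitude": [42.96, None, 95.0],
    "longitude": [-85.50, -84.38, -84.03],
})

# Keep rows where both coordinates are present...
has_coords = toy[["latitude", "longitude"]].notna().all(axis=1)
# ...and fall within the valid latitude/longitude ranges.
in_range = toy["latitude"].between(-90, 90) & toy["longitude"].between(-180, 180)

plottable = toy[has_coords & in_range]
print(plottable)
```

The filtered frame could then be passed to the plotting call in place of `df`.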