A blog from a juvenile *Geekus biologicus*

My first participation in TidyTuesday… without the tidyverse 🙃

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

df = pd.read_csv("data/haunted_places.csv")
df.head()

	city	country	description	location	state	state_abbrev	longitude	latitude	city_longitude	city_latitude
0	Ada	United States	Ada witch - Sometimes you can see a misty blue...	Ada Cemetery	Michigan	MI	-85.504893	42.962106	-85.495480	42.960727
1	Addison	United States	A little girl was killed suddenly while waitin...	North Adams Rd.	Michigan	MI	-84.381843	41.971425	-84.347168	41.986434
2	Adrian	United States	If you take Gorman Rd. west towards Sand Creek...	Ghost Trestle	Michigan	MI	-84.035656	41.904538	-84.037166	41.897547
3	Adrian	United States	In the 1970's, one room, room 211, in the old ...	Siena Heights University	Michigan	MI	-84.017565	41.905712	-84.037166	41.897547
4	Albion	United States	Kappa Delta Sorority - The Kappa Delta Sororit...	Albion College	Michigan	MI	-84.745177	42.244006	-84.753030	42.243097

df.city.value_counts()

city
Los Angeles             61
San Antonio             55
Honolulu                43
Pittsburgh              42
Columbus                41
                        ..
Nyssa                    1
Oregon city              1
In California Page 1     1
Redmond                  1
Roseburg                 1
Name: count, Length: 4385, dtype: int64

Some cities seem to have a higher number of haunted places. Is it because they are more populous?
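One way to check would be to scale the counts by population. The dataset has no population column, so the sketch below assumes a hypothetical extra table `population` with `state`, `city` and `population` columns. It also groups on state as well as city, because the same city name can exist in several states (Columbus, Ohio and Columbus, Georgia, for instance), and `value_counts()` on `city` alone merges them.

```python
import pandas as pd


def haunted_per_100k(places: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Count haunted places per (state, city) and scale them per 100,000 inhabitants.

    `population` is a HYPOTHETICAL table with columns "state", "city" and
    "population"; it is not part of the TidyTuesday haunted places dataset.
    """
    # Count places per (state, city) so same-named cities in different states stay separate
    counts = places.groupby(["state", "city"]).size().reset_index(name="n_places")
    # Inner join: cities without a population figure are dropped
    merged = counts.merge(population, on=["state", "city"], how="inner")
    # Haunted places per 100,000 inhabitants
    merged["per_100k"] = merged["n_places"] * 100_000 / merged["population"]
    return merged.sort_values("per_100k", ascending=False, ignore_index=True)
```

With a real population source joined this way, a high raw count that comes with a low `per_100k` value would point to city size rather than unusual "hauntedness".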

# Mutate description into a category
location_categories = ["university", "school", "bridge", "hotel", "cemetery", "house", "park"]

# str.match checks whether the description *starts* with a category keyword (case-insensitive);
# na=False makes missing descriptions count as "no match" instead of NaN (which would be truthy below)
location_masks = [df.description.str.match(category, case=False, na=False) for category in location_categories]
# Keep the first category (in list order) whose mask is True for the row, otherwise "unknown"
df["place"] = df.apply(
    lambda item: next(
        (category for category, mask in zip(location_categories, location_masks) if mask[item.name]), "unknown"
    ), axis=1
)
# Fold "university" into "school"; assigning back avoids pandas' chained-assignment FutureWarning
df["place"] = df["place"].replace({"university": "school"})
df.place.unique()

array(['unknown', 'hotel', 'house', 'cemetery', 'bridge', 'park',
       'school'], dtype=object)
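Note that `str.match` only tests whether a description *begins* with the keyword, so a description mentioning a cemetery mid-sentence stays "unknown". A minimal comparison with `str.contains`, on made-up strings (not taken from the dataset):

```python
import pandas as pd

# Toy descriptions, invented for illustration only
descriptions = pd.Series([
    "Cemetery on the hill - a figure walks at night",
    "A ghost is seen near the old cemetery gates",
    None,  # a missing description
])

# str.match anchors the pattern at the start of each string
starts_with = descriptions.str.match("cemetery", case=False, na=False)
# str.contains looks for the pattern anywhere in the string (plain substring, no regex)
anywhere = descriptions.str.contains("cemetery", case=False, regex=False, na=False)

print(starts_with.tolist())  # [True, False, False]
print(anywhere.tolist())     # [True, True, False]
```

Swapping `str.contains(category, case=False, regex=False, na=False)` into the list comprehension above would tag at least as many rows with a category, at the cost of possible false positives (e.g. "house" inside "lighthouse").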

Let’s plot the locations of the haunted places on a map of the US.

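# One point per haunted place, colored by the place category derived above
fig = px.scatter_mapbox(
    df,
    lat="latitude",
    lon="longitude",
    hover_name="location",
    hover_data=["country", "city"],
    color="place",
    zoom=3,
    height=300,
)
# OpenStreetMap tiles need no Mapbox access token; remove the figure margins
fig.update_layout(mapbox_style="open-street-map", margin={"r": 0, "t": 0, "l": 0, "b": 0})
# Small markers keep dense areas readable
fig.update_traces(marker=dict(size=4))
fig.show()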

Haunted places in the US