How to make scatter plots similar to the one in the paper "Get me off Your F****** Mailing List"?
Question:
This is a serious question. Please do not take it as a joke.
This is a scatter plot from an infamous paper with the same name, Get me off Your F****** Mailing List by Mazières and Kohler (2005), published in a predatory journal. Some people may know it.
I am seriously interested in recreating the same scatter plot to test a new density-based clustering algorithm, without having to create all the letters from scratch.
Is there any way to make this process easier? (e.g. a dataset, or a package, or a smart way to recreate the plot)
Answers:
Now that the grid package supports clipping paths, we can do:
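library(grid)
library(ggplot2)
# Text used as the clipping path; "\n" splits it into four lines
tg <- textGrob("Get me off\nYour Fuck\ning Mailing\nList", x = 0.2,
               hjust = 0,
               gp = gpar(cex = 6, col = "grey", font = 2))
# 15,000 uniformly random "+" markers covering the whole viewport
cg <- pointsGrob(x = runif(15000), y = runif(15000), pch = 3,
                 gp = gpar(cex = 0.5))
# Defined here but not drawn below
rg <- rectGrob(width = unit(0.5, 'npc'), height = unit(0.1, 'npc'),
               gp = gpar(fill = 'red'))
# Empty framed panel that serves as the plot background
ggplot(data = NULL, aes(x = 100, y = 100)) +
  geom_point(col = 'white') +
  theme_classic() +
  theme(panel.border = element_rect(fill = 'white', linewidth = 1))
# Everything drawn in this viewport is clipped to the text outline
pushViewport(viewport(clip = tg))
grid.draw(cg)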
If you want to generate random points that actually lie inside the text, rather than clipping a uniform scatter, you could do this relatively easily using Python’s NumPy and Pillow modules.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont
lines = ["Get me off", "your Fuck", "ing Mailing", "List"]
# get a nice heavy font
font = ImageFont.truetype("fonts/Lato-Black.ttf", size=200)
# calculate width
width = max(font.getbbox(line)[2] for line in lines)
# create BW image containing text
im = Image.new('1', (width, len(lines) * font.size))
draw = ImageDraw.Draw(im)
for i, line in enumerate(lines):
    draw.text((0, i * font.size), line, font=font, fill=1)
# sample points
y, x = np.where(np.array(im) > 0)
ii = np.random.randint(len(x), size=sum(map(len, lines)) * 50)
x = x[ii] / im.width
y = y[ii] / im.height
# recreate figure
fig, ax = plt.subplots()
ax.semilogy()
ax.scatter(x, 10**(y*-5), 7**2, linewidths=0.5, marker='+', color='black')
ax.set_xlabel("Your Fucking Mailing List")
ax.set_ylabel("Get me off")
ax.set_title("Get me off Your Fucking Mailing List")
which might produce something like:
The lack of masking makes it more difficult to see the letters, but given you seemed to want points for clustering this might not matter so much.
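If the points are meant as input for a clustering algorithm, it may help to wrap the sampling step in a small function that returns an array of coordinates. The sketch below is only an illustration, not part of the answer above: the function name `sample_points_from_mask` and the stand-in mask are hypothetical, and the sub-pixel jitter is an added assumption so that repeated picks of the same pixel do not end up as exact duplicates on an integer lattice.

```python
import numpy as np

def sample_points_from_mask(mask, n, rng=None):
    """Draw n points uniformly from the True pixels of a 2-D boolean mask.

    Returns an (n, 2) float array of (x, y) in [0, 1], with y pointing up.
    """
    rng = np.random.default_rng(rng)
    rows, cols = np.nonzero(mask)          # pixel coordinates inside the shape
    if rows.size == 0:
        raise ValueError("mask contains no True pixels")
    idx = rng.integers(rows.size, size=n)  # pick pixels with replacement
    # Jitter each point uniformly within its pixel (an assumption added here,
    # not something the original sampling code does).
    x = (cols[idx] + rng.random(n)) / mask.shape[1]
    # Image rows grow downward, so flip to get a conventional y axis.
    y = 1.0 - (rows[idx] + rng.random(n)) / mask.shape[0]
    return np.column_stack([x, y])

# Stand-in mask (hypothetical): two rectangular blobs instead of rendered text.
mask = np.zeros((100, 200), dtype=bool)
mask[20:40, 10:60] = True
mask[60:90, 120:180] = True
points = sample_points_from_mask(mask, n=500, rng=0)
```

With the Pillow image from the code above, the mask would be `np.array(im) > 0`, and `points` could then be passed to the clustering code directly.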
Using matplotlib with a text outline as the clip path. Unfortunately, this doesn’t handle multi-line text easily:
import matplotlib.pyplot as plt
from matplotlib.textpath import TextPath
from matplotlib.font_manager import FontProperties
from matplotlib.transforms import IdentityTransform
import numpy as np
ax = plt.subplot()
N = 7000
x = np.random.random(size=N)
y = np.random.random(size=N)
ax.scatter(x, y, marker='+', color='k', lw=0.5)
text = 'StackOverflow'
text = TextPath((60, 200), text,
                prop=FontProperties(weight='bold', size=55),
                )
ax.collections[0].set_clip_path(text, transform=IdentityTransform())
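One possible workaround for the multi-line limitation is to build one `TextPath` per line and merge them with `Path.make_compound_path`. This is a rough sketch, not a tested recipe: `line_height`, `x0` and `y_top` are assumed values in display pixels, chosen by eye for matplotlib's default 640x480 figure, and would need adjusting for other figure sizes or fonts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.font_manager import FontProperties
from matplotlib.path import Path
from matplotlib.textpath import TextPath
from matplotlib.transforms import IdentityTransform

lines = ["Get me off", "Your Fuck", "ing Mailing", "List"]
font_size = 55         # same size as in the single-line version
line_height = 70       # assumed baseline-to-baseline spacing, in pixels
x0, y_top = 100, 330   # assumed baseline origin of the first line, in pixels

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
ax.scatter(rng.random(7000), rng.random(7000), marker="+", color="k", lw=0.5)

# One outline per line, each shifted down by one line height.
prop = FontProperties(weight="bold")
line_paths = [
    TextPath((x0, y_top - i * line_height), line, size=font_size, prop=prop)
    for i, line in enumerate(lines)
]

# Merge the per-line outlines into a single path and clip the scatter to it.
clip = Path.make_compound_path(*line_paths)
ax.collections[0].set_clip_path(clip, transform=IdentityTransform())
# fig.savefig("clipped.png") would write the result to disk.
```

Because the clip path uses `IdentityTransform`, its coordinates are in display pixels, so the text will not move or scale with the axes.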