How to extract all the emojis from text?
Question:
Consider the following list:
a_list = [' me así, bla es se ds ']
How can I extract in a new list all the emojis inside a_list
?:
new_lis = [' ']
I tried to use regex, but I do not have all the possible emojis encodings.
Answers:
All the Unicode emojis with their respective code points are here. They are 1F600 to 1F64F, so you can just build all of them with a range-like iterator.
You can use the emoji
library. You can check if a single codepoint is an emoji codepoint by checking if it is contained in emoji.UNICODE_EMOJI
.
import emoji
def extract_emojis(s):
return ''.join(c for c in s if c in emoji.UNICODE_EMOJI['en'])
If you don’t want to use an external library, as a pythonic way you can simply use regular expressions and re.findall()
with a proper regex to find the emojies:
In [74]: import re
In [75]: re.findall(r'[^ws,]', a_list[0])
Out[75]: [' ', ' ', ' ', ' ', ' ', ' ']
The regular expression r'[^ws,]'
is a negated character class that matches any character that is not a word character, whitespace or comma.
As I mentioned in comment, a text is generally contain word characters and punctuation which will be easily dealt with by this approach, for other cases you can just add them to the character class manually. Note that since you can specify a range of characters in character class you can even make it shorter and more flexible.
Another solution is instead of a negated character class that excludes the non-emoji characters use a character class that accepts emojies ([]
without ^
). Since there are a lot of emojis with different unicode values, you just need to add the ranges to the character class. If you want to match more emojies here is a good reference contain all the standard emojies with the respective range for different emojies http://apps.timwhitlock.info/emoji/tables/unicode:
The top rated answer does not always work. For example flag emojis will not be found. Consider the string:
s = u'Hello U0001f1f7U0001f1fa hello'
What would work better is
import emoji
emojis_list = map(lambda x: ''.join(x.split()), emoji.UNICODE_EMOJI.keys())
r = re.compile('|'.join(re.escape(p) for p in emojis_list))
print(' '.join(r.findall(s)))
The solution to get exactly what tumbleweed ask, is a mix between the top rated answer and user594836’s answer. This is the code that works for me in Python 3.6.
import emoji
import re
test_list=[' me así,bla es,se ds ']
## Create the function to extract the emojis
def extract_emojis(a_list):
emojis_list = map(lambda x: ''.join(x.split()), emoji.UNICODE_EMOJI.keys())
r = re.compile('|'.join(re.escape(p) for p in emojis_list))
aux=[' '.join(r.findall(s)) for s in a_list]
return(aux)
## Execute the function
extract_emojis(test_list)
## the output
[' ']
I think it’s important to point out that the previous answers won’t work with emojis like , because it consists of 4 emojis, and using ... in emoji.UNICODE_EMOJI
will return 4 different emojis. Same for emojis with skin color like .
My solution
Include the emoji
and regex
modules. The regex module supports recognizing grapheme clusters (sequences of Unicode codepoints rendered as a single character), so we can count emojis like
import emoji
import regex
def split_count(text):
emoji_list = []
data = regex.findall(r'X', text)
for word in data:
if any(char in emoji.UNICODE_EMOJI['en'] for char in word):
emoji_list.append(word)
return emoji_list
Testing
with more emojis with skin color:
line = [" me así, se ds hello emoji hello how are you today "]
counter = split_count(line[0])
print(' '.join(emoji for emoji in counter))
output:
Include flags
If you want to include flags, like the Unicode range would be from to , so add:
flags = regex.findall(u'[U0001F1E6-U0001F1FF]', text)
to the function above, and return emoji_list + flags
.
See this answer to "A python regex that matches the regional indicator character class" for more information about the flags.
For newer emoji
versions
to work with emoji >= v1.2.0 you have to add a language specifier (e.g. en
as in above code):
emoji.UNICODE_EMOJI['en']
Step 1: Make sure that your text it’s decoded on utf-8 text.decode('utf-8')
Step 2: Locate all emoji from your text, you must separate the text character by character [str for str in decode]
Step 3: Saves all emoji in a list [c for c in allchars if c in emoji.UNICODE_EMOJI]
full example bellow:
>>> import emoji
>>> text = " me así, bla es se ds "
>>> decode = text.decode('utf-8')
>>> allchars = [str for str in decode]
>>> list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
>>> print list
[u'U0001f914', u'U0001f648', u'U0001f60c', u'U0001f495', u'U0001f46d', u'U0001f459']
if you want to remove from text
>>> filtred = [str for str in decode.split() if not any(i in str for i in list)]
>>> clean_text = ' '.join(filtred)
>>> print clean_text
me así, bla es se ds
Ok – i had this same problem and I worked out a solution which doesn’t require you to import any libraries (like emoji or re) and is a single line of code. It will return all the emojis in the string:
def extract_emojis(sentence):
return [word for word in sentence.split() if str(word.encode('unicode-escape'))[2] == '\' ]
This allowed me to create a light-weight solution and i hope it helps you all. Actually – i needed one which would filter out any emojis in a string – and thats the same as the code above but with one minor change:
def filter_emojis(sentence):
return [word for word in sentence.split() if str(word.encode('unicode-escape'))[2] != '\' ]
Here is an example of it in action:
>>> a = ' me así, bla es se ds '
>>> b = extract_emojis(a)
>>> b
[' ', ' ', ' ', ' ']
from emoji import *
EMOJI_SET = set()
# populate EMOJI_DICT
def pop_emoji_dict():
for emoji in UNICODE_EMOJI:
EMOJI_SET.add(emoji)
# check if emoji
def is_emoji(s):
for letter in s:
if letter in EMOJI_SET:
return True
return False
This is a better solution when working with large datasets since you dont have to loop through all emojis each time. Found this to give me better results 🙂
This function expects a string so converting the list of input to string
a_list = ' me así, bla es se ds '
# Import the necessary modules
from nltk.tokenize import regexp_tokenize
# Tokenize and print only emoji
emoji = "['U0001F300-U0001F5FF'|'U0001F600-U0001F64F'|'U0001F680-
U0001F6FF'|'u2600-u26FFu2700-u27BF']"
print(regexp_tokenize(a_list, emoji))
output :[' ', ' ', ' ', ' ', ' ']
Another way to do it using emoji is to use emoji.demojize
and convert them into text representations of emojis.
Ex: will be converted to :grinning_face:
etc..
Then find all :.*:
patterns, and use emoji.emojize
on those.
# -*- coding: utf-8 -*-
import emoji
import re
text = """
Of course, too many emoji characters
like , #@^!*&#@^# helps people read aa aaa a #douchebag
"""
text = emoji.demojize(text)
text = re.findall(r'(:[^:]*:)', text)
list_emoji = [emoji.emojize(x) for x in text]
print(list_emoji)
This might be a redundant way but it’s an example of how emoji.emojize
and emoji.demojize
can be used.
First of all you need to install this:
conda install -c conda-forge emoji
Now we can write the following code:
import emoji
import re
text= ' me así, bla es se ds '
text_de= emoji.demojize(text)
If we print text_de Output is:
':thinking_face: :see-no-evil_monkey: me así, bla es se :relieved_face: ds
:two_hearts::two_women_holding_hands::bikini:'
Now we can use regex to find emojis.
emojis_list_de= re.findall(r'(:[!_-w]+:)', text_de)
list_emoji= [emoji.emojize(x) for x in emojis_list_de]
If we print lis_emoji, output:
[' ', ' ', ' ', ' ', ' ', ' ']
So, we can use Join function:
[''.join(list_emoji)]
OutPut: [' ']
If you want to remove emojis you can use following code:
def remove_emoji(text):
'''
remove all of emojis from text
-------------------------
'''
text= emoji.demojize(text)
text= re.sub(r'(:[!_-w]+:)', '', text)
return text
import emojis
new_list = emojis.get(' me así, bla es se ds ')
print(new_list)
output>>>{' ', ' ', ' ', ' ', ' ', ' '}
Here’s another option that uses emoji.get_emoji_regexp()
and re
:
import re
import emoji
# This works for `emoji` version <2.0
def extract_emojis(text):
return re.findall(emoji.get_emoji_regexp(), text)
test_str = ' some various emojis and flags '
emojis = extract_emojis(test_str)
This yields:
[' ', ' ', ' ', ' u200d ', ' ', ' u200d u200d u200d ']
Or, to view the grapheme clusters:
print(' '.join(emoji for emoji in emojis))
Yields
Newer emoji
versions
For versions of emoji>=2.0.0
, there’s no need for re
:
def extract_emojis(text):
return [x.chars for x in emoji.analyze(test_str)]
If a library seems like overkill, try this regular expression – it works by matching the longest emojis first in a big alternation. Parses all emojis, all skin tones, and all flags. (v14.0) more info
# coding=utf8
import re
a_list = [' me así, bla es se ds ']
ret = re.findall(r'(?: ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | | | | | | | | | | | ❤️ | ❤️ | ❤️ | | | | | | | | | | | | | ️ ️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| | | | | | | | | | | | | | | | | | | | | ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ️| ️♂️| ️♀️| ️♂️| ️♀️| ️♂️| ️♀️| ️ | ️⚧️|⛹ ♂️|⛹ ♂️|⛹ ♂️|⛹ ♂️|⛹ ♂️|⛹ ♀️|⛹ ♀️|⛹ ♀️|⛹ ♀️|⛹ ♀️| | |❤️ |❤️ | ♂️| ♀️| | | | | | | | | | | | | ♀️| ♂️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ⚕️| ⚕️| ⚕️| | | | | | | ⚖️| ⚖️| ⚖️| | | | | | | | | | | | | | | | | | | | | | | | | | | | ✈️| ✈️| ✈️| | | | | | | ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| | | | | ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| | | | | | | | | | ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️|⛹️♂️|⛹️♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| | | | | | ❄️| ☠️| ⬛| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |#️⃣|0️⃣|1️⃣|2️⃣|3️⃣|4️⃣|5️⃣|6️⃣|7️⃣|8️⃣|9️⃣|✋ |✋ |✋ |✋ |✋ |✌ |✌ |✌ |✌ |✌ |☝ |☝ |☝ |☝ |☝ |✊ |✊ |✊ |✊ |✊ |✍ |✍ |✍ |✍ |✍ |⛹ |⛹ |⛹ |⛹ |⛹ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |☺|☹|☠|❣|❤|✋|✌|☝|✊|✍|⛷|⛹|☘|☕|⛰|⛪|⛩|⛲|⛺|♨|⛽|⚓|⛵|⛴|✈|⌛|⏳|⌚|⏰|⏱|⏲|☀|⭐|☁|⛅|⛈|☂|☔|⛱|⚡|❄|☃|⛄|☄|✨|⚽|⚾|⛳|⛸|♠|♥|♦|♣|♟|⛑|☎|⌨|✉|✏|✒|✂|⛏|⚒|⚔|⚙|⚖|⛓|⚗|⚰|⚱|♿|⚠|⛔|☢|☣|⬆|↗|➡|↘|⬇|↙|⬅|↖|↕|↔|↩|↪|⤴|⤵|⚛|✡|☸|☯|✝|☦|☪|☮|♈|♉|♊|♋|♌|♍|♎|♏|♐|♑|♒|♓|⛎|▶|⏩|⏭|⏯|◀|⏪|⏮|⏫|⏬|⏸|⏹|⏺|⏏|♀|♂|⚧|✖|➕|➖|➗|♾|‼|⁉|❓|❔|❕|❗|〰|⚕|♻|⚜|⭕|✅|☑|✔|❌|❎|➰|➿|〽|✳|✴|❇|©|®|™|ℹ|Ⓜ|㊗|㊙|⚫|⚪|⬛|⬜|◼|◻|◾|◽|▪|▫)', a_list[0])
print(ret)
#[' ', ' ', ' ', ' ', ' ', ' ']
Building on Mohammed Terry Jack answer which only works where each emoji is separated by a space. See a modified version below which has removed this requirement:
def extract_emojis(sentence):
return [sentence[i] for i in range(len(sentence)) if str(sentence[i].encode('unicode-escape'))[2] == '\' ]
Expected result:
>>> a = ' me así, bla es se ds '
>>> b = extract_emojis(a)
>>> b
[' ', ' ', ' ', ' ', ' ', ' ']
Consider the following list:
a_list = [' me así, bla es se ds ']
How can I extract in a new list all the emojis inside a_list
?:
new_lis = [' ']
I tried to use regex, but I do not have all the possible emojis encodings.
All the Unicode emojis with their respective code points are here. They are 1F600 to 1F64F, so you can just build all of them with a range-like iterator.
You can use the emoji
library. You can check if a single codepoint is an emoji codepoint by checking if it is contained in emoji.UNICODE_EMOJI
.
import emoji
def extract_emojis(s):
return ''.join(c for c in s if c in emoji.UNICODE_EMOJI['en'])
If you don’t want to use an external library, as a pythonic way you can simply use regular expressions and re.findall()
with a proper regex to find the emojies:
In [74]: import re
In [75]: re.findall(r'[^ws,]', a_list[0])
Out[75]: [' ', ' ', ' ', ' ', ' ', ' ']
The regular expression r'[^ws,]'
is a negated character class that matches any character that is not a word character, whitespace or comma.
As I mentioned in comment, a text is generally contain word characters and punctuation which will be easily dealt with by this approach, for other cases you can just add them to the character class manually. Note that since you can specify a range of characters in character class you can even make it shorter and more flexible.
Another solution is instead of a negated character class that excludes the non-emoji characters use a character class that accepts emojies ([]
without ^
). Since there are a lot of emojis with different unicode values, you just need to add the ranges to the character class. If you want to match more emojies here is a good reference contain all the standard emojies with the respective range for different emojies http://apps.timwhitlock.info/emoji/tables/unicode:
The top rated answer does not always work. For example flag emojis will not be found. Consider the string:
s = u'Hello U0001f1f7U0001f1fa hello'
What would work better is
import emoji
emojis_list = map(lambda x: ''.join(x.split()), emoji.UNICODE_EMOJI.keys())
r = re.compile('|'.join(re.escape(p) for p in emojis_list))
print(' '.join(r.findall(s)))
The solution to get exactly what tumbleweed ask, is a mix between the top rated answer and user594836’s answer. This is the code that works for me in Python 3.6.
import emoji
import re
test_list=[' me así,bla es,se ds ']
## Create the function to extract the emojis
def extract_emojis(a_list):
emojis_list = map(lambda x: ''.join(x.split()), emoji.UNICODE_EMOJI.keys())
r = re.compile('|'.join(re.escape(p) for p in emojis_list))
aux=[' '.join(r.findall(s)) for s in a_list]
return(aux)
## Execute the function
extract_emojis(test_list)
## the output
[' ']
I think it’s important to point out that the previous answers won’t work with emojis like , because it consists of 4 emojis, and using ... in emoji.UNICODE_EMOJI
will return 4 different emojis. Same for emojis with skin color like .
My solution
Include the emoji
and regex
modules. The regex module supports recognizing grapheme clusters (sequences of Unicode codepoints rendered as a single character), so we can count emojis like
import emoji
import regex
def split_count(text):
emoji_list = []
data = regex.findall(r'X', text)
for word in data:
if any(char in emoji.UNICODE_EMOJI['en'] for char in word):
emoji_list.append(word)
return emoji_list
Testing
with more emojis with skin color:
line = [" me así, se ds hello emoji hello how are you today "]
counter = split_count(line[0])
print(' '.join(emoji for emoji in counter))
output:
Include flags
If you want to include flags, like the Unicode range would be from to , so add:
flags = regex.findall(u'[U0001F1E6-U0001F1FF]', text)
to the function above, and return emoji_list + flags
.
See this answer to "A python regex that matches the regional indicator character class" for more information about the flags.
For newer emoji
versions
to work with emoji >= v1.2.0 you have to add a language specifier (e.g. en
as in above code):
emoji.UNICODE_EMOJI['en']
Step 1: Make sure that your text it’s decoded on utf-8 text.decode('utf-8')
Step 2: Locate all emoji from your text, you must separate the text character by character [str for str in decode]
Step 3: Saves all emoji in a list [c for c in allchars if c in emoji.UNICODE_EMOJI]
full example bellow:
>>> import emoji
>>> text = " me así, bla es se ds "
>>> decode = text.decode('utf-8')
>>> allchars = [str for str in decode]
>>> list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
>>> print list
[u'U0001f914', u'U0001f648', u'U0001f60c', u'U0001f495', u'U0001f46d', u'U0001f459']
if you want to remove from text
>>> filtred = [str for str in decode.split() if not any(i in str for i in list)]
>>> clean_text = ' '.join(filtred)
>>> print clean_text
me así, bla es se ds
Ok – i had this same problem and I worked out a solution which doesn’t require you to import any libraries (like emoji or re) and is a single line of code. It will return all the emojis in the string:
def extract_emojis(sentence):
return [word for word in sentence.split() if str(word.encode('unicode-escape'))[2] == '\' ]
This allowed me to create a light-weight solution and i hope it helps you all. Actually – i needed one which would filter out any emojis in a string – and thats the same as the code above but with one minor change:
def filter_emojis(sentence):
return [word for word in sentence.split() if str(word.encode('unicode-escape'))[2] != '\' ]
Here is an example of it in action:
>>> a = ' me así, bla es se ds '
>>> b = extract_emojis(a)
>>> b
[' ', ' ', ' ', ' ']
from emoji import *
EMOJI_SET = set()
# populate EMOJI_DICT
def pop_emoji_dict():
for emoji in UNICODE_EMOJI:
EMOJI_SET.add(emoji)
# check if emoji
def is_emoji(s):
for letter in s:
if letter in EMOJI_SET:
return True
return False
This is a better solution when working with large datasets since you dont have to loop through all emojis each time. Found this to give me better results 🙂
This function expects a string so converting the list of input to string
a_list = ' me así, bla es se ds '
# Import the necessary modules
from nltk.tokenize import regexp_tokenize
# Tokenize and print only emoji
emoji = "['U0001F300-U0001F5FF'|'U0001F600-U0001F64F'|'U0001F680-
U0001F6FF'|'u2600-u26FFu2700-u27BF']"
print(regexp_tokenize(a_list, emoji))
output :[' ', ' ', ' ', ' ', ' ']
Another way to do it using emoji is to use emoji.demojize
and convert them into text representations of emojis.
Ex: will be converted to :grinning_face:
etc..
Then find all :.*:
patterns, and use emoji.emojize
on those.
# -*- coding: utf-8 -*-
import emoji
import re
text = """
Of course, too many emoji characters
like , #@^!*&#@^# helps people read aa aaa a #douchebag
"""
text = emoji.demojize(text)
text = re.findall(r'(:[^:]*:)', text)
list_emoji = [emoji.emojize(x) for x in text]
print(list_emoji)
This might be a redundant way but it’s an example of how emoji.emojize
and emoji.demojize
can be used.
First of all you need to install this:
conda install -c conda-forge emoji
Now we can write the following code:
import emoji
import re
text= ' me así, bla es se ds '
text_de= emoji.demojize(text)
If we print text_de Output is:
':thinking_face: :see-no-evil_monkey: me así, bla es se :relieved_face: ds
:two_hearts::two_women_holding_hands::bikini:'
Now we can use regex to find emojis.
emojis_list_de= re.findall(r'(:[!_-w]+:)', text_de)
list_emoji= [emoji.emojize(x) for x in emojis_list_de]
If we print lis_emoji, output:
[' ', ' ', ' ', ' ', ' ', ' ']
So, we can use Join function:
[''.join(list_emoji)]
OutPut: [' ']
If you want to remove emojis you can use following code:
def remove_emoji(text):
'''
remove all of emojis from text
-------------------------
'''
text= emoji.demojize(text)
text= re.sub(r'(:[!_-w]+:)', '', text)
return text
import emojis
new_list = emojis.get(' me así, bla es se ds ')
print(new_list)
output>>>{' ', ' ', ' ', ' ', ' ', ' '}
Here’s another option that uses emoji.get_emoji_regexp()
and re
:
import re
import emoji
# This works for `emoji` version <2.0
def extract_emojis(text):
return re.findall(emoji.get_emoji_regexp(), text)
test_str = ' some various emojis and flags '
emojis = extract_emojis(test_str)
This yields:
[' ', ' ', ' ', ' u200d ', ' ', ' u200d u200d u200d ']
Or, to view the grapheme clusters:
print(' '.join(emoji for emoji in emojis))
Yields
Newer emoji
versions
For versions of emoji>=2.0.0
, there’s no need for re
:
def extract_emojis(text):
return [x.chars for x in emoji.analyze(test_str)]
If a library seems like overkill, try this regular expression – it works by matching the longest emojis first in a big alternation. Parses all emojis, all skin tones, and all flags. (v14.0) more info
# coding=utf8
import re
a_list = [' me así, bla es se ds ']
ret = re.findall(r'(?: ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | ❤️ | | | | | | | | | | | ❤️ | ❤️ | ❤️ | | | | | | | | | | | | | ️ ️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| ⚕️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| ⚖️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| ✈️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| | | | | | | | | | | | | | | | | | | | | ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ♂️| ♂️| ♂️| ♂️| ♂️| ♀️| ♀️| ♀️| ♀️| ♀️| ️| ️♂️| ️♀️| ️♂️| ️♀️| ️♂️| ️♀️| ️ | ️⚧️|⛹ ♂️|⛹ ♂️|⛹ ♂️|⛹ ♂️|⛹ ♂️|⛹ ♀️|⛹ ♀️|⛹ ♀️|⛹ ♀️|⛹ ♀️| | |❤️ |❤️ | ♂️| ♀️| | | | | | | | | | | | | ♀️| ♂️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ⚕️| ⚕️| ⚕️| | | | | | | ⚖️| ⚖️| ⚖️| | | | | | | | | | | | | | | | | | | | | | | | | | | | ✈️| ✈️| ✈️| | | | | | | ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| | | | | ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| | | | | | | | | | ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️|⛹️♂️|⛹️♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| ♂️| ♀️| | | | | | ❄️| ☠️| ⬛| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |#️⃣|0️⃣|1️⃣|2️⃣|3️⃣|4️⃣|5️⃣|6️⃣|7️⃣|8️⃣|9️⃣|✋ |✋ |✋ |✋ |✋ |✌ |✌ |✌ |✌ |✌ |☝ |☝ |☝ |☝ |☝ |✊ |✊ |✊ |✊ |✊ |✍ |✍ |✍ |✍ |✍ |⛹ |⛹ |⛹ |⛹ |⛹ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |☺|☹|☠|❣|❤|✋|✌|☝|✊|✍|⛷|⛹|☘|☕|⛰|⛪|⛩|⛲|⛺|♨|⛽|⚓|⛵|⛴|✈|⌛|⏳|⌚|⏰|⏱|⏲|☀|⭐|☁|⛅|⛈|☂|☔|⛱|⚡|❄|☃|⛄|☄|✨|⚽|⚾|⛳|⛸|♠|♥|♦|♣|♟|⛑|☎|⌨|✉|✏|✒|✂|⛏|⚒|⚔|⚙|⚖|⛓|⚗|⚰|⚱|♿|⚠|⛔|☢|☣|⬆|↗|➡|↘|⬇|↙|⬅|↖|↕|↔|↩|↪|⤴|⤵|⚛|✡|☸|☯|✝|☦|☪|☮|♈|♉|♊|♋|♌|♍|♎|♏|♐|♑|♒|♓|⛎|▶|⏩|⏭|⏯|◀|⏪|⏮|⏫|⏬|⏸|⏹|⏺|⏏|♀|♂|⚧|✖|➕|➖|➗|♾|‼|⁉|❓|❔|❕|❗|〰|⚕|♻|⚜|⭕|✅|☑|✔|❌|❎|➰|➿|〽|✳|✴|❇|©|®|™|ℹ|Ⓜ|㊗|㊙|⚫|⚪|⬛|⬜|◼|◻|◾|◽|▪|▫)', a_list[0])
print(ret)
#[' ', ' ', ' ', ' ', ' ', ' ']
Building on Mohammed Terry Jack answer which only works where each emoji is separated by a space. See a modified version below which has removed this requirement:
def extract_emojis(sentence):
return [sentence[i] for i in range(len(sentence)) if str(sentence[i].encode('unicode-escape'))[2] == '\' ]
Expected result:
>>> a = ' me así, bla es se ds '
>>> b = extract_emojis(a)
>>> b
[' ', ' ', ' ', ' ', ' ', ' ']