Isolating only the visitor's messages from chat transcripts (NLP)

Question:

Below is an example of a chat transcript that I have stored in a BigQuery table:

<p align="center">Chat Started: Wednesday, January 06, 2021, 09:08:01 (-0600)</p>( 3s ) Live Chat: Hello! I am the company Bot.<br>( 5s ) Live Chat: What brings you to company today?<br>( 6s ) Live Chat: {ChatWindowButton:Sales,Support}<br>( 12s ) Visitor: Support<br>( 14s ) Live Chat: It sounds like you’re looking for our support team instead of the sales department. <br>( 16s ) Live Chat: {ChatWindowButton:End Chat,Chat with Sales Agent}<br>( 29s ) Visitor: How do I find my account number?<br>( 32s ) Live Chat: Sorry, I didn&#39;t understand that. Would you please rephrase?<br>

Nearly all the chats follow the same format. What I'm trying to do, in either SQL or Python, is pull only the visitor's messages (i.e., the messages the visitor sends, not the chatbot's). Ultimately I want to do topic modeling, but I'm having a hard time isolating the visitor's side of the chat. Is there a way to do this?

Asked By: runner16


Answers:

Do you mean like this?

import bs4

s = '<p align="center">Chat Started: Wednesday, January 06, 2021, 09:08:01 (-0600)</p>( 3s ) Live Chat: Hello! I am the company Bot.<br>( 5s ) Live Chat: What brings you to company today?<br>( 6s ) Live Chat: {ChatWindowButton:Sales,Support}<br>( 12s ) Visitor: Support<br>( 14s ) Live Chat: It sounds like you’re looking for our support team instead of the sales department. <br>( 16s ) Live Chat: {ChatWindowButton:End Chat,Chat with Sales Agent}<br>( 29s ) Visitor: How do I find my account number?<br>( 32s ) Live Chat: Sorry, I didn&#39;t understand that. Would you please rephrase?<br>'


soup = bs4.BeautifulSoup(s, "html.parser")
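# The top-level text nodes are the individual chat lines; keep only the visitor's.
# Splitting on "Visitor:" once (not on every ":") preserves messages that contain colons.
messages = [
    node.split("Visitor:", 1)[1].strip()
    for node in soup.children
    if isinstance(node, bs4.element.NavigableString) and ") Visitor:" in node
]
print(messages)
# ['Support', 'How do I find my account number?']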
Answered By: lo tolmencre
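
If installing BeautifulSoup is not an option, a standard-library sketch along the same lines may be enough. It assumes every line follows the "( Ns ) Speaker: text<br>" layout shown in the question. The helper name visitor_messages and the sample string (an abridged copy of the transcript above) are illustrative only and are not part of the original answer.

```python
import html
import re

# Assumption: each chat line looks like "( 12s ) Visitor: some text<br>".
# The timestamp prefix is required so that bot lines which merely mention
# the word "Visitor" are not picked up.
VISITOR_LINE = re.compile(
    r"\(\s*\d+s\s*\)\s*Visitor:\s*(.*?)\s*(?:<br\s*/?>|$)",
    re.IGNORECASE | re.DOTALL,
)


def visitor_messages(transcript):
    """Return only the visitor's messages from one raw chat transcript."""
    # html.unescape turns entities such as &#39; back into plain characters.
    return [html.unescape(text) for text in VISITOR_LINE.findall(transcript)]


# Abridged version of the transcript from the question.
sample = (
    '<p align="center">Chat Started: Wednesday, January 06, 2021, 09:08:01 (-0600)</p>'
    "( 3s ) Live Chat: Hello! I am the company Bot.<br>"
    "( 12s ) Visitor: Support<br>"
    "( 16s ) Live Chat: {ChatWindowButton:End Chat,Chat with Sales Agent}<br>"
    "( 29s ) Visitor: How do I find my account number?<br>"
    "( 32s ) Live Chat: Sorry, I didn&#39;t understand that.<br>"
)

print(visitor_messages(sample))
# ['Support', 'How do I find my account number?']
```

The function could then be applied to each transcript after exporting the BigQuery rows to Python. If some transcripts deviate from this layout, the pattern would need adjusting.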