Exercise 1: Legal Sentences Analysis

Exercise 2: Customer Review Analysis

Now we will analyze a dataset of reviews of Disneyland theme parks.

The goal is to extract the attractions mentioned in the reviews, together with the sentiment associated with each mention.

As a starter, you can use the following code:

import pandas as pd

df = pd.read_csv('DisneylandReviews.csv', sep=',', encoding='iso-8859-1')[:100]  # first 100 reviews only
texts = df['Review_Text'].tolist()
  1. Use a suitable model to perform sentiment analysis and classify each review.

Create a sentiments list with the same length as the texts list, holding the sentiment of each review.

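from transformers import pipeline

# No model is specified, so the pipeline falls back to its default sentiment model;
# truncation=True cuts reviews that exceed the model's maximum input length
pipe = pipeline("sentiment-analysis", truncation=True)

sentiments = []

for text in texts:
    # Keep only the predicted label ('POSITIVE' or 'NEGATIVE') for each review
    result = pipe(text)[0]['label']
    sentiments.append(result)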
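
A quick sanity check can confirm that the sentiment list lines up with the review list and show how the labels are distributed. The sketch below uses small made-up lists in place of the real pipeline output; `sample_texts` and `sample_sentiments` are hypothetical stand-ins, not data from the exercise.

```python
from collections import Counter

# Hypothetical stand-ins for `texts` and `sentiments`
sample_texts = ["Loved Space Mountain!", "Queues were far too long.", "Great day out."]
sample_sentiments = ["POSITIVE", "NEGATIVE", "POSITIVE"]

# There should be exactly one label per review, in the same order
assert len(sample_sentiments) == len(sample_texts)

# Count how many reviews fall under each label
label_counts = Counter(sample_sentiments)
print(label_counts)
```

Running the same two checks on the real `texts` and `sentiments` gives an early warning if a review was skipped or if one label dominates the sample.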
  2. Define the label used to extract attractions from the review texts:

labels = ["Attraction"]
  3. Apply the GLiNER model to the reviews and extract the attractions mentioned in them.

Also attach the sentiment of the corresponding review to each extracted entity.

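results = []

# `model` is assumed to be the GLiNER model loaded earlier in the notebook
for sentiment, text in zip(sentiments, texts):
    result = model.predict_entities(text, labels, threshold=0.5)
    # Tag every entity with the sentiment of the review it came from
    for entity in result:
        entity['sentiment'] = sentiment
    results.append(result)
result  # entities extracted from the last review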
[{'start': 51,
  'end': 67,
  'text': 'Disneyland Paris',
  'label': 'Attraction',
  'score': 0.6424593925476074,
  'sentiment': 'POSITIVE'},
 {'start': 287,
  'end': 301,
  'text': 'Space Mountain',
  'label': 'Attraction',
  'score': 0.9413168430328369,
  'sentiment': 'POSITIVE'}]
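
The `start` and `end` fields behave like character offsets into the review text with an exclusive end: in the output above, 67 - 51 = 16 characters, which is the length of 'Disneyland Paris'. Slicing the text with them should therefore give back the entity's `text`. A minimal sketch with a made-up review and a hand-built entity dictionary (neither comes from the dataset):

```python
# Made-up review text, not taken from the dataset
review = "We queued an hour for Space Mountain and it was worth it."

# Hand-built entity in the same shape as the model output
name = "Space Mountain"
start = review.find(name)
entity = {"start": start, "end": start + len(name), "text": name, "label": "Attraction"}

# Slicing with the offsets recovers the entity text
span = review[entity["start"]:entity["end"]]
assert span == entity["text"]

# The same offsets can be widened to show some surrounding context
context = review[max(0, entity["start"] - 10):entity["end"] + 10]
print(context)
```

Looking at the context around a low-scoring entity is a simple way to judge whether the threshold of 0.5 is too permissive.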
  4. Convert the results to a dataframe and display them.

# flatten the list
results = [item for sublist in results for item in sublist]

df_entities = pd.DataFrame(results)
  5. Compute the average sentiment across all the extracted attractions (assume positive is 1 and negative is 0).

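# Assign the mapped column back; an in-place replace on a selected column may not modify df_entities in recent pandas versions
df_entities['sentiment'] = df_entities['sentiment'].map({'POSITIVE': 1, 'NEGATIVE': 0})
df_entities['sentiment'].mean()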
0.7291666666666666
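
Because every entity inherits the sentiment of its whole review, this average is the share of extracted entities that come from positive reviews, and a review that mentions many attractions weighs more than one that mentions a single attraction. The sketch below illustrates the difference on invented data; `toy_results` mirrors the shape of `results` before flattening, but its contents are made up.

```python
# Invented per-review entity lists, mirroring `results` before flattening
toy_results = [
    # one positive review mentioning three attractions
    [{"text": "Space Mountain", "sentiment": 1},
     {"text": "Adventure Land", "sentiment": 1},
     {"text": "Toy story land", "sentiment": 1}],
    # one negative review mentioning one attraction
    [{"text": "Space Mountain", "sentiment": 0}],
]

# Entity-level mean: every extracted entity counts once
flat = [e["sentiment"] for review in toy_results for e in review]
entity_mean = sum(flat) / len(flat)

# Review-level mean: every review with at least one entity counts once
review_scores = [review[0]["sentiment"] for review in toy_results if review]
review_mean = sum(review_scores) / len(review_scores)

print(entity_mean, review_mean)
```

Here the entity-level mean is 0.75 while the review-level mean is 0.5, so the two numbers answer slightly different questions.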
  6. Compute the average sentiment, grouped by attraction.

df_entities.groupby('text')['sentiment'].mean().reset_index().sort_values('sentiment', ascending=False)
     text                     sentiment
0    Adventure Land           1.0
41   Mickey s Wondrous Book   1.0
69   Tomorrow land            1.0
68   The park                 1.0
67   The lion king show       1.0
...  ...                      ...
77   Winnie the Pooh          0.0
76   Walt Disney World        0.0
21   Explorers lodge          0.0
74   Toy story land           0.0
11   Disney castle            0.0

101 rows × 2 columns
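
A mean of exactly 1.0 or 0.0 often comes from a single mention, so it can help to show the number of mentions next to the average. A minimal sketch on an invented DataFrame with the same `text` and `sentiment` columns as `df_entities` (the rows are made up):

```python
import pandas as pd

# Invented rows with the same columns as df_entities
toy = pd.DataFrame({
    "text": ["Space Mountain", "Space Mountain", "Space Mountain", "Disney castle"],
    "sentiment": [1, 1, 0, 0],
})

# Average sentiment and number of mentions per attraction,
# with the most frequently mentioned attractions first
summary = (
    toy.groupby("text")["sentiment"]
    .agg(["mean", "count"])
    .reset_index()
    .sort_values(["count", "mean"], ascending=False)
)
print(summary)
```

Sorting by `count` first puts the attractions with the most evidence at the top, where their averages are easier to trust.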

  7. Repeat the analysis, this time extracting Food entities.

labels = ["Food"]

results = []

for sentiment, text in zip(sentiments, texts):
    result = model.predict_entities(text, labels, threshold=0.5)
    for entity in result:
        entity['sentiment'] = sentiment
    results.append(result)
# flatten the list
results = [item for sublist in results for item in sublist]

df_entities = pd.DataFrame(results)

# Map the labels to 1/0 and assign the column back
df_entities['sentiment'] = df_entities['sentiment'].map({'POSITIVE': 1, 'NEGATIVE': 0})
df_entities['sentiment'].mean()
0.7073170731707317
df_entities.groupby('text')['sentiment'].mean().reset_index().sort_values('sentiment', ascending=False)
    text                    sentiment
16  caramel popcorn         1.000000
12  Vegetarian food         1.000000
29  snacks                  1.000000
26  hire pram               1.000000
25  halal foods             1.000000
24  halal food              1.000000
23  food options            1.000000
19  drinks                  1.000000
18  disneyland              1.000000
15  brilliant food          1.000000
14  bottled water           1.000000
13  average food            1.000000
32  western restaurants     1.000000
11  Tahitian Terrace        1.000000
3   Food                    1.000000
2   Disney theme sweets     1.000000
8   Pricey food             1.000000
7   Lovely food             1.000000
10  Standard Chinese food   1.000000
5   Honkers                 1.000000
4   Food options            1.000000
21  food                    0.857143
31  vegie burger            0.000000
30  vegetarian food         0.000000
28  popcorn                 0.000000
27  pop corn                0.000000
6   Limited restaurants     0.000000
22  food and drink          0.000000
20  fast food               0.000000
17  corn on the cob         0.000000
1   Children s food         0.000000
9   Souvenirs               0.000000
0   $60 steak               0.000000
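
Several rows in this table differ only in capitalization ("Vegetarian food" and "vegetarian food", "Food options" and "food options"), so they are averaged separately. One possible cleanup, sketched here on invented rows, is to normalize case and surrounding whitespace before grouping; the `text_norm` column is a hypothetical helper, and this step does not merge other variants such as "popcorn" and "pop corn".

```python
import pandas as pd

# Invented rows with the same columns as df_entities
toy_food = pd.DataFrame({
    "text": ["Vegetarian food", "vegetarian food", "Food options", "food options "],
    "sentiment": [1, 0, 1, 1],
})

# Lowercase and strip the entity text so case variants share one key
toy_food["text_norm"] = toy_food["text"].str.lower().str.strip()

# Group on the normalized key instead of the raw text
by_food = toy_food.groupby("text_norm")["sentiment"].mean()
print(by_food)
```

After normalization, the two spellings of "vegetarian food" collapse into a single row whose average reflects both reviews.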