Sentiment Analysis Project in R

In this article, we'll walk through a sentiment analysis project in R. We'll use the tidytext package to analyse the data and assign sentiment scores to the words in the dataset that correspond to them.



The ultimate goal is to create a sentiment analysis model that can distinguish positive and negative phrases, as well as their magnitude. By the end of this blog, you'll be able to tackle similar R programming challenges from the data science course.



What is sentiment analysis?

The computational process of automatically determining what feelings a writer is expressing in a text is known as sentiment analysis. Sentiment analysis is a method of gathering opinions with various scores, such as positive, negative, or neutral. The sentiment is sometimes expressed as a binary distinction (positive vs. negative), but it can also be more nuanced, such as describing the exact emotion expressed by an author (like fear, joy, or anger).


You can determine the nature of opinions or statements in text using sentiment analysis. You can understand how people feel about your brand, product, or service by recognising positive and negative sentiment in text data such as tweets, product reviews, and support requests, and receive insights that lead to data-driven decisions.
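To make the idea concrete, here is a minimal sketch of dictionary-based scoring in base R. The tiny lexicon and example sentences are invented for illustration; real projects use full lexicons such as Bing, NRC, or AFINN, as shown later in this article.

```r
# toy lexicon: +1 for positive words, -1 for negative words (hypothetical entries)
lexicon <- c(love = 1, great = 1, happy = 1, hate = -1, awful = -1, sad = -1)

# score a sentence by summing the scores of the words found in the lexicon
score_sentence <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

score_sentence("I love this great product")  # positive: 2
score_sentence("What an awful, sad movie")   # negative: -2
score_sentence("The parcel arrived today")   # neutral: 0
```

Words absent from the lexicon contribute nothing to the score, which is exactly how the lexicon-based joins below behave: unmatched words simply drop out.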


Many applications, particularly in business intelligence, use sentiment analysis. Sentiment Analysis is a sort of classification in which data is divided into categories such as positive or negative, joyful, sad, furious, and so on. The following are some instances of sentiment analysis applications:



  • Examining the conversation on social media about a specific topic

  • Examining survey results

  • Identifying whether product evaluations are favorable or unfavorable


Sentiment analysis isn't flawless, and your results will contain errors, just like any other automatic language analysis. Sentiment analysis, in general, seeks to determine a writer's or speaker's attitude toward a certain issue or the overall contextual polarity of a document. It also can't explain why a writer is feeling the way he or she is. 


Businesses can use opinion polarity and sentiment topic recognition to get a better grasp of the causes and the overall scope of a problem. Sentiment analysis can also rapidly summarise some characteristics of a text, which is especially helpful if you have a large amount of text to evaluate. These insights can then be used to improve competitive intelligence and customer service, resulting in a better brand image and a competitive advantage. Moreover, it is one of the best data science project ideas for beginners and experts alike.



Performing Sentiment Analysis with the Inner Join

We'll import our libraries 'janeaustenr', 'stringr', 'dplyr', and 'tidytext' in this stage. dplyr's inner_join() function will tell you which terms are present in both the sentiment lexicon and the text dataset you're looking at. The janeaustenr package will give us textual data in the form of novels written by Jane Austen.
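To see what an inner join does, here is a base-R sketch using merge(), which behaves analogously to dplyr's inner_join() for this purpose: only words present in both the text and the lexicon survive. The tiny data frames are invented for illustration.

```r
# hypothetical tokenized text and sentiment lexicon
text_words <- data.frame(word = c("good", "book", "sad", "ending"))
lexicon    <- data.frame(word = c("good", "sad", "happy"),
                         sentiment = c("positive", "negative", "positive"))

# inner join: keep only words that appear in both data frames
joined <- merge(text_words, lexicon, by = "word")
joined
#>   word sentiment
#> 1 good  positive
#> 2  sad  negative
```

Words with no lexicon entry ("book", "ending") and lexicon entries absent from the text ("happy") are dropped, which is what makes the join a sentiment filter.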


We'll be able to execute efficient text analysis on our data thanks to tidytext. In summary, this approach uses the tidytext package's unnest_tokens() function to tokenize the data (splitting the text into tokens, one word per row) and turn the text of our books into a tidy format, then uses dplyr functions to inner-join the tidy text with the chosen sentiment lexicon, after which the joined text data is summarised and visualised with ggplot2. Enrol at Learnbay, the best data science course in Bangalore, for more details.
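To illustrate what tokenization produces, here is a base-R approximation of the one-word-per-row tidy format (not tidytext's actual implementation; unnest_tokens() handles all of this for you). The sample text is the opening of Pride and Prejudice.

```r
# a small two-line text
text_df <- data.frame(line = 1:2,
                      text = c("It is a truth universally acknowledged",
                               "that a single man in possession"))

# tokenize: lowercase, split on non-letters, one word per row
tokens <- do.call(rbind, lapply(seq_len(nrow(text_df)), function(i) {
  words <- strsplit(tolower(text_df$text[i]), "[^a-z]+")[[1]]
  data.frame(line = text_df$line[i], word = words[words != ""])
}))

head(tokens, 3)
#>   line word
#> 1    1   it
#> 2    1   is
#> 3    1    a
```

Once every word sits on its own row alongside its line number, dplyr verbs like filter(), inner_join(), and count() apply directly.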



Sentiment Analysis with inner join


# load the required libraries
library(janeaustenr)
library(dplyr)
library(stringr)
library(tidytext)
library(tidyr)
library(ggplot2)

# tokenize the texts from Jane Austen's books
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text,
                                regex("^chapter [\\divxlc]",
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

# filter the joy words from the NRC lexicon
nrc_joy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

# filter the tidy text dataframe for the words from "Emma", then perform sentiment analysis
tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)



Because unnest_tokens() puts one token per row in a column named word, joining with the sentiment dataset is straightforward. Counting the joy words in "Emma" gives:

#> # A tibble: 303 x 2
#>   word       n
#>   <chr>  <int>
#> 1 good     359
#> 2 young    192
#> 3 friend   166
#> 4 hope     143
#> 5 happy    125
#> 6 love     117
#> # ... with 297 more rows



Then we count how many positive and negative words are in each book's designated sections. To keep track of where we are in the story, we create an index that counts up sections of 80 lines of text (using integer division).
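The integer-division trick is easy to see in isolation: `%/%` maps every run of 80 consecutive line numbers to the same index, so each index groups roughly a page of text.

```r
# line numbers 1, 79, 80, 159, 160 fall into bins 0, 0, 1, 1, 2
linenumbers <- c(1, 79, 80, 159, 160)
linenumbers %/% 80
#> [1] 0 0 1 1 2
```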



jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  mutate(index = linenumber %/% 80) %>%
  count(book, index, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = list(n = 0)) %>%
  mutate(sentiment = positive - negative)

jane_austen_sentiment


#> # A tibble: 920 x 5
#>   book                index negative positive sentiment
#>   <fct>               <dbl>    <int>    <int>     <int>
#> 1 Sense & Sensibility     0       16       32        16
#> 2 Sense & Sensibility     1       19       53        34
#> 3 Sense & Sensibility     2       12       31        19
#> 4 Sense & Sensibility     3       15       31        16
#> 5 Sense & Sensibility     4       16       34        18
#> 6 Sense & Sensibility     5       16       51        35
#> # ... with 914 more rows



Sentiment Analysis across all novels

# plot the sentiment scores across the plot trajectory of each novel
ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")



Comparing the three sentiment dictionaries

In this section, we explore how sentiment varies throughout Pride and Prejudice's narrative arc using all three sentiment lexicons ("nrc", "afinn", and "bing").
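The three lexicons score words differently, which is why the code below handles them differently: AFINN assigns each word an integer value between -5 and 5 (so scores are summed per bin), while Bing and NRC label words categorically (so labels are counted, then positive minus negative is taken). A toy illustration with hypothetical lexicon entries:

```r
# AFINN-style: numeric values, aggregate by summing (hypothetical values)
afinn_style <- c(good = 3, bad = -3, abysmal = -5)
sum(afinn_style[c("good", "bad", "abysmal")])  # -5

# Bing-style: categorical labels, aggregate as positive minus negative
bing_style <- c(good = "positive", bad = "negative", abysmal = "negative")
labels <- bing_style[c("good", "bad", "abysmal")]
sum(labels == "positive") - sum(labels == "negative")  # -1
```

This difference in scale is also why the comparison plot below uses free y-axes: the absolute magnitudes of the three methods are not directly comparable.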



# filter the tidy text dataframe "tidy_books" for "Pride & Prejudice"
pride_prejudice <- tidy_books %>% filter(book == "Pride & Prejudice")

# using the "afinn" lexicon
afinn <- pride_prejudice %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(index = linenumber %/% 80) %>%
  summarise(sentiment = sum(value)) %>%
  mutate(method = "AFINN")
## Joining, by = "word"


# Using the "bing" and "nrc" lexicon


bing_and_nrc <- bind_rows(


  pride_prejudice %>% 


    inner_join(get_sentiments("bing")) %>%


    mutate(method = "Bing et al."),


  pride_prejudice %>% 


    inner_join(get_sentiments("nrc") %>% 


                 filter(sentiment %in% c("positive", 


                                         "negative"))


    ) %>%


    mutate(method = "NRC")) %>%


  count(method, index = linenumber %/% 80, sentiment) %>%


  pivot_wider(names_from = sentiment,


              values_from = n,


              values_fill = 0) %>% 


  mutate(sentiment = positive - negative)


## Joining, by = "word"


## Joining, by = "word"


# bind the three lexicons together and visualize
bind_rows(afinn,
          bing_and_nrc) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")



Final Thoughts


Sentiment analysis is a useful tool, and a top data science project idea, for analyzing the content of a body of text and gaining insight into its most essential terms. We went over our sentiment analysis project in R in this blog. If sentiment analysis works well, a review's positive vs. negative sentiment should be able to predict its star rating. If you want to learn more about data science courses or data science projects, you may check Learnbay's website. We provide the best data science course in Bangalore with detailed explanations.


 


We studied the notion of sentiment analysis and applied it to a dataset of Jane Austen's novels, comparing how the AFINN, Bing, and NRC lexicons score the same text.


 


It is vital to examine the content of the positive and negative entries in the lexicons being employed rather than jumping to conclusions. After performing data wrangling on our data, we were able to distinguish sentiment patterns using several visualizations, primarily with the 'bing' lexicon, and used plots to depict the sentiment score across each novel. So, hurry up! Sign up for a data science course in Bangalore and start exploring.

