Building a Twitter Sentiment Classifier with NLTK and Scikit-Learn



Ever wondered how machines understand whether a tweet is positive or negative?

In this blog, I’ll walk you step-by-step through building a Sentiment Classifier using tweets from real people, the classic NLTK toolkit, and a Logistic Regression model. We’ll go from loading the data all the way to evaluating our model.

No prior ML expertise needed.

We’ll train a model to classify tweets as positive or negative. Think of it as a simple version of what powers Twitter/X sentiment analysis dashboards or brand monitoring tools.

Here’s what you’ll learn:

  • How to load and process real tweet data
  • What “tokenization”, “stemming”, and “stopword removal” mean (and why they matter)
  • How to extract meaningful features from text
  • How to train and evaluate a Logistic Regression model
  • A full, working sentiment classifier by the end

NLTK comes with a ready-to-use Twitter dataset containing 5,000 positive and 5,000 negative tweets. First, we import our libraries and load this dataset:

import nltk
from nltk.corpus import twitter_samples

nltk.download('twitter_samples')  # one-time download of the corpus
positive_tweets = twitter_samples.strings('positive_tweets.json')
negative_tweets = twitter_samples.strings('negative_tweets.json')

Before feeding text to a model, we need to clean it. Tweets are messy — emojis, URLs, hashtags, retweets — all of that needs processing.

Here’s our plan:

  • Remove links, stock tickers
  • Strip out handles and hashtags
  • Lowercase the text
  • Remove stopwords like “is”, “and”, etc.
  • Stem words (e.g., “running” → “run”)

Here’s our process_tweet function:

import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

nltk.download('stopwords')  # the stopword list also needs a one-time download

def process_tweet(tweet):
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')

    tweet = re.sub(r'\$\w*', '', tweet)                 # strip stock tickers like $GE
    tweet = re.sub(r'^RT[\s]+', '', tweet)              # strip the retweet marker
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)  # strip URLs
    tweet = re.sub(r'#', '', tweet)                     # keep the word, drop the '#'

    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)

    tweets_clean = [stemmer.stem(word) for word in tweet_tokens
                    if word not in stopwords_english and word not in string.punctuation]

    return tweets_clean

This function turns a noisy tweet like

👉#FollowFriday @tushar_elric for being top engaged members in my community this week :)

into 👉 ['followfriday','top','engag','member','commun','week',':)']

Notice how :) is retained in the output—it signals sentiment and adds emotional context.
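
You can reproduce that with a single call:

print(process_tweet('#FollowFriday @tushar_elric for being top engaged members in my community this week :)'))
# ['followfriday', 'top', 'engag', 'member', 'commun', 'week', ':)']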

We convert tweets into a DataFrame with a label (1 for positive, 0 for negative), combine both classes, shuffle the rows, and split into train/test:
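The two labeled DataFrames aren't shown in the original snippet; a minimal sketch of how they might be built (assuming pandas is imported as pd, with the tweet/label column names used below):

import pandas as pd

# Label the two classes: 1 = positive, 0 = negative
df_pos = pd.DataFrame({'tweet': positive_tweets, 'label': 1})
df_neg = pd.DataFrame({'tweet': negative_tweets, 'label': 0})
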

raw_tweets = pd.concat([df_pos, df_neg], ignore_index=True).sample(frac=1, random_state=42)

Then apply our process_tweet function:

raw_tweets['processed_tokens'] = raw_tweets['tweet'].apply(process_tweet)

# Split it

from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(
    raw_tweets[['processed_tokens', 'label']],
    test_size=0.2,
    stratify=raw_tweets['label'],
    random_state=42
)

This part is crucial. We're building a frequency dictionary: a map from each word to how often it appears in positive tweets and in negative tweets.

import numpy as np

def build_freqs(tweets, ys):
    freqs = {}
    for y, tweet in zip(np.squeeze(ys).tolist(), tweets):
        tokens = process_tweet(' '.join(tweet))
        for word in tokens:
            if word not in freqs:
                freqs[word] = [0, 0]
            freqs[word][1 - int(y)] += 1  # index 0: positive count, index 1: negative count
    return freqs

Wonder why it's 1 - int(y) instead of just int(y)? 🤔 Each entry stores [positive_count, negative_count], so a positive tweet (y = 1) increments index 1 - 1 = 0, and a negative tweet (y = 0) increments index 1 - 0 = 1. That ordering is exactly what lets us unpack pos, neg = freqs[word] later.
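
The post never shows these helpers being wired up, so here's a small connecting snippet (assuming the train_df/test_df column names from the split above):

# Pull token lists and labels back out of the split DataFrames
train_x = train_df['processed_tokens'].tolist()
train_y = train_df['label'].to_numpy()
test_x = test_df['processed_tokens'].tolist()
test_y = test_df['label'].to_numpy()

# Build the frequency dictionary from training data only, to avoid test-set leakage
freqs = build_freqs(train_x, train_y)
print(freqs.get('happi'))  # -> [pos_count, neg_count]; the stem 'happi' should skew positive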

Now that we have word frequencies, we turn each tweet into a 3-element vector: a bias term, the total positive-class frequency of its words, and the total negative-class frequency.

def extract_features(tweet, freqs):
    x = np.zeros((1, 3))
    x[0, 0] = 1  # bias term
    for word in process_tweet(tweet):
        if word in freqs:
            pos, neg = freqs[word]
            x[0, 1] += pos
            x[0, 2] += neg
    return x
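
As a quick illustration (placeholder values, since the sums depend on your training split):

sample = '#FollowFriday @tushar_elric for being top engaged members in my community this week :)'
print(extract_features(sample, freqs))
# -> array([[1., P, N]]) where P and N are the summed positive/negative counts of its words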

Now we stack up all vectors and train:

from sklearn.linear_model import LogisticRegression

X_train = np.vstack([extract_features(' '.join(tweet), freqs) for tweet in train_x])
X_test = np.vstack([extract_features(' '.join(tweet), freqs) for tweet in test_x])

model = LogisticRegression()
model.fit(X_train, train_y.ravel())
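
Since the feature space is just [bias, positive_sum, negative_sum], the model stays easy to inspect; an optional sanity check:

# We'd expect a positive learned weight on the positive-frequency feature
# and a negative one on the negative-frequency feature (exact values vary by split)
print(model.coef_, model.intercept_)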

We use standard metrics:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(test_y, y_pred))
print("Precision:", precision_score(test_y, y_pred))
print("Recall:", recall_score(test_y, y_pred))
print("F1 Score:", f1_score(test_y, y_pred))

'''
Accuracy: 0.82
Precision: 0.84
Recall: 0.80
F1 Score: 0.82
'''
# Not bad at all for a simple, interpretable model!

What we built is a solid sentiment classifier using basic NLP and classical ML. It’s not deep learning, but it gets the job done — and teaches you a lot along the way.

You now understand:

  • How to clean and tokenize tweets
  • How to build word frequency dictionaries
  • How to extract features for classification
  • How to train and evaluate a model

Want to take this further? Here’s what you could try next:

  • Use TF-IDF or Word2Vec instead of raw word counts (see the TF-IDF sketch below)
  • Train a Naive Bayes classifier for comparison
  • Build a web interface where users input tweets
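
For example, a TF-IDF variant could be sketched like this (a hypothetical follow-up, reusing raw_tweets from earlier; the hyperparameters are illustrative, not tuned):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X_tr, X_te, y_tr, y_te = train_test_split(
    raw_tweets['tweet'], raw_tweets['label'],
    test_size=0.2, stratify=raw_tweets['label'], random_state=42
)

# The default word-level analyzer is used here for brevity; a custom
# tokenizer (like process_tweet) could be passed via the `tokenizer` argument.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # accuracy on the held-out tweets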
