Ever wondered how machines understand whether a tweet is positive or negative?
In this blog, I’ll walk you step-by-step through building a Sentiment Classifier using tweets from real people, the classic NLTK toolkit, and a Logistic Regression model. We’ll go from loading the data all the way to evaluating our model.
No prior ML expertise needed.
We’ll train a model to classify tweets as positive or negative. Think of it as a simple version of what powers Twitter/X sentiment analysis dashboards or brand monitoring tools.
Here’s what you’ll learn:
- How to load and process real tweet data
- What “tokenization”, “stemming”, and “stopword removal” mean (and why they matter)
- How to extract meaningful features from text
- How to train and evaluate a Logistic Regression model
- A full, working sentiment classifier by the end
NLTK comes with a ready-to-use Twitter dataset containing 5,000 positive and 5,000 negative tweets. First, we import our libraries and load this dataset:
import nltk
from nltk.corpus import twitter_samples

nltk.download('twitter_samples')  # one-time download of the tweet corpus
nltk.download('stopwords')        # used in preprocessing below

positive_tweets = twitter_samples.strings('positive_tweets.json')
negative_tweets = twitter_samples.strings('negative_tweets.json')
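A quick sanity check confirms what we loaded (each entry is a plain tweet string):
print(len(positive_tweets), len(negative_tweets))  # 5000 5000
print(positive_tweets[0])  # one raw tweet, emojis and all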
Before feeding text to a model, we need to clean it. Tweets are messy — emojis, URLs, hashtags, retweets — all of that needs processing.
Here’s our plan:
- Remove links and stock tickers
- Strip out handles and hashtags
- Lowercase the text
- Remove stopwords like “is”, “and”, etc.
- Stem words (e.g., “running” → “run”)
Here’s our `process_tweet` function:
import re
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

def process_tweet(tweet):
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    # Strip stock tickers, retweet markers, URLs, and the '#' symbol
    tweet = re.sub(r'\$\w*', '', tweet)
    tweet = re.sub(r'^RT[\s]+', '', tweet)
    tweet = re.sub(r'https?:\/\/.*[\r\n]*', '', tweet)
    tweet = re.sub(r'#', '', tweet)
    # Lowercase, drop @handles, shorten elongated words ("sooooo" -> "sooo")
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)
    # Remove stopwords and punctuation, then stem what remains
    tweets_clean = [stemmer.stem(word) for word in tweet_tokens
                    if word not in stopwords_english and word not in string.punctuation]
    return tweets_clean
This function turns a noisy tweet like
👉 #FollowFriday @tushar_elric for being top engaged members in my community this week :)
into
👉 `['followfriday', 'top', 'engag', 'member', 'commun', 'week', ':)']`
Notice how `:)` is retained in the output: it signals sentiment and adds emotional context.
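You can reproduce that transformation yourself once `process_tweet` is defined:
sample = "#FollowFriday @tushar_elric for being top engaged members in my community this week :)"
print(process_tweet(sample))
# ['followfriday', 'top', 'engag', 'member', 'commun', 'week', ':)']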
We convert tweets into a DataFrame with a label (`1` for positive, `0` for negative), combine both classes, shuffle the rows, and split into train/test:
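Here `df_pos` and `df_neg` are simply the raw tweet lists with their labels attached:
import pandas as pd

# Positive tweets get label 1, negative tweets get label 0
df_pos = pd.DataFrame({'tweet': positive_tweets, 'label': 1})
df_neg = pd.DataFrame({'tweet': negative_tweets, 'label': 0})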
raw_tweets = pd.concat([df_pos, df_neg], ignore_index=True).sample(frac=1, random_state=42)
Then apply our `process_tweet` function:
from sklearn.model_selection import train_test_split

raw_tweets['processed_tokens'] = raw_tweets['tweet'].apply(process_tweet)

# Split it
train_df, test_df = train_test_split(
    raw_tweets[['processed_tokens', 'label']],
    test_size=0.2,
    stratify=raw_tweets['label'],
    random_state=42
)
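The snippets below refer to `train_x`, `train_y`, `test_x`, and `test_y`; one straightforward way to pull those out of the split:
train_x = train_df['processed_tokens'].tolist()
train_y = train_df['label'].values
test_x = test_df['processed_tokens'].tolist()
test_y = test_df['label'].values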
This part is crucial. We’re creating a frequency dictionary: a map from each word to how often it appears in positive and in negative tweets.
import numpy as np

def build_freqs(tweets, ys):
    freqs = {}
    for y, tweet in zip(np.squeeze(ys).tolist(), tweets):
        # Tweets arrive as token lists, so re-join before processing
        tokens = process_tweet(' '.join(tweet))
        for word in tokens:
            if word not in freqs:
                freqs[word] = [0, 0]  # [positive_count, negative_count]
            freqs[word][1 - int(y)] += 1
    return freqs
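The snippets below assume `freqs` was built from the training data only, so nothing leaks from the test set:
freqs = build_freqs(train_x, train_y)
print(freqs.get('happi'))  # [positive_count, negative_count] for the stem of "happy", if present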
Wonder why it’s `1 - int(y)` instead of just `int(y)`? 🤔 With `y = 1` for positive tweets, `1 - int(y)` evaluates to `0`, so positive counts land in the first slot and negative counts in the second, and `freqs[word]` reads naturally as `[positive_count, negative_count]`.
Now that we have word frequencies, we turn each tweet into a 3-element vector:
def extract_features(tweet, freqs):
    x = np.zeros((1, 3))
    x[0, 0] = 1  # bias term
    for word in process_tweet(tweet):
        if word in freqs:
            pos, neg = freqs[word]
            x[0, 1] += pos  # sum of positive-class frequencies
            x[0, 2] += neg  # sum of negative-class frequencies
    return x
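For instance, one processed training tweet becomes a single row of three numbers:
sample_vec = extract_features(' '.join(train_x[0]), freqs)
print(sample_vec.shape)  # (1, 3): [bias, positive-count sum, negative-count sum]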
Now we stack up all vectors and train:
from sklearn.linear_model import LogisticRegression

X_train = np.vstack([extract_features(' '.join(tweet), freqs) for tweet in train_x])
X_test = np.vstack([extract_features(' '.join(tweet), freqs) for tweet in test_x])

model = LogisticRegression()
model.fit(X_train, train_y.ravel())
We use standard metrics:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(test_y, y_pred))
print("Precision:", precision_score(test_y, y_pred))
print("Recall:", recall_score(test_y, y_pred))
print("F1 Score:", f1_score(test_y, y_pred))
'''
Accuracy: 0.82
Precision: 0.84
Recall: 0.80
F1 Score: 0.82
'''
# Not bad at all for a simple, interpretable model!
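To classify a brand-new tweet, reuse the exact same pipeline. Here’s a tiny helper (`predict_sentiment` is our own illustrative name, not a library function):
def predict_sentiment(tweet, freqs, model):
    # Featurize with the same bias + class-frequency sums used in training
    x = extract_features(tweet, freqs)
    return 'positive' if model.predict(x)[0] == 1 else 'negative'

print(predict_sentiment("I love this! :)", freqs, model))  # most likely 'positive'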
What we built is a solid sentiment classifier using basic NLP and classical ML. It’s not deep learning, but it gets the job done — and teaches you a lot along the way.
You now understand:
- How to clean and tokenize tweets
- How to build word frequency dictionaries
- How to extract features for classification
- How to train and evaluate a model
Want to take this further? Here’s what you could try next:
- Use TF-IDF or Word2Vec instead of raw word counts
- Train a Naive Bayes classifier for comparison (a quick sketch of both ideas follows below)
- Build a web interface where users input tweets
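For example, a TF-IDF + Naive Bayes baseline takes just a few lines with scikit-learn. This is a minimal sketch, assuming `raw_tweets`, `train_df`, and `test_df` from earlier are still in scope (`nb_model` is an illustrative name):
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# TF-IDF features feeding a Naive Bayes classifier, reusing our train/test split
nb_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
nb_model.fit(raw_tweets.loc[train_df.index, 'tweet'], train_df['label'])
print("NB accuracy:", nb_model.score(raw_tweets.loc[test_df.index, 'tweet'], test_df['label']))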