Sunday, February 23, 2025

Understanding Z-Tests with Python: The Statistical Battle Between Hrithik’s Fighter and SRK’s Pathaan



A Cold January in Kolkata: The Curious Case of Fighter

It was a chilly January evening in Kolkata, and the iconic Priya Cinema in South Kolkata was buzzing with excitement. Hrithik Roshan’s latest blockbuster, Fighter, had just been released, and fans were lining up outside the theatre, animatedly discussing their expectations for the film.

Rahul, a data science enthusiast and an ardent Bollywood fan, had just finished watching the movie with his friends. As they walked down Rashbehari Avenue, sipping hot tea from a street-side stall, a heated discussion broke out.

“This is Hrithik’s best action movie ever!” Sohom declared, a die-hard Hrithik Roshan fan.

“I don’t know… Pathaan had better action sequences,” argued Ayan, a devoted Shah Rukh Khan follower.

Rahul smirked, stirring his tea. “Alright, let’s settle this with data,” he said.

“How?” asked Priya, raising an eyebrow.

“Simple. We’ll use Z-tests to compare the audience ratings and recommendation rates of both movies and see if Hrithik’s Fighter statistically outperforms Pathaan!”

And so began Rahul’s mission to use statistics to settle Bollywood debates once and for all.

Gathering the Data: What Do “Kolkatans” Think?

Rahul had spent few minutes gathering audience ratings outside Priya Cinema.

Ratings for Fighter (Out of 100):
[88, 90, 92, 85, 87, 91, 89, 88, 90, 86]

Average rating of Hrithik’s past movies = 85
Standard deviation of past ratings = 5

“Now, let’s test if Fighter is significantly better than Hrithik’s previous movies using a One-Sample Z-Test.”

Rahul explained:

“A One-Sample Z-Test compares the sample mean (ratings for Fighter) to a known population mean (average ratings of Hrithik’s past movies). If the p-value is low, it means Fighter is significantly better than his previous films.”

Ayan raised an eyebrow. “And if the p-value isn’t low?”

“Then it’s just another Hrithik movie — nothing special statistically,” Rahul grinned.

Formula for One-Sample Z-Test

Z=Xˉμσ/nZ = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}

where:

  • Xˉ\bar{X} = Sample mean (average rating of Fighter)
  • μ\mu = Population mean (average rating of Hrithik’s past movies)
  • σ\sigma = Standard deviation
  • nn = Sample size

Python Implementation

Rahul quickly wrote a Python script on his laptop.

import numpy as np
from statsmodels.stats.weightstats import ztest

# Sample data: audience ratings for Fighter
ratings = np.array([88, 90, 92, 85, 87, 91, 89, 88, 90, 86])
population_mean = 85 # Average rating of Hrithik Roshan’s past movies

# One-sample Z-test
z_stat, p_value = ztest(ratings, value=population_mean)

print(f"Z-statistic: {z_stat}, P-value: {p_value}")

Z-statistic: 5.125453177063487, P-value: 2.968229279591197e-07

Rahul hit run, and the above results appeared.

Sohom jumped up. “That’s low, right? That means Fighter is the best!”

Rahul grinned. “Yes! Since the p-value is less than 0.05, we reject the null hypothesis. This means Fighter is statistically better than Hrithik’s past movies!”

“Wait,” Priya interrupted. “How much better is Fighter? Like, is it barely better or way better?”

“Good point,” Rahul said. “Let’s calculate Cohen’s d, which measures the effect size — how strong the difference is.”

Formula for Cohen’s d

d=Xˉμσ​

Ayan nodded. “If Cohen’s d is small, Fighter might be better, but not by much. If it’s large, then Hrithik’s really outdone himself.”

# Compute Cohen's d (Effect Size)
sample_mean = np.mean(ratings)
sample_std = np.std(ratings, ddof=1) # Sample standard deviation
cohen_d = (sample_mean - population_mean) / sample_std

print(f"Cohen's d: {cohen_d}")
Cohen's d: 1.6208106080066909

Priya’s eyes widened. “Whoa. That’s a large effect size!”

Rahul nodded. “Yep! Since Cohen’s d > 0.8, the difference isn’t just statistically significant — it’s practically significant too!”



Confidence Interval: The Real Range of Ratings

Ayan, still skeptical, asked, “Okay, but what if our sample ratings are biased? What if most people actually gave it a lower score?”

Rahul smiled. “Good question! That’s where Confidence Intervals come in.”

Formula for Confidence Interval

CI=Xˉ±Zα/2×σn​

A 95% confidence interval tells us the range of ratings we’d expect if we surveyed more people.

from scipy.stats import norm

# Compute 95% Confidence Interval
z_critical = norm.ppf(0.975) # 1.96 for 95% confidence
margin_of_error = z_critical * (sample_std / np.sqrt(len(ratings)))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"95% Confidence Interval: {confidence_interval}")


95% Confidence Interval: (87.22336653938828, 89.97663346061171)

“So if we surveyed more people, we’d expect the average rating to be between 87.22 and 89.98,” Rahul explained.

Sohom was thrilled. “That means Fighter is almost guaranteed to be better than Hrithik’s past movies!”

Two sample Z test

Sohom pumped his fist in victory, while Ayan sighed. “Okay, but can it beat Pathaan?”

Rahul could feel the tension rising among his friends. Sohom, the die-hard Hrithik fan, smirked. “Come on, we all know Fighter is ruling the box office right now! Just look at the crowd outside the theatre!”

Ayan crossed his arms. “That’s just the first weekend hype. Pathaan had insane longevity — it ran for over 50 days in some theatres! Don’t just go by intuition. Let’s talk numbers.”

Priya, ever the peacemaker, intervened. “Alright, alright. Why don’t we compare actual audience ratings? That should tell us which movie people enjoyed more.”

Rahul grinned. “Exactly what I was about to do. Time for some statistics!”

Rahul had prepared for this moment. He already gathered audience ratings from Kolkata moviegoers for both Fighter and Pathaan.

Ratings for Pathaan:
[94, 97, 95, 86, 89, 94, 89, 91, 91, 88]

Ratings for Fighter:
[88, 90, 89, 85, 87, 91, 89, 88, 90, 86]

As they huddled around Rahul’s laptop at Mouchak, the famous North Kolkata sweet shop, he explained:

“A Two-Sample Z-Test will tell us whether there’s a statistically significant difference in the audience ratings of Fighter and Pathaan.”

“Hold on,” Ayan interrupted, “why a Z-test and not a t-test?”

“Good question,” Rahul said, pleased with the curiosity. “We’re assuming a large sample size and known variance of audience ratings, making the Z-test a more appropriate choice. If we had a smaller sample or unknown variance, a t-test would be better.”

Sohom leaned in, intrigued. “And what exactly does this test do?”

Rahul continued, “The null hypothesis (H0​) assumes that both movies have equal audience ratings, while the alternative hypothesis (H1​) suggests that one is significantly better. The Z-score tells us how many standard deviations apart the two samples are, and the p-value tells us whether this difference is significant.”

“Can you please talk in the language in which a common man like us can understand?” Ayan sighed and remarked.

Rahul (with a smiling face) “We’re trying to figure out if two movies have the same audience ratings or if one is clearly better. We start by assuming they’re the same (that’s our null hypothesis). The z-score helps us see how far apart the average ratings of the two movies are, measured in terms of how much ratings typically vary. The p-value then tells us the chance of seeing such a difference in ratings if the movies were actually equally liked. If the p-value is small (usually below 0.05), it means it’s very unlikely we’d see this difference by chance, so we can reject our assumption that the movies are equal and conclude that one is probably rated significantly better than the other (that’s our alternative hypothesis). To make this decision, we compare the p-value to a significance level (alpha), often set at 0.05. If the p-value is less than alpha, we reject the null hypothesis, suggesting that there’s enough evidence to conclude that the two movies have different audience ratings. If the p-value is greater than alpha, we fail to reject the null hypothesis, meaning we don’t have enough evidence to say the ratings are different. However, failing to reject the null hypothesis doesn’t necessarily mean the movies have the same ratings; it simply means we don’t have enough evidence to prove otherwise.



Formula for Two-Sample Z-Test

Z=X1ˉX2ˉσ12n1+σ22n2Z = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}

where:

  • X1ˉ,X2ˉ\bar{X_1}, \bar{X_2}= Sample means of Fighter and Pathaan
  • σ1,σ2\sigma_1, \sigma_2 = Standard deviations
  • n1,n2n_1, n_2 = Sample sizes


import numpy as np
from statsmodels.stats.weightstats import ztest
from scipy.stats import norm

# Compute 95% Confidence Interval
z_critical = norm.ppf(0.975) # 1.96 for 95% confidence
margin_of_error = z_critical * (sample_std / np.sqrt(len(ratings)))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"95% Confidence Interval: {confidence_interval}")

95% Confidence Interval: (87.22336653938828, 89.97663346061171)

Sohom’s eyes widened. “Wait… so Pathaan actually scored higher?”

Rahul nodded. “Yes! Since the p-value is less than 0.05, we reject the null hypothesis. This means Pathaan has a statistically significant higher audience rating than Fighter.”

As the last few sips of chai disappeared from their cups at Mouchak, Rahul leaned back, staring at his laptop screen. The numbers were clear. The One-Sample Z-Test had shown that Fighter was statistically superior to Hrithik Roshan’s past movies in terms of audience ratings.

“So it’s official,” Sohom grinned. “Fighter is Hrithik’s best movie yet — data doesn’t lie!”

Ayan, ever the SRK loyalist, chuckled. “Sure, but the fight isn’t over yet. Let’s talk about the Two-Sample Z-Test results.”

Rahul smirked, scrolling to the second test’s results.

“While Fighter has better ratings, Pathaan actually wins in a direct audience comparison. The Two-Sample Z-Test shows that Pathaan’s ratings are significantly higher than Fighter’s when compared head-to-head!”

Priya laughed. “So Fighter is Hrithik’s personal best, but Pathaan is still the Bollywood king?”

Rahul nodded. “That’s what the numbers say. Fighter might have topped Hrithik’s previous movies, but when put up against Pathaan, it falls a bit short.”

Sohom looked slightly deflated, but Ayan patted him on the back. “Hey, your guy still got a personal best. But you can’t beat King Khan that easily.”

Rahul shut his laptop and smiled. “So, what’s next?”

Ayan grinned. “Simple — Animal vs. Leo. This time, we’re running the tests! and we will run different variations of Z tests.”

As they walked down Rashbehari Avenue, surrounded by flashing movie posters and the chatter of excited fans, one thing was clear — Bollywood debates were endless, but with statistics, the truth was just a test away.

Share your thoughts. :)





Check out my blogs

  1. https://statisticalmultiversebysohom.blogspot.com/
  2. https://ai-statistician-by-sohom.blogspot.com/

No comments:

Post a Comment

Understanding Z-Tests with Python: The Statistical Battle Between Hrithik’s Fighter and SRK’s Pathaan

A Cold January in Kolkata: The Curious Case of Fighter It was a chilly January evening in Kolkata, and the iconic Priya Cinema in South Kol...