How to Learn Python for Data Science: My Real Experience
Alright, so people ask me all the time how I got into Python and data science. And honestly, I don’t have some fancy origin story. I just kinda fell into it.
I’m Likhon Hussain. I work as a Cloud Engineer, and somewhere along the way, Python just became my go-to tool. People think I’ve always been a coding genius or something. Nah man, I wasn’t. I struggled a lot in the beginning. Still do sometimes.
But the thing is, I actually stuck with it. Most people don’t. They try for like two weeks and then give up when it gets hard. So I figured I’d write down what actually worked for me, not some textbook stuff that sounds good but nobody actually follows.
How I Even Got Into This
Okay so like five years ago, I was doing cloud stuff. AWS, infrastructure, all that. And I was using different languages, dealing with a lot of complicated stuff. My code was long, messy, and I had to rewrite things constantly.
Then one day I had to do some quick data analysis for a project. Someone was like “hey, just use Python for this.” I was like “nah man, I’m fine with what I know.” But they kept pushing so I was like whatever, let me try.
I wrote literally three lines of code:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
average = sum(data) / len(data)
print(f"Average: {average}")
That’s it. And I was like… that’s it? No weird syntax, no complicated stuff, just… normal English basically. I was used to writing way more lines to do the same thing. And that got me hooked. I was like okay, let me actually learn this properly.
My First Stupid Mistake
So I got excited, right? Too excited. I watched like two videos on Python basics, and I was like “okay I get it, let me start learning Pandas and do cool data stuff.” Big mistake. HUGE mistake.
I went straight into Pandas without actually knowing Python properly. And dude, I spent like three weeks debugging code that should have taken me two days. I was so frustrated. I was googling the same error over and over. I couldn’t figure out why my code wasn’t working.
Finally my friend was like “bro, you don’t even understand loops properly. Go back and learn basics first.” That hurt to hear ngl. But he was right. So I literally went back to basics. Like, embarrassingly basic stuff.
# I spent like a week just doing this kind of stuff
fruits = ["apple", "banana", "orange"]
for fruit in fruits:
    print(fruit)
# And this
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num * 2)
# And understanding dictionaries
person = {
    "name": "Likhon",
    "age": 30,
    "job": "Cloud Engineer"
}
print(person["name"])
# Functions
def add_numbers(a, b):
    return a + b
result = add_numbers(5, 3)
print(result)
Yeah, super basic stuff. But you know what? It actually helped. Once I really understood how loops work, how functions work, how data types work… everything after that made way more sense. Don’t skip this part. Seriously. I know it’s boring. I know it feels like you’re wasting time. You’re not. Trust me on this.
The Math Thing That Scared Me
Okay so people always say “oh you need to know so much math for data science” and that scared me away for a while. I was like “ugh, math is boring, I suck at math.” But then I realized… it’s not really about being good at math in a, like, calculus way. It’s just about understanding what numbers mean.
Like, mean, median, standard deviation. That’s like… the basic stuff. And it actually makes sense when you’re working with real data.
import statistics
temps_last_month = [22, 23, 21, 25, 24, 23, 22, 21, 20, 28, 35, 40]
avg = statistics.mean(temps_last_month)
middle = statistics.median(temps_last_month)
spread = statistics.stdev(temps_last_month)
print(f"Average: {avg}")
print(f"Middle value: {middle}")
print(f"How spread out it is: {spread}")
I ran this on my city’s temperature data and I was like “ohhh, so when the standard deviation is high, that means the temperature varies a lot.” That’s not abstract anymore, that’s like… something real I can see.
Start with just these three things. Don’t go into complex math. Honestly, 80% of what I do uses just basic statistics. That’s it.
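To see why these three numbers are worth learning together, here’s a quick sketch that splits the same temperature list into the calm days and the full month with the heat spike. The names `calm_days` and `full_month` are made up for this illustration.

```python
import statistics

# Same readings as above; the last three are the heat spike
full_month = [22, 23, 21, 25, 24, 23, 22, 21, 20, 28, 35, 40]
calm_days = full_month[:9]  # everything before the spike

for label, temps in [("Calm days", calm_days), ("Full month", full_month)]:
    mean = statistics.mean(temps)
    median = statistics.median(temps)
    spread = statistics.stdev(temps)  # sample standard deviation
    print(f"{label}: mean={mean:.1f}, median={median}, stdev={spread:.1f}")
```

The spike drags the mean up by three degrees, while the median only moves from 22 to 23, and the standard deviation jumps. That’s the whole idea: the mean and the spread react to extreme values, the median mostly doesn’t.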
When Pandas Actually Clicked
So after I got comfortable with the basics (like actually comfortable, took me like two months), I finally looked at Pandas properly. And man, it was like someone handed me a superpower. Pandas is basically just… taking messy data and cleaning it up, organizing it, finding patterns. That’s literally what I do half the time in my job.
Here’s something I actually use constantly:
import pandas as pd
# I download a CSV file of my cloud instances
df = pd.read_csv('instances.csv')
# Let's see what we have
print(df.head())
# What's the structure?
df.info()  # info() prints by itself; wrapping it in print() adds a stray "None"
# Quick stats
print(df.describe())
# Okay so only show me running instances
running = df[df['status'] == 'running']
print(f"Running: {len(running)}")
# How much does each region cost?
cost_by_region = df.groupby('region')['cost'].sum()
print(cost_by_region)
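# Fix the names that are missing (assign back instead of inplace=True on a column, which newer Pandas warns about)
df['name'] = df['name'].fillna('Unnamed')
# Add a new column - monthly cost (about 730 hours in a month)
df['monthly_cost'] = df['hourly_cost'] * 730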
When I learned this, I was like “wait, I can do this to ANY data?” Yeah man. Any data. This is where it gets actually useful.
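If `groupby` feels like magic, it can help to think of it as a loop plus a dictionary. Here’s a rough plain-Python sketch of what `df.groupby('region')['cost'].sum()` does conceptually. The rows, region names, and the `sum_by` helper are all made up for illustration, and this is not how Pandas implements it internally.

```python
from collections import defaultdict

# Made-up rows standing in for instances.csv
rows = [
    {"region": "us-east-1", "cost": 12.5},
    {"region": "eu-west-1", "cost": 7.0},
    {"region": "us-east-1", "cost": 3.5},
    {"region": "eu-west-1", "cost": 1.0},
]

def sum_by(rows, key, value):
    """Group rows by `key` and add up `value` within each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

cost_by_region = sum_by(rows, "region", "cost")
print(cost_by_region)
```

Pandas does the same job in one line and much faster on big files, but the mental model is the same: pick a key, bucket the rows, add up each bucket.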
Real Data is Messy as Hell
Okay so I learned on clean data. Perfect data. Data that doesn’t exist in real life. Then I got my first real project with actual data. It was GROSS. Missing values everywhere. Inconsistent date formats. Duplicates. Numbers that made no sense. I was lost.
That’s when I realized – I need to stop practicing on toy datasets. So I started downloading real data from Kaggle, finding messy government datasets, scraping data from websites. And it was brutal but super valuable.
import pandas as pd
df = pd.read_csv('messy_data.csv')
# First, let's see what's broken
print(df.isnull().sum()) # Missing values
# Fix missing values - different strategies
df['age'] = df['age'].fillna(df['age'].median())
df['notes'] = df['notes'].fillna('No notes')
# Duplicates
print(f"Duplicates: {df.duplicated().sum()}")
df = df.drop_duplicates()
# Data types are wrong
df['date'] = pd.to_datetime(df['date'])
df['amount'] = df['amount'].astype(float)
# Remove crazy outliers
Q1 = df['amount'].quantile(0.25)
Q3 = df['amount'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['amount'] >= Q1 - 1.5*IQR) & (df['amount'] <= Q3 + 1.5*IQR)]
print(f"Clean now: {df.shape}")
This is what real data science looks like. Not glamorous. Lots of cleaning. But it’s important.
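The outlier step is the least obvious part of that script, so here’s the same 1.5 × IQR rule worked through by hand with the standard library. It’s a sketch on the temperature list from earlier, not a replacement for the Pandas version. It uses `method="inclusive"` on the assumption that this matches the linear interpolation Pandas’ `quantile` uses by default.

```python
import statistics

temps = [22, 23, 21, 25, 24, 23, 22, 21, 20, 28, 35, 40]

# quantiles(n=4) returns the three cut points: Q1, median, Q3
q1, _, q3 = statistics.quantiles(temps, n=4, method="inclusive")
iqr = q3 - q1

# Anything more than 1.5 * IQR outside the middle half counts as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

kept = [t for t in temps if lower <= t <= upper]
dropped = [t for t in temps if t < lower or t > upper]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Allowed range: {lower} to {upper}")
print(f"Dropped: {dropped}")
```

Here the two hottest readings (35 and 40) fall outside the range and get dropped. Whether that’s right depends on the data: a real heat wave isn’t an error, so look at what you’re removing before you delete it.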
Making Charts So People Actually Care
After like three months I made my first graph. It was terrible lol. Like really bad. But at least I learned something: if you can’t show people what you found, they don’t care.
I learned Matplotlib and Seaborn. Started super simple:
import matplotlib.pyplot as plt
import seaborn as sns
# Super basic line chart
temps = [22, 23, 21, 25, 24, 23, 22, 21, 20, 28, 35, 40]
days = range(1, 13)
plt.plot(days, temps, marker='o')
plt.title('Temperature')
plt.xlabel('Day')
plt.ylabel('Degrees')
plt.show()
# Histogram to see distribution
import pandas as pd
df = pd.read_csv('sales.csv')
plt.hist(df['amount'], bins=30)
plt.title('How much people spend')
plt.show()
# Compare groups with box plot
sns.boxplot(x='region', y='sales', data=df)
plt.title('Sales by region')
plt.show()
Honestly, I still don’t make fancy charts. I make clear charts. That’s the difference. You don’t need to be an artist, you just need people to understand what they’re looking at.
When I Started Machine Learning (Finally)
So like four months in, I felt ready. Kinda. Actually I was probably still not ready but I did it anyway. I started with Scikit-learn because it’s straightforward. No fancy stuff, just algorithms.
First thing I did was linear regression:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
df = pd.read_csv('houses.csv')
X = df[['sqft', 'bedrooms', 'bathrooms']]
y = df['price']
# Split into training and testing (random_state makes the split repeatable)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train
model = LinearRegression()
model.fit(X_train, y_train)
# Predict
preds = model.predict(X_test)
# How good?
error = mean_squared_error(y_test, preds)
r2 = r2_score(y_test, preds)
print(f"Mean squared error: {error}")
print(f"R-squared: {r2}")
And I spent like six weeks just on this. Really understanding what’s happening. Not rushing to the next thing.
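One way to see what those two numbers mean is to compute them by hand once. This is a small sketch with made-up values and predictions (not output from the model above), using the usual definitions: mean squared error is the average squared miss, and R-squared is 1 minus the squared misses divided by the squared distance from the mean.

```python
# Made-up actual values and predictions, just to show the arithmetic
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

def mse(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

print(f"MSE: {mse(actual, predicted)}")
print(f"R-squared: {r_squared(actual, predicted)}")
```

An R-squared close to 1 means the predictions track the actual values closely. The mean squared error is in squared units (squared dollars for house prices), so taking its square root gives a number you can compare to the prices themselves.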
The Mistake I Made (Again)
Then I was like okay now I can do neural networks and deep learning and all that fancy stuff. Nope. Crashed and burned. I didn’t understand basic concepts well enough. I was getting terrible results and couldn’t figure out why.
So I went backwards again. Learned decision trees, random forests, cross-validation properly. Boring stuff but necessary. Only then did the deep learning stuff start to make sense.
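from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Note: a classifier needs class labels as the target,
# so y_train here is not the house prices from before
# Train a random forest
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X_train, y_train)
# Check how good it is: cross-validate on the training data,
# and keep the test set for one final check at the end
scores = cross_val_score(forest, X_train, y_train, cv=5)
print(f"Average score: {scores.mean()}")
# Which features matter?
print(forest.feature_importances_)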
This stuff seemed boring at first. But when it worked, it was awesome.
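Cross-validation sounds fancy until you write out what `cv=5` means: cut the data into five chunks and let each chunk take a turn as the test set. Here’s a minimal sketch of just the splitting part. `k_fold_indices` is a hypothetical helper made up for this post; scikit-learn’s own `KFold` does this for real, with shuffling and other options.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    The first n_samples % k folds get one extra sample so that
    every index lands in exactly one test fold.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Ten samples, five folds: each sample is tested exactly once
for train, test in k_fold_indices(10, 5):
    print(f"train={train} test={test}")
```

The model gets trained five times, each time on the `train` rows and scored on the `test` rows, and the average of those five scores is what `scores.mean()` reports.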
How Long Did This Actually Take Me?
Real talk:
- Month 1-2: Basics. Writing dumb scripts. Every day. Bored out of my mind sometimes.
- Month 2-3: Pandas. Learning how to actually work with data.
- Month 3-4: Stats and graphs. Understanding what the numbers mean.
- Month 4-6: Machine learning. The exciting stuff. But going slow.
- Month 6+: Actually building things. Projects that matter.
Six months total. But I was working full-time. So like one to two hours every day, plus weekends. If I had done it full-time, probably three months.
Why I Actually Stuck With It
The biggest reason? Honestly, I found it interesting. I stopped practicing on random stuff and started analyzing things I actually cared about. My cloud infrastructure costs, data from projects I was working on, stuff like that.
And I didn’t quit when it got hard. There’s this period like month two to three where it’s boring and hard at the same time. That’s when everyone stops. I wanted to quit too. But I didn’t. I just kept coding. Every single day. Even when I didn’t understand anything. Even when I was frustrated. Even when it felt pointless.
What Actually Helped Me
Free YouTube tutorials for basics. Stack Overflow when I was stuck. The Real Python website for actually understanding concepts. Kaggle for finding real datasets. And I did pay for one thing: Andrew Ng’s machine learning course on Coursera. That actually helped a lot. The concepts clicked differently when someone explained them step by step.
Real Talk: What I Wish I Knew
If I could go back in time and tell younger me stuff:
- Don’t skip fundamentals. Your future self will thank you.
- Use real data. Boring messy data. That’s where you learn.
- You don’t need to memorize everything. Google is fine. Understanding is what matters.
- Build stuff you care about. Not random projects. Stuff that actually interests you.
- The hard part is not quitting. Not being smart. Just… not quitting.
- Find other people learning this. Talk to them. Ask them questions. It helps so much.
Where I’m At Now
Now Python is just… part of how I work. I use it for automation, data analysis, machine learning stuff, all of it. It’s my default tool. But here’s the thing: I’m still learning. Every single week I learn something new. New tools, new techniques, new ways of doing things. And that’s cool. Means I’m not bored.
Can You Do This?
Yeah. You absolutely can. If I can do it (and I’m not special, I’m just consistent), you can definitely do it. Just start. Write something dumb in Python today. Then do it tomorrow. Then the day after. That’s it. The you six months from now will be like “damn, I’m glad I started back then.” That’s how it works.
