Data Understanding
- read_csv()
- head()
- tail()
- info()
- describe()
- isnull(), isnull().sum()
- duplicated(keep=False)
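
The inspection calls listed above can be tried together on a small frame. This is only a sketch: the CSV text, its column names, and its values are made up for illustration, and io.StringIO stands in for a real file path passed to read_csv().

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """ID,Education,Income
1,Graduation,58138
2,PhD,46344
3,Master,
2,PhD,46344
"""

df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(2)   # first 2 rows (default is 5)
last_rows = df.tail(2)    # last 2 rows (default is 5)
df.info()                 # prints column dtypes and non-null counts
summary = df.describe()   # summary statistics for the numeric columns

missing_per_column = df.isnull().sum()      # number of missing values per column
duplicate_mask = df.duplicated(keep=False)  # True for every row that has a duplicate
```

With keep=False, every copy of a repeated row is flagged, not just the later ones, which makes it easier to look at the duplicates side by side.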
Data Cleaning
- dropna()
- fillna()
- drop_duplicates()
- to_datetime()
The to_datetime() function converts a column stored as object (string) dtype to datetime format.
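
A minimal sketch of the cleaning calls above, assuming a made-up frame: the column names Income and Dt_Customer and all values are hypothetical, and filling with the column mean is just one possible choice for fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing income and one fully duplicated row.
df = pd.DataFrame({
    "Income": [58138.0, np.nan, 71613.0, 71613.0],
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21", "2013-08-21"],
})

dropped = df.dropna()                                # drop rows with any missing value
filled = df.fillna({"Income": df["Income"].mean()})  # fill missing income with the column mean
deduped = df.drop_duplicates()                       # keep the first copy of each repeated row

# Convert the string column to datetime; the format is given explicitly.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"], format="%Y-%m-%d")
```

Each of dropna(), fillna(), and drop_duplicates() returns a new frame here, so the result has to be assigned to a name to be kept.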
Data Analysis and Manipulation
- value_counts()
df['Education'].value_counts()
- unique()
df['Education'].unique()
- nunique()
df['Education'].nunique()
Output = 5
- sort_values()
df.sort_values(by='Income', ascending=False)
- query()
The query() method filters the data frame by a given condition.
df.query('Income > 100000') or df[df['Income'] > 100000]
- groupby()
df.groupby('Education')['Income'].mean()
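
The calls from value_counts() through groupby() can be run end to end on a small frame. The sketch below assumes made-up Education and Income values, so its counts will not match the output quoted above.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Education": ["PhD", "Master", "PhD", "Graduation", "Master", "PhD"],
    "Income": [120000, 80000, 95000, 60000, 110000, 70000],
})

counts = df["Education"].value_counts()   # rows per category, most frequent first
categories = df["Education"].unique()     # distinct values, in order of appearance
n_categories = df["Education"].nunique()  # number of distinct values

by_income = df.sort_values(by="Income", ascending=False)  # highest income first

# Two equivalent ways to keep only the rows with Income above 100000.
high_query = df.query("Income > 100000")
high_mask = df[df["Income"] > 100000]

# Mean income for each education level.
mean_income = df.groupby("Education")["Income"].mean()
```

query() takes the condition as a string, which tends to read better for long conditions; boolean indexing does the same job with ordinary Python expressions.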
- pivot_table()
The pivot_table() method builds a pivot table from a data frame.
The four arguments we typically pass are data, values, index, and columns.
By default, the method aggregates with the mean; the aggfunc argument changes this.
pd.pivot_table(data=df, values='Income', index='Education', columns='Marital_Status')
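
To make the default aggregation concrete, here is a small sketch that builds the same pivot table twice, once with the default mean and once with aggfunc="max". The data values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Education": ["PhD", "PhD", "Master", "Master", "PhD"],
    "Marital_Status": ["Single", "Married", "Single", "Married", "Single"],
    "Income": [100000, 80000, 60000, 90000, 50000],
})

# Default aggregation: mean income per (Education, Marital_Status) cell.
mean_table = pd.pivot_table(
    data=df, values="Income", index="Education", columns="Marital_Status"
)

# Same layout, but each cell holds the maximum income instead of the mean.
max_table = pd.pivot_table(
    data=df, values="Income", index="Education", columns="Marital_Status",
    aggfunc="max",
)
```

Rows come from index, columns from columns, and each cell aggregates the values column for that combination.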
- apply()
df['Response'].apply(lambda x: 'Accepted' if x == 1 else 'Not Accepted')
- replace()
df['Marital_Status'].replace(to_replace=['Alone', 'Divorced', 'Widow', 'YOLO', 'Absurd'], value='Single')
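
A short sketch of apply() and replace() on a made-up frame. The new column name Response_Label is hypothetical; the point is that both methods return a new Series, so the result must be assigned back to take effect.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Response": [1, 0, 1],
    "Marital_Status": ["Alone", "Married", "YOLO"],
})

# Map the 0/1 response code to a readable label, row by row.
df["Response_Label"] = df["Response"].apply(
    lambda x: "Accepted" if x == 1 else "Not Accepted"
)

# Collapse several marital-status values into a single 'Single' category.
df["Marital_Status"] = df["Marital_Status"].replace(
    to_replace=["Alone", "Divorced", "Widow", "YOLO", "Absurd"],
    value="Single",
)
```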