4. Data visualization: Visualizing earnings based on college majors (2010-2012)

The dataset is stored in the recent-grads.csv file. It contains information on the earnings of college majors in the US from 2010 to 2012.

It can be downloaded from here: https://github.com/fivethirtyeight/data/tree/master/college-majors

In this project, I will explore the dataset, look for patterns in the earnings of different majors, and plot them with the matplotlib library.

The code was written in Jupyter.
Reading the data:

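import pandas as pd

# Load the dataset and take a first look at its structure.
recent_grads=pd.read_csv('./data/recent-grads.csv')
print(recent_grads.columns)
# info() prints its summary itself and returns None, so it is not wrapped in print().
recent_grads.info()
print(recent_grads.describe())
print(recent_grads.head(1))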

Handling missing values:

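# Number of rows before dropping anything.
raw_data_count=recent_grads.shape[0]
print(raw_data_count)
# Number of rows left once rows with any missing value are dropped.
# The result of dropna() is only counted here, not assigned back.
cleaned_data_count=recent_grads.dropna().shape[0]
print(cleaned_data_count)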

Output:
173
172
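
The two counts differ by one, so one row contains a missing value. A minimal sketch of how that row could be located is shown below. The `toy` table and its values are made up for illustration; on the real data the same calls would be applied to `recent_grads`.

```python
import pandas as pd

# Hypothetical miniature table; the values are invented for illustration.
toy = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Men": [100.0, None, 300.0],
    "Women": [50.0, 80.0, 120.0],
})

# Boolean mask: True for every row that has at least one missing value.
has_missing = toy.isna().any(axis=1)

# These are the rows that dropna() would discard.
dropped_rows = toy[has_missing]
print(dropped_rows)

# Number of missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Inspecting the affected rows before dropping them makes it easier to judge whether discarding them distorts the later plots.
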
Draw scatter plots to examine the relationships between the attributes:

import matplotlib.pyplot as plt
%matplotlib inline

recent_grads.plot(x='Full_time',y='Median',kind='scatter')
recent_grads.plot(x='Unemployed',y='Median',kind='scatter')
recent_grads.plot(x='Men',y='Median',kind='scatter')
recent_grads.plot(x='Women',y='Median',kind='scatter')

This produces four scatter plots, one per call.
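
To put a number on what a scatter plot suggests, one option is the Pearson correlation coefficient between the two plotted columns. The sketch below uses a made-up `toy` table, not the real dataset; on the real data the same call would be `recent_grads['Full_time'].corr(recent_grads['Median'])`.

```python
import pandas as pd

# Hypothetical sample rows; real values come from recent-grads.csv.
toy = pd.DataFrame({
    "Full_time": [1000, 2000, 3000, 4000],
    "Median": [60000, 50000, 45000, 30000],
})

# Pearson correlation between the two columns, in the range [-1, 1].
# Values near 0 suggest no linear relationship.
r = toy["Full_time"].corr(toy["Median"])
print(round(r, 3))
```

A coefficient only captures linear association, so it complements the plots rather than replacing them.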


Next we plot histograms to see how each attribute is distributed:

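# 'Employed' was listed twice; each column now appears once.
columns=['Median','Employed','Unemployment_rate','Women','Men']
fig=plt.figure(figsize=(6,18))
for i,col in enumerate(columns):
    # One subplot per column, stacked vertically.
    ax=fig.add_subplot(len(columns),1,i+1)
    # Pass ax explicitly so the histogram is drawn on this subplot.
    recent_grads[col].hist(ax=ax,color='orange')
    ax.set_title(col)
plt.show()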

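To examine the relationship between the number of employed graduates and salary more conveniently, we use the scatter_matrix function to build a scatter plot matrix: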

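# scatter_matrix lives in pandas.plotting in current pandas (pandas.tools.plotting is the old location).
from pandas.plotting import scatter_matrix
# The c=['red','blue'] argument is dropped: a two-color list does not match the number of points.
scatter_matrix(recent_grads[['Employed','Median']],figsize=(10,10))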

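Notes on this matrix: the diagonal panels show a histogram of each column, and the off-diagonal panels show the scatter plots of Employed against Median.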

Next, let's do something interesting and look at the share of women in the 10 highest-paying and the 10 lowest-paying majors:

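# The rows are assumed to be sorted by median earnings, highest first.
recent_grads[:10].plot.bar(x='Major',y='ShareWomen')
plt.legend(loc='upper left')
plt.title('The 10 highest paying majors.')
# tail(10) always selects exactly the last 10 rows; [162:] returns 11 rows when all 173 are present.
recent_grads.tail(10).plot(x='Major',y='ShareWomen',kind='bar')
plt.title('The 10 lowest paying majors.')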
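
Selecting the first and last rows only gives the highest and lowest paying majors if the file is already ordered by median earnings. A sketch that makes the ordering explicit with `nlargest` and `nsmallest` follows; the `toy` table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows in arbitrary order; values are invented for illustration.
toy = pd.DataFrame({
    "Major": ["A", "B", "C", "D", "E"],
    "Median": [40000, 90000, 30000, 70000, 50000],
    "ShareWomen": [0.7, 0.1, 0.8, 0.2, 0.5],
})

# Pick rows by value instead of by position, so row order does not matter.
top2 = toy.nlargest(2, "Median")
bottom2 = toy.nsmallest(2, "Median")

print(top2[["Major", "ShareWomen"]])
print(bottom2[["Major", "ShareWomen"]])
```

On the real data, `recent_grads.nlargest(10, 'Median')` could then be passed to `.plot.bar(x='Major', y='ShareWomen')` in the same way as the slices.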

Compare the numbers of men and women in the higher-paying majors:

recent_grads[:10].plot.bar(x='Major',y=['Men','Women'])
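
Raw counts of men and women are hard to compare across majors of very different sizes, so a ratio can help. The sketch below derives a share of women from the two count columns. The `toy` values and the `share_women` column name are hypothetical, and it is only an assumption that the dataset's own ShareWomen column is defined in a similar way.

```python
import pandas as pd

# Hypothetical counts; values are invented for illustration.
toy = pd.DataFrame({
    "Major": ["A", "B"],
    "Men": [80, 30],
    "Women": [20, 70],
})

# Share of women among all graduates counted in the two columns.
toy["share_women"] = toy["Women"] / (toy["Men"] + toy["Women"])
print(toy[["Major", "share_women"]])
```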