Reference: https://www.jianshu.com/p/14e635662fff
# sampleBy draws a stratified sample on the given column; here about 0.02% of each gender stratum is kept
sample_data = df.sampleBy('gender', {1: 0.0002, 2: 0.0002}).select("balance", "numTrans", "numIntlTrans")
sample_data.take(5)
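As a rough illustration of what sampleBy does with its fractions dictionary, the sketch below mimics the idea in plain Python: each row is kept independently with the probability assigned to its stratum, so the sample size is approximate rather than exact. The helper name stratified_sample and the toy rows are hypothetical and not part of PySpark; this is a sketch of the idea, not Spark's implementation.

```python
import random

def stratified_sample(rows, key, fractions, seed=None):
    """Keep each row with the probability given for its stratum (hypothetical helper)."""
    rng = random.Random(seed)
    sample = []
    for row in rows:
        # Strata missing from `fractions` are treated as fraction 0, i.e. dropped
        fraction = fractions.get(row[key], 0.0)
        if rng.random() < fraction:
            sample.append(row)
    return sample

# Toy data: 10,000 rows with gender 1 or 2 (made-up values for illustration)
rows = [{"gender": 1 + i % 2, "balance": float(i)} for i in range(10000)]
sample = stratified_sample(rows, "gender", {1: 0.1, 2: 0.1}, seed=42)
print(len(sample))  # close to 1000, but usually not exactly 1000
```

With fractions as small as 0.0002, a stratum needs many thousands of rows before the sample reliably contains any rows at all.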
To plot several 2D charts in one go, you can use:
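# chrt is assumed to be the legacy bokeh.charts module (not shipped with recent Bokeh releases)
import bokeh.charts as chrt
# Columns to plot (the same fields selected into sample_data above)
numerical = ["balance", "numTrans", "numIntlTrans"]
# Collect each column of the sample into a plain Python list
data_multi = dict([
(elem, sample_data.select(elem).rdd.flatMap(lambda row: row).collect())
for elem in numerical
])
sctr = chrt.Scatter(data_multi, x='balance', y='numTrans')
chrt.show(sctr)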
# Alternatively, with matplotlib
import matplotlib.pyplot as plt
# Collect the values of each column separately
data_multi = {elem: sample_data.select(elem).rdd.flatMap(lambda x: x).collect()
for elem in ["balance", "numTrans", "numIntlTrans"]}
# Draw the scatter plots with plt.scatter(x, y)
plt.scatter(data_multi["numTrans"], data_multi["balance"], c='r', marker='o', label="numTrans-balance")
plt.scatter(data_multi["numIntlTrans"], data_multi["balance"], c='b', marker="x", label="numIntlTrans-balance")
plt.grid(True)  # show grid lines
plt.xlabel("x value")
plt.ylabel("y value")
plt.legend(loc='upper right')  # legend position
plt.title("scatter view")
plt.show()  # render the figure (needed outside notebooks with inline plotting)
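Both snippets above build data_multi by running one select(...).collect() per column, which launches a separate Spark job for each field. Assuming the sample is small enough to sit on the driver, one possible alternative is to collect the rows once and transpose them locally. The sketch below shows only the local transpose step in plain Python; the row values are made up and stand in for what sample_data.collect() might return.

```python
# Rows as they might come back from a single collect() (values are made up)
rows = [(100.0, 5, 1), (250.5, 12, 0), (80.0, 3, 2)]
columns = ["balance", "numTrans", "numIntlTrans"]

# zip(*rows) transposes the row tuples into one tuple per column
data_multi = {name: list(values) for name, values in zip(columns, zip(*rows))}

print(data_multi["numTrans"])  # [5, 12, 3]
```

The resulting dictionary has the same shape as data_multi above, so the plotting calls would not need to change.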