I am working with the following df:

    c.sort_values('2005', ascending=False).head(3)

           GeoName      ComponentName  IndustryId  IndustryClassification                                 Description   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014
    37926  Alabama  Real GDP by state           9                     213               Support activities for mining     99     98    117    117    115     87     96     95    103    102   (NA)
    37951  Alabama  Real GDP by state          34                      42                             Wholesale trade   9898  10613  10952  11034  11075   9722   9765   9703   9600   9884  10199
    37932  Alabama  Real GDP by state          15                     327  Nonmetallic mineral products manufacturing    980    968    940   1084    861    724    714    701    589    641   (NA)
I want to coerce all of the year columns to numeric:

    c['2014'] = pd.to_numeric(c['2014'], errors='coerce')

Is there a simple way to do this, or do I have to type them all out?
UPDATE: you don't need to convert your values afterwards; you can do it on the fly when reading the CSV:

    In [165]: df = pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)

    In [166]: df.dtypes
    Out[166]:
    GeoName                    object
    ComponentName              object
    IndustryId                  int64
    IndustryClassification     object
    Description                object
    2004                        int64
    2005                        int64
    2006                        int64
    2007                        int64
    2008                        int64
    2009                        int64
    2010                        int64
    2011                        int64
    2012                        int64
    2013                        int64
    2014                      float64
    dtype: object
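The `url` above is not shown, so here is a minimal sketch of the same idea on an in-memory CSV. The `csv_text` content and the reduced set of columns are assumptions made for illustration; only the `(NA)` marker and the `read_csv` arguments come from the answer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three rows, two year columns.
csv_text = """id,GeoName,2013,2014
37926,Alabama,102,(NA)
37951,Alabama,9884,10199
37932,Alabama,641,(NA)
"""

# na_values makes the parser treat '(NA)' as missing, so the year columns
# are read as numbers rather than strings.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, na_values=["(NA)"])

# Optional, as in the answer: replace the missing values with 0.
filled = df.fillna(0)

print(df.dtypes)
```

A column that contains missing values comes back as a float column, which is why `2014` is `float64` in the output above while the other years are `int64`.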
If you need to convert multiple columns to numeric dtypes, use the following technique:

Sample source DF:
    In [271]: df
    Out[271]:
         id    a  b  c  d  e    f
    0  id_3  AAA  6  3  5  8    1
    1  id_9    3  7  5  7  3  BBB
    2  id_7    4  2  3  5  4    2
    3  id_0    7  3  5  7  9    4
    4  id_0    2  4  6  4  0    2

    In [272]: df.dtypes
    Out[272]:
    id    object
    a     object
    b      int64
    c      int64
    d      int64
    e      int64
    f     object
    dtype: object
Convert the selected columns to numeric dtypes:
    In [273]: cols = df.columns.drop('id')

    In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

    In [275]: df
    Out[275]:
         id    a  b  c  d  e    f
    0  id_3  NaN  6  3  5  8  1.0
    1  id_9  3.0  7  5  7  3  NaN
    2  id_7  4.0  2  3  5  4  2.0
    3  id_0  7.0  3  5  7  9  4.0
    4  id_0  2.0  4  6  4  0  2.0

    In [276]: df.dtypes
    Out[276]:
    id     object
    a     float64
    b       int64
    c       int64
    d       int64
    e       int64
    f     float64
    dtype: object
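Applied to the frame from the question, the same technique might look like the sketch below. The small frame `c` is a cut-down stand-in built from values shown in the question, and picking the year columns by "label consists only of digits" is an assumption about the data, not something the question states.

```python
import pandas as pd

# Cut-down stand-in for the question's frame, with the years stored as strings.
c = pd.DataFrame(
    {
        "GeoName": ["Alabama", "Alabama", "Alabama"],
        "Description": [
            "Support activities for mining",
            "Wholesale trade",
            "Nonmetallic mineral products manufacturing",
        ],
        "2013": ["102", "9884", "641"],
        "2014": ["(NA)", "10199", "(NA)"],
    },
    index=[37926, 37951, 37932],
)

# Assumption: a column is a year column if its label is made only of digits.
year_cols = [col for col in c.columns if str(col).isdigit()]

# Convert every year column at once; values that cannot be parsed become NaN.
c[year_cols] = c[year_cols].apply(pd.to_numeric, errors="coerce")

print(c.dtypes)
```

This avoids typing one `pd.to_numeric` line per year, and the text columns are left untouched because they are never selected.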
PS: if you want to select all string (object) columns, use the following simple trick:
cols = df.columns[df.dtypes.eq('object')]
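As a rough sketch of how that selection combines with the conversion step, the snippet below builds a small hypothetical frame (not the one from the answer) in which two columns mix strings and numbers. It also shows `select_dtypes`, which should pick the same columns here.

```python
import pandas as pd

# Hypothetical sample: 'a' and 'f' mix strings and numbers, so they are
# stored with the object dtype; 'b' is a plain integer column.
df = pd.DataFrame({"a": ["AAA", 3, 4], "b": [6, 7, 2], "f": [1, "BBB", 2]})

# Boolean-mask form of the trick.
cols = df.columns[df.dtypes.eq("object")]

# Alternative spelling of the same selection.
cols_alt = df.select_dtypes(include="object").columns

# Convert only the selected columns; unparseable values become NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Note that an identifier column such as `id` in the earlier sample is also an object column, so it would need to be dropped from `cols` first if it should stay as text.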