Spark errors
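* Null value appeared in non-nullable field

    java.lang.NullPointerException: Null value appeared in non-nullable field:
    top level row object
    If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).

  Fix: filter out the rows that are null in the DataFrame:

    df.filter((FilterFunction<Row>) row -> row != null)

  The cast to org.apache.spark.api.java.function.FilterFunction selects the Java overload of Dataset.filter. Without it, javac may reject the lambda as ambiguous against the overload that takes a Scala function.

* Compilation problem: changes made to the Row inside map do not take effect

    ERROR CodeGenerator: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB
    /* 001 */ public Object generate(Object[] references) {
    /* 002 */   return new GeneratedIterator(references);
    /* 003 */ }
    ...... (more than ten thousand lines of generated source omitted)

  Cause: the Dataset contains fields whose schema type is null, such as rank. They appear because the SQL selects an untyped null literal, e.g. null as rank:

     |-- is_merchant_exclusive: integer (nullable = true)
     |-- comment_keywords: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- date: date (nullable = true)
     |-- generate_time: null (nullable = true)
     |-- rank: null (nullable = true)
     |-- prime: null (nullable = true)
     |-- activities: null (nullable = true)
     |-- categories: null (nullable = true)
     |-- total_heart_num: null (nullable = true)
     |-- ad_categories: null (nullable = true)

  Fix: in the Dataset's map method, the schema used for the output rows must first redefine each of these null-typed fields with a concrete type, for example by copying the field definition from a schema that declares it (staticSchema here):

    newFields.set(oldSchema.fieldIndex("rank"), staticSchema.apply("rank"));
    ...

* When Spark saves data to Hive:

    Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead

  Cause: the Hive table has map or array columns, and the data being saved contains empty arrays or empty maps.

  Fix: change the empty array and empty map values (empty collections for short) to null; the save then succeeds.

  When Spark saves data to Hive, empty collections are not supported and must be changed to null before saving, whereas loading data files into Hive directly has no such problem. So data that Spark reads from Hive can carry empty collections, and these must be converted to null before saving.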
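The empty-collection fix for the Parquet error amounts to a per-row normalization step. Below is a minimal plain-Java sketch of that step, written so it runs without Spark: it assumes each row is modelled as a `Map<String, Object>` (column name to value), with array columns as `java.util.List` and map columns as `java.util.Map`. In a real job the values would come from `Row.getList` / `Row.getJavaMap` inside the `map` function. The class name `EmptyCollectionsToNull` and the columns `ad_attrs` and `tags` are hypothetical.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rows are modelled as column-name -> value maps,
// not Spark's Row, so this sketch runs without Spark on the classpath.
public class EmptyCollectionsToNull {

    // Returns a new row in which every empty Collection (array column) or
    // empty Map (map column) is replaced by null. All other values, including
    // nulls and non-empty collections, are copied unchanged. The input is not modified.
    public static Map<String, Object> emptyCollectionsToNull(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>(); // keeps column order, allows null values
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object v = e.getValue();
            boolean emptyArray = v instanceof Collection && ((Collection<?>) v).isEmpty();
            boolean emptyMap = v instanceof Map && ((Map<?, ?>) v).isEmpty();
            out.put(e.getKey(), (emptyArray || emptyMap) ? null : v);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("comment_keywords", List.of());   // empty array column -> null
        row.put("ad_attrs", Map.of());            // hypothetical empty map column -> null
        row.put("tags", List.of("a", "b"));       // hypothetical non-empty array, kept
        row.put("is_merchant_exclusive", 1);      // scalar, kept
        System.out.println(emptyCollectionsToNull(row));
        // prints {comment_keywords=null, ad_attrs=null, tags=[a, b], is_merchant_exclusive=1}
    }
}
```

The sketch only inspects top-level column values; empty collections nested inside struct columns, or inside other collections, would need a recursive version of the same check.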