SparkSQL: How to Use UDFs
Develop a hello-world-level UDF in Java, package it as udf.jar, and place it under /home/hadoop/lib. The code:

package com.luogankun.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class HelloUDF extends UDF {
    public String evaluate(String str) {
        try {
            return "HelloWorld " + str;
        } catch (Exception e) {
            return null;
        }
    }
}

Using the UDF in Hive:

cd $SPARK_HOME/bin
spark-sql --jars /home/hadoop/lib/udf.jar
CREATE TEMPORARY FUNCTION hello AS "com.luogankun.udf.HelloUDF";
select hello(url) from page_views limit 1;

Using the UDF in SparkSQL

Method 1: specify the jar with --jars when starting spark-sql

cd $SPARK_HOME/bin
spark-sql --jars /home/hadoop/lib/udf.jar
CREATE TEMPORARY FUNCTION hello AS "com.luogankun.udf.HelloUDF";
select hello(url) from page_views limit 1;

Method 2: start spark-sql first, then add jar

cd $SPARK_HOME/bin
spark-sql
add jar /home/hadoop/lib/udf.jar;
CREATE TEMPORARY FUNCTION hello AS "com.luogankun.udf.HelloUDF";
select hello(url) from page_views limit 1;

In testing, this method turned out not to be supported: it fails with java.lang.ClassNotFoundException: com.luogankun.udf.HelloUDF. How to work around it?

1) First add the path of udf.jar to SPARK_CLASSPATH in spark-env.sh, e.g.:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar:/home/hadoop/lib/udf.jar

2) Then start spark-sql and simply run CREATE TEMPORARY FUNCTION:

cd $SPARK_HOME/bin
spark-sql
CREATE TEMPORARY FUNCTION hello AS "com.luogankun.udf.HelloUDF";
select hello(url) from page_views limit 1;

Method 3: using the UDF through the Thrift JDBC Server

Execute in the beeline command line:

add jar /home/hadoop/lib/udf.jar;
CREATE TEMPORARY FUNCTION hello AS "com.luogankun.udf.HelloUDF";
select hello(url) from page_views limit 1;

Permanent link to this article: http://www.linuxidc.com/Linux/2014-09/106617.htm
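As a quick sanity check before packaging udf.jar, the evaluate() logic of HelloUDF can be exercised in isolation. This sketch copies the string logic into a plain class (HelloUDFCheck is a hypothetical helper, not part of the article's jar) so it runs without hive-exec on the classpath; extending Hive's UDF class matters only for registration, not for the concatenation itself.

```java
// Sketch: HelloUDF's evaluate() logic copied into a plain class so it
// can be run without the hive-exec dependency. HelloUDFCheck is a
// hypothetical helper, not part of udf.jar.
public class HelloUDFCheck {

    // Mirrors HelloUDF.evaluate(String).
    static String evaluate(String str) {
        try {
            return "HelloWorld " + str;
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("www.linuxidc.com")); // HelloWorld www.linuxidc.com
    }
}
```

Note that Java string concatenation turns a null argument into the text "null", so evaluate(null) returns "HelloWorld null" rather than null; the catch block never fires for this input.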
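The beeline session of Method 3 can also be driven programmatically, since the Thrift JDBC Server speaks the HiveServer2 protocol. The sketch below is an assumption-laden illustration, not part of the original article: it presumes a server at a jdbc:hive2:// URL (port 10000 is the usual HiveServer2 default), the hive-jdbc driver on the classpath, and placeholder host, user, and table names. Run without arguments it only prints the statements it would execute.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: registering and calling the UDF over JDBC, mirroring the
// beeline session in Method 3. ThriftUdfClient is a hypothetical name.
public class ThriftUdfClient {

    // The same three statements the beeline session runs, in order.
    static final String[] STATEMENTS = {
        "add jar /home/hadoop/lib/udf.jar",
        "CREATE TEMPORARY FUNCTION hello AS \"com.luogankun.udf.HelloUDF\"",
        "select hello(url) from page_views limit 1"
    };

    // Connects to a running Thrift JDBC server and executes the
    // statements; needs the hive-jdbc driver on the classpath.
    static void run(String jdbcUrl) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, "hadoop", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute(STATEMENTS[0]);
            stmt.execute(STATEMENTS[1]);
            try (ResultSet rs = stmt.executeQuery(STATEMENTS[2])) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            run(args[0]); // e.g. jdbc:hive2://localhost:10000/default
        } else {
            for (String s : STATEMENTS) {
                System.out.println(s); // dry run: show the statements only
            }
        }
    }
}
```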