SparkSQL Usage: Spark SQL CLI

Spark SQL CLI Overview

The Spark SQL CLI makes it convenient to query Hive directly from SparkSQL through the Hive metastore. Note that in the current version the Spark SQL CLI cannot interact with the ThriftServer.

Before using the Spark SQL CLI, two things are needed (a combined sketch follows this list):

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory.
2. Append the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar
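A minimal sketch of the two setup steps as shell commands; the Hive configuration path /home/hadoop/app/hive/conf is an assumption, so adjust it to your own installation:

# Step 1: copy Hive's metastore configuration into Spark's conf directory
# (the source path below is hypothetical)
cp /home/hadoop/app/hive/conf/hive-site.xml $SPARK_HOME/conf/

# Step 2: append the MySQL JDBC driver jar to SPARK_CLASSPATH
echo 'export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar' >> $SPARK_HOME/conf/spark-env.sh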
Spark SQL CLI command-line options:

cd $SPARK_HOME/bin
spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL       spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME               A name of your application.
  --jars JARS               Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES       Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES             Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE         Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM       Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options     Extra Java options to pass to the driver.
  --driver-library-path     Extra library path entries to pass to the driver.
  --driver-class-path       Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM     Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v             Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise               If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM       Number of executors to launch (Default: 2).
  --archives ARCHIVES       Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>   Specify the database to use
 -e <quoted-query-string>       SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value> Use value for given property
    --hivevar <key=value>       Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                   Verbose mode (echo executed SQL to the console)

If no master is specified when starting spark-sql, it runs in local mode. The master can point either to a standalone cluster address or to YARN. When the master is set to yarn (spark-sql --master yarn), the execution of the whole job can be monitored at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, then spark-sql runs on the standalone cluster even when started without an explicit master.
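The CLI options listed above also allow running SQL non-interactively. A small sketch, assuming a hypothetical script path /home/spark/sql/top_sessions.sql and variable name tbl:

# Run one statement with -e, substituting a variable via --hivevar
spark-sql --master yarn --hivevar tbl=page_views -e "SELECT count(*) FROM ${hivevar:tbl}"

# Run all statements in a file with -f, against an explicit database
spark-sql --database default -f /home/spark/sql/top_sessions.sql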
Using spark-sql

Starting spark-sql: since spark.master spark://hadoop000:7077 is already configured in spark-defaults.conf, I did not specify a master when starting spark-sql.

cd $SPARK_HOME/bin
spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;

The page_views table used by the two SQL statements above already exists in Hive in my setup; if it does not exist in yours, create it by hand. The creation script and the data-import script are as follows:

create table page_views(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY " ";

load data local inpath "/home/spark/software/data/page_views.dat" overwrite into table page_views;
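As a sanity check, the DDL and load statements can also be saved to a file and executed in one shot; the script path below is an assumption for illustration:

# Run the create/load script non-interactively (path is hypothetical)
spark-sql -f /home/spark/sql/create_page_views.sql

# Verify that the data was loaded
spark-sql -e "SELECT count(*) FROM page_views"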