Spark SQL: Using the Thrift JDBC Server

What is the Thrift JDBC Server?

Spark's Thrift JDBC Server is built on the HiveServer2 implementation from Hive 0.12. You can interact with it using the beeline script shipped with either Spark or Hive 0.12. By default it listens on port 10000.

Before using the Thrift JDBC Server, note two prerequisites:

1. Copy the hive-site.xml configuration file into $SPARK_HOME/conf.
2. Add the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar

Usage help for the Thrift JDBC Server startup script:

cd $SPARK_HOME/sbin
start-thriftserver.sh --help
Usage: ./sbin/start-thriftserver [options] [thrift server options]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL       spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME               A name of your application.
  --jars JARS               Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES       Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES             Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE         Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM       Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options     Extra Java options to pass to the driver.
  --driver-library-path     Extra library path entries to pass to the driver.
  --driver-class-path       Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM     Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v             Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise               If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM       Number of executors to launch (Default: 2).
  --archives ARCHIVES       Comma separated list of archives to be extracted into the
                              working directory of each executor.

Thrift server options:
    --hiveconf <property=value> Use value for given property

The --master option here behaves the same as for the Spark SQL CLI.

Usage help for the beeline command:

cd $SPARK_HOME/bin
beeline --help

Usage: java org.apache.hive.cli.beeline.BeeLine
 -u <database url>             the JDBC URL to connect to
 -n <username>                 the username to connect as
 -p <password>                 the password to connect as
 -d <driver class>             the driver class to use
 -e <query>                      query that should be executed
 -f <file>                     script file that should be executed
 --color=[true/false]            control whether color is used for display
 --showHeader=[true/false]     show column names in query results
 --headerInterval=ROWS;          the interval between which headers are displayed
 --fastConnect=[true/false]      skip building table/column list for tab-completion
 --autoCommit=[true/false]     enable/disable automatic transaction commit
 --verbose=[true/false]          show verbose error messages and debug info
 --showWarnings=[true/false]   display connection warnings
 --showNestedErrs=[true/false] display nested errors
 --numberFormat=[pattern]        format numbers using DecimalFormat pattern
 --force=[true/false]            continue running script even after errors
 --maxWidth=MAXWIDTH           the maximum width of the terminal
 --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
 --silent=[true/false]         be more silent
 --autosave=[true/false]       automatically save preferences
 --outputformat=[table/vertical/csv/tsv] format mode for result display
 --isolation=LEVEL             set the transaction isolation level
 --help                          display this message

Starting the Thrift JDBC Server and beeline

Start the Thrift JDBC Server (it listens on port 10000 by default):

cd $SPARK_HOME/sbin
start-thriftserver.sh

How do you change the default listening port? Pass it with --hiveconf:

start-thriftserver.sh --hiveconf hive.server2.thrift.port=14000

For more on HiveServer2 clients, see: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Start beeline:

cd $SPARK_HOME/bin
beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop

Test with a couple of SQL statements:

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;

Permanent link to this article: http://www.linuxidc.com/Linux/2014-09/106619.htm
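Besides the interactive session above, beeline can also run statements non-interactively via the -e and -f flags listed in its help output, which is handy for scripting the same test queries. A minimal sketch, assuming the server started above is reachable on hadoop000:10000, the page_views table exists, and queries.sql is a hypothetical file holding the two statements:

```shell
#!/bin/sh
# Run a single statement and exit (the -e flag from the beeline help above).
beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop \
        -e "SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;"

# Run a script file instead, with machine-friendly CSV output and no header
# row, so the results can be piped into further processing.
beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop \
        --outputformat=csv --showHeader=false -f queries.sql
```

If you started the server with --hiveconf hive.server2.thrift.port=14000, change the port in the JDBC URL accordingly.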