
Hive: Creating Tables

1. The regular expression used when creating a Hive table follows Java regex syntax; in the RegexSerDe property string, the backslash is the escape character (Java escapes such as \d may need to be doubled to \\d inside the DDL string literal).
create table inst_cn_3 (
  ip string,
  time string,
  mac string,
  lan string,
  ver string,
  lc string,
  pn string,
  reg string,
  vrf string,
  line string)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe"
WITH SERDEPROPERTIES (
  "input.regex" = "- *([\d\.]+) *\[([\d]{2}/[\w]+/[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\s+\+[\d]+)\] *GET */mx3/inst/([0-9a-f]{12})/ver=([\.\d]+),lan=(0x[\w]+)(?:,lc=([\w]+))(?:,pn=([\w]+))(?:,reg=([0-1]))(?:,vrf=([\w]+))?.*"
 )
STORED AS TEXTFILE;

alter table inst_cn_3 add columns(line string);
ALTER TABLE inst_cn_3 SET SERDEPROPERTIES (
  "input.regex" = "- ([\d\.]+) \[([\d]{2}/[\w]+/[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\s+\+[\d]+)\] GET /mx3/inst/([0-9a-f]{12})/ver=([\.\d]+),lan=(0x[\w]+)(?:,lc=([\w]+))(?:,pn=([\w]+))(?:,reg=([0-1]))(?:,vrf=([\w]+))?.*|(.*)"
 );
select * from inst_cn_3 limit 100;

select
  line
from inst_cn_3
where
  1=1
 and mac is null
 and line is not null
 and !(line rlike ".*unknowuser00.*")
;

hadoop fs -cp /mnt/nfs/log/statcn/inst/inst_cn_3.*.txt /hive/warehouse/inst_cn_3/
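The two input.regex patterns above can be sanity-checked outside Hive, since Java and Python regex syntax coincide for these constructs; a minimal Python sketch (the sample log line below is a made-up illustration, not real log data):

```python
import re

# Same pattern as the first input.regex above.
STRICT = re.compile(
    r"- *([\d\.]+) *\[([\d]{2}/[\w]+/[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\s+\+[\d]+)\]"
    r" *GET */mx3/inst/([0-9a-f]{12})/ver=([\.\d]+),lan=(0x[\w]+)"
    r"(?:,lc=([\w]+))(?:,pn=([\w]+))(?:,reg=([0-1]))(?:,vrf=([\w]+))?.*"
)
# The ALTER TABLE version appends "|(.*)" so that non-matching lines land in
# the extra `line` column instead of producing an all-NULL row.
FALLBACK = re.compile(STRICT.pattern + r"|(.*)")

sample = ("- 1.2.3.4 [02/May/2011:00:00:00 +0800] GET "
          "/mx3/inst/0123456789ab/ver=3.1.2,lan=0x804,lc=zh,pn=mx,reg=1,vrf=ok")

m = STRICT.match(sample)
print(m.group(1), m.group(3))          # the ip and mac columns

bad = "some unparseable line"
print(FALLBACK.match(bad).group(10))   # whole line captured as `line`
```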
 
2011.06.09 Hive date/time handling
1.
select
  from_unixtime(unix_timestamp("02/May/2011:00:00:00 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  from_unixtime(unix_timestamp("02/May/2011:23:59:59 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  from_unixtime(unix_timestamp("03/May/2011:00:00:00 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  from_unixtime(unix_timestamp("03/May/2011:23:59:59 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss")
from
  stat_c_log
where
  1=1
  and partkey = "20110503"
  and logType = "inst_cn_3"
  and url rlike "/mx3/inst/.*"
limit
  10
;

2.
select
  from_unixtime(unix_timestamp("02/May/2011:00:00:00 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  from_unixtime(unix_timestamp("02/May/2011:23:59:59 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  from_unixtime(unix_timestamp("03/May/2011:00:00:00 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  from_unixtime(unix_timestamp("03/May/2011:23:59:59 +0800","dd/MMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss"),
  round((unix_timestamp("02/May/2011:00:00:00 +0800","dd/MMM/yyyy:HH:mm:ss Z")-4*3600)/(3600*24))*(3600*24*1000),
  round((unix_timestamp("02/May/2011:23:59:59 +0800","dd/MMM/yyyy:HH:mm:ss Z")-4*3600)/(3600*24))*(3600*24*1000)
 from
  stat_c_log
where
  1=1
  and partkey = "20110503"
  and logType = "inst_cn_3"
  and url rlike "/mx3/inst/.*"
limit
  10
;

2012.03.01
  1. Hive's function-registration class: FunctionRegistry
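The unix_timestamp / from_unixtime round-trip in the 2011.06.09 queries can be reproduced outside Hive as a quick sanity check; a minimal Python sketch (from_unixtime formats in the server's local timezone, modeled here as an explicit +08:00):

```python
from datetime import datetime, timezone, timedelta

# Parse the access-log timestamp the same way
# unix_timestamp(ts, "dd/MMM/yyyy:HH:mm:ss Z") does.
ts = datetime.strptime("02/May/2011:00:00:00 +0800",
                       "%d/%b/%Y:%H:%M:%S %z")
epoch = int(ts.timestamp())
print(epoch)  # seconds since the Unix epoch

# Format it back, as from_unixtime(epoch, "yyyy-MM-dd HH:mm:ss") would
# on a server running in the +08:00 timezone.
cn = timezone(timedelta(hours=8))
print(datetime.fromtimestamp(epoch, cn).strftime("%Y-%m-%d %H:%M:%S"))
```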
2012.06.14
  1. set hive.cli.print.header=true; makes the hive shell print column headers in query output.
2012.06.26
  1. In Hive (CDH4b2), after archiving a table, a `select line` query against an archived partition throws a FileNotFoundException.
    See https://issues.apache.org/jira/browse/MAPREDUCE-2704: CombineFileInputFormat constructs new Path objects by converting an existing path to a URI and then pulling out only the "path" part of it. This drops the scheme and host, which makes CombineFileInputFormat fail if the paths are on a filesystem other than the default one (archived partitions live in a Hadoop Archive, i.e. on the har:// filesystem).
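The failure mode described in that JIRA can be illustrated with a small Python sketch (urllib.parse standing in for Hadoop's Path/URI handling; the har:// URI below is a made-up example, not a path from this cluster):

```python
from urllib.parse import urlparse

# A fully qualified path on a non-default filesystem, e.g. an archived
# Hive partition stored in a Hadoop Archive.
full = "har://hdfs-namenode/user/hive/warehouse/inst_cn_3/part.har"

# Keeping only the "path" component, as the buggy CombineFileInputFormat
# did, silently drops the scheme and host ...
path_only = urlparse(full).path
print(path_only)  # /user/hive/warehouse/inst_cn_3/part.har

# ... so the path is later resolved against the *default* filesystem,
# where the file does not exist -> FileNotFoundException.
```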
2012.07.16
  1. EXPLAIN EXTENDED hive_query; shows the detailed execution plan.
2012.07.29
  1. DESCRIBE FORMATTED mock; shows columns, location, params, etc.
  2. /src/ql/src/test/queries/clientpositive/ (example queries in the Hive source tree)