1. When creating a Hive table, the regular expression is written the same way as in Java code; RegexSerDe uses \ as the escape character.
create table inst_cn_3 (
ip string,
time string,
mac string,
lan string,
ver string,
lc string,
pn string,
reg string,
vrf string,
line string)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe"
WITH SERDEPROPERTIES (
"input.regex" = "- *([\d\.]+) *\[([\d]{2}/[\w]+/[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\s+\+[\d]+)\] *GET */mx3/inst/([0-9a-f]{12})/ver=([\.\d]+),lan=(0x[\w]+)(?:,lc=([\w]+))(?:,pn=([\w]+))(?:,reg=([0-1]))(?:,vrf=([\w]+))?.*"
)
STORED AS TEXTFILE;
alter table inst_cn_3 add columns(line string);
ALTER TABLE inst_cn_3 SET SERDEPROPERTIES (
"input.regex" = "- ([\d\.]+) \[([\d]{2}/[\w]+/[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\s+\+[\d]+)\] GET /mx3/inst/([0-9a-f]{12})/ver=([\.\d]+),lan=(0x[\w]+)(?:,lc=([\w]+))(?:,pn=([\w]+))(?:,reg=([0-1]))(?:,vrf=([\w]+))?.*|(.*)"
);
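Since RegexSerDe takes Java-style regexes, the pattern above can be sanity-checked outside Hive. A minimal Python sketch (the sample log lines below are hypothetical, invented for illustration, not taken from the real data):

```python
import re

# Same pattern as in the input.regex above. The trailing "|(.*)" alternation
# means a line the main pattern does not match falls through to the last
# group, which Hive maps to the extra `line` column (other columns are NULL).
PATTERN = re.compile(
    r"- *([\d\.]+) *\[([\d]{2}/[\w]+/[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\s+\+[\d]+)\]"
    r" *GET */mx3/inst/([0-9a-f]{12})/ver=([\.\d]+),lan=(0x[\w]+)"
    r"(?:,lc=([\w]+))(?:,pn=([\w]+))(?:,reg=([0-1]))(?:,vrf=([\w]+))?.*"
    r"|(.*)"
)

# Hypothetical log lines, for illustration only.
good = ("- 1.2.3.4 [02/May/2011:00:00:00 +0800] GET "
        "/mx3/inst/0123456789ab/ver=1.2.3,lan=0x804,lc=abc,pn=pn1,reg=1,vrf=ok")
bad = "some line the access-log pattern does not match"

m = PATTERN.match(good)
print(m.group(1), m.group(3))   # ip and mac columns

m = PATTERN.match(bad)
print(m.group(10))              # unmatched input lands in the `line` column
```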
select * from inst_cn_3 limit 100;
select
line
from inst_cn_3
where
1=1
and mac is null
and line is not null
and !(line rlike ".*unknowuser00.*")
;
hadoop fs -cp /mnt/nfs/log/statcn/inst/inst_cn_3.*.txt /hive/warehouse/inst_cn_3/
2011.06.09 Hive date/time handling
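The parsing and day-bucket arithmetic used in the queries below can be sanity-checked outside Hive. A minimal Python sketch (function names are mine, not Hive's):

```python
from datetime import datetime

# Python equivalent of Hive's
#   unix_timestamp("02/May/2011:00:00:00 +0800", "dd/MMMMM/yyyy:HH:mm:ss Z")
def to_epoch(s: str) -> int:
    return int(datetime.strptime(s, "%d/%b/%Y:%H:%M:%S %z").timestamp())

t0 = to_epoch("02/May/2011:00:00:00 +0800")  # start of the +0800 day
t1 = to_epoch("02/May/2011:23:59:59 +0800")  # end of the same +0800 day

# The expression round((ts - 4*3600)/(3600*24)) * (3600*24*1000) in query 2
# buckets a timestamp to its +0800 calendar day, in milliseconds: Hive's ROUND
# is half-up, so round(x/day - 4h/day) == floor(x/day + 8h/day); the -4h shift
# plus the implicit +0.5 day of rounding equals the +8h timezone offset.
def day_bucket_ms(ts: int) -> int:
    return ((ts + 8 * 3600) // (3600 * 24)) * (3600 * 24 * 1000)

# Both endpoints of the local day map to the same bucket.
print(day_bucket_ms(t0) == day_bucket_ms(t1))  # True
```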
1.
select
from_unixtime(unix_timestamp("02/May/2011:00:00:00 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
from_unixtime(unix_timestamp("02/May/2011:23:59:59 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
from_unixtime(unix_timestamp("03/May/2011:00:00:00 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
from_unixtime(unix_timestamp("03/May/2011:23:59:59 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss")
from
stat_c_log
where
1=1
and partkey = "20110503"
and logType = "inst_cn_3"
and url rlike "/mx3/inst/.*"
limit
10
;
2.
select
from_unixtime(unix_timestamp("02/May/2011:00:00:00 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
from_unixtime(unix_timestamp("02/May/2011:23:59:59 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
from_unixtime(unix_timestamp("03/May/2011:00:00:00 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
from_unixtime(unix_timestamp("03/May/2011:23:59:59 +0800","dd/MMMMM/yyyy:HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss") ,
round((unix_timestamp("02/May/2011:00:00:00 +0800","dd/MMMMM/yyyy:HH:mm:ss Z")-4*3600)/(3600*24))*(3600*24*1000),
round((unix_timestamp("02/May/2011:23:59:59 +0800","dd/MMMMM/yyyy:HH:mm:ss Z")-4*3600)/(3600*24))*(3600*24*1000)
from
stat_c_log
where
1=1
and partkey = "20110503"
and logType = "inst_cn_3"
and url rlike "/mx3/inst/.*"
limit
10
;
2012.03.01
1. FunctionRegistry is the class where Hive's built-in functions are registered.
2012.06.14
1. set hive.cli.print.header=true; makes the Hive shell print column headers in query output.
2012.06.26
1. On Hive (CDH4b2), after archiving a table with ARCHIVE, querying an archived partition with select line throws a FileNotFoundException.
https://issues.apache.org/jira/browse/MAPREDUCE-2704 : CombineFileInputFormat constructs new Path objects by converting an existing path to a URI and then pulling out only the "path" part of it. This drops the scheme and host, which makes CombineFileInputFormat fail if the paths are on a filesystem other than the default one.
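The failure mode is easy to see in isolation: keeping only the "path" component of a URI discards the scheme and authority, so a location on a non-default filesystem (such as har://) is later resolved against the default one. A small Python sketch of the same conversion, with urlsplit standing in for Hadoop's Path/URI handling (the location below is hypothetical):

```python
from urllib.parse import urlsplit

# Hypothetical archived-partition location, for illustration only.
original = "har://namenode:8020/hive/warehouse/inst_cn_3/data.har/dir"

# What the buggy code effectively did: rebuild a path from only the
# "path" component of the URI, dropping the scheme and the host.
path_only = urlsplit(original).path
print(path_only)  # the har:// scheme and the host are gone

# The stripped path then gets resolved against the *default* filesystem
# (e.g. hdfs://namenode:8020), where no such file exists,
# hence the FileNotFoundException.
```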
2012.07.16
1. EXPLAIN EXTENDED hive_query; shows the detailed execution plan.
2012.07.29
1. DESCRIBE FORMATTED mock; displays columns, location, table parameters, etc.
2. /src/ql/src/test/queries/clientpositive/ (example HiveQL queries in the Hive source tree)