Installing Hive 1.2.1


Hive is a data-warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL query capability, translating SQL statements into MapReduce jobs for execution. Its main advantage is a low learning curve: simple MapReduce-style statistics can be written quickly in SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.

Hive is an open-source project contributed to Apache by Facebook. It was built very much with DBAs in mind: the goal was to let engineers fluent in SQL but unfamiliar with Java stay productive in the Hadoop era, continuing to analyze data on HDFS even without writing any Java. What does Hive actually do? One way to think of it is as a SQL interpreter: it takes the SQL statements a DBA submits and converts them into MapReduce jobs that run on Hadoop, so DBAs and front-end users no longer need to spend effort writing MapReduce applications and can instead query and analyze large data sets through familiar, easy-to-use SQL.


Like Hadoop, Hive has three run modes:
1. Embedded mode: metadata is kept in a local embedded Derby database. This is the simplest way to use Hive, but the drawback is significant: an embedded Derby database can only be accessed by one process at a time, so multi-session connections are not supported. This mode is barely adequate even for local testing and is really only suitable for first-time experimentation.
2. Local mode: metadata is kept in a separate local database (typically MySQL), which supports multiple sessions and multiple users. The MySQL instance can be local or remote.
3. Remote mode: the Hive service and the metastore run in separate processes.
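For contrast with the two modes above, remote mode can be sketched roughly as follows. The hostname `metastore-host` is a placeholder; `hive --service metastore` and the `hive.metastore.uris` property are the standard mechanism, but the exact deployment details depend on your cluster:

```shell
# On the metastore host: start the standalone metastore service
# (it listens on port 9083 by default)
hive --service metastore &

# Hive clients then point at it in their hive-site.xml:
#   <property>
#     <name>hive.metastore.uris</name>
#     <value>thrift://metastore-host:9083</value>
#   </property>
```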


1. Prerequisites

A working Hadoop cluster (this guide uses Hive 1.2.1 on Hadoop 2.5.2).


2. Installing Hive

1) Download the Hive package:

http://apache.fayea.com/hive/

2) Extract it:

tar -xvzf apache-hive-1.2.1-bin.tar.gz

3) Set environment variables:

#HADOOP VARIABLES START  
export HADOOP_HOME=/usr/local/hadoop-2.5.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
#HADOOP VARIABLES END

#hbase
export HBASE_HOME=/usr/local/hbase-1.0.0/
#hive
export HIVE_HOME=/usr/local/hive/
export PATH=$PATH:$HBASE_HOME/bin:$HIVE_HOME/bin
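After adding these lines (to /etc/profile or ~/.bashrc, wherever your profile lives), reload the profile so the current shell picks them up; a quick sanity check:

```shell
# Apply the variables to the current shell and confirm they took effect
export HIVE_HOME=/usr/local/hive/
export PATH="$PATH:$HIVE_HOME/bin"
echo "HIVE_HOME=$HIVE_HOME"
# `hive --version` should now report 1.2.1
```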

4) Embedded mode:

No configuration-file changes are needed for this mode; just create the config files from the templates:

# cp -p hive-default.xml.template hive-default.xml
# cp -p hive-default.xml.template hive-site.xml

Create the /tmp and /user/hive/warehouse directories on HDFS and grant group write permission. These are Hive's default data directories, as configured by default in hive-site.xml.

# su - hadoop
$ hadoop dfs -mkdir -p /tmp
$ hadoop dfs -mkdir -p /user/hive/warehouse
$ hadoop dfs -chmod g+w /tmp
$ hadoop dfs -chmod g+w /user/hive/warehouse

$ hadoop dfs -ls /
drwxrwxr-x   - hadoop supergroup          0 2014-06-17 18:57 /tmp
drwxr-xr-x   - hadoop supergroup          0 2014-06-17 19:02 /user
drwxr-xr-x   - hadoop supergroup          0 2014-06-15 19:31 /usr
$ hadoop dfs -ls /user/hive/
drwxrwxr-x   - hadoop supergroup          0 2014-06-17 19:02 /user/hive/warehouse

$ hive               # start the Hive CLI
Logging initialized using configuration in file:/usr/hive/conf/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201406171916_734435947.txt
hive> show tables;
OK
Time taken: 7.157 seconds
hive> quit;

5) Local standalone mode (configured on top of the embedded setup)

Install and configure MySQL:

# yum install mysql mysql-server          # install MySQL
# service mysqld start
# mysql -u root                           # create the database and user
mysql> create database hive;      
Query OK, 1 row affected (0.00 sec)
mysql> grant all on hive.* to 'hive'@'localhost' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> \q
Bye
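Before wiring Hive to this database, it is worth confirming the new account can actually log in (assuming the hive/hive user and password created above):

```shell
# Log in as the new account and list the hive database;
# an empty result or an access-denied error means the GRANT did not work
mysql -u hive -phive -e "show databases like 'hive';"
```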

Configure hive-site.xml so Hive connects to MySQL:

<!-- JDBC URL of the MySQL instance backing the metastore -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<!-- JDBC driver class for MySQL -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<!-- MySQL user name -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>

<!-- MySQL password -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
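One step the templates do not cover: Hive does not bundle the MySQL JDBC driver, so the connector jar must be copied into Hive's lib directory by hand, or metastore initialization will fail with a ClassNotFoundException for com.mysql.jdbc.Driver. The jar version below is only an example; use whichever mysql-connector-java you downloaded:

```shell
# Copy the MySQL JDBC driver into Hive's classpath
# (jar name/version is an example, not the only valid one)
cp mysql-connector-java-5.1.32-bin.jar /usr/local/hive/lib/
```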

Start Hive:

$ hive
Logging initialized using configuration in jar:file:/usr/hive/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201406172021_1374786590.txt
hive> show tables;
OK
Time taken: 5.527 seconds
hive> quit;

6) Remote mode:

…


3. Hive startup error: Found class jline.Terminal, but interface was expected

[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at jline.TerminalFactory.create(TerminalFactory.java:101)
        at jline.TerminalFactory.get(TerminalFactory.java:158)
        at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
        at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
        at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


Cause:
Hadoop ships an old version of jline:
/hadoop-2.5.2/share/hadoop/yarn/lib:
-rw-r--r-- 1 root root 87325 Mar 10 18:10 jline-0.9.94.jar
Fix: copy the newer jline from Hive's lib directory over to Hadoop's:
cp /hive/apache-hive-1.2.1-bin/lib/jline-2.12.jar /hadoop-2.5.2/share/hadoop/yarn/lib
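An alternative workaround, if you would rather not modify Hadoop's lib directory, is to let user-supplied jars win on the classpath; the Hadoop launcher scripts honor this environment variable, so Hive's jline 2.12 is loaded ahead of Hadoop's jline 0.9.94:

```shell
# Put user (Hive) classpath entries before Hadoop's own jars
export HADOOP_USER_CLASSPATH_FIRST=true
```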


4. Testing

1) Create a table:

hive> create table test(id INT,str STRING)
    > row format delimited 
    > fields terminated by ','
    > stored as textfile;
Time taken: 0.15 seconds
hive> show tables;
OK
test
Time taken: 1.15 seconds
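The load step below expects a comma-delimited file at /home/hadoop/data_test.txt. If you need sample data matching the table's (id INT, str STRING) schema, one quick way to generate a small file (written to the current directory here; move it to the path used in the load step):

```shell
# Generate ten rows of the form "1,a" .. "10,j"
# (awk's %c prints the character with the given code; 96+1 = 97 = 'a')
seq 1 10 | awk '{printf "%d,%c\n", $1, 96+$1}' > data_test.txt
head -3 data_test.txt
```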

2) Load local data into Hive:

hive> load data local inpath '/home/hadoop/data_test.txt'
    > overwrite into table test;                         
Copying data from file:/home/hadoop/data_test.txt
Copying file: file:/home/hadoop/data_test.txt
Loading data to table default.test
OK
Time taken: 4.322 seconds

3) Query the first 10 rows:

hive> select * from test limit 10;
OK
1    a
2    b
3    c
4    d
5    e
6    f
7    g
8    h
9    i
10   j
Time taken: 0.869 seconds

4) Count how many rows the file contains; this time Hive runs a MapReduce job to compute the value:

hive> select count(1) from test;  
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201406180238_0001, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201406180238_0001
Kill Command = /usr/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=http://192.168.2.101:9001 -kill job_201406180238_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-06-18 02:39:43,858 Stage-1 map = 0%,  reduce = 0%
2014-06-18 02:39:54,964 Stage-1 map = 100%,  reduce = 0%
2014-06-18 02:40:04,078 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201406180238_0001
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 33586560 HDFS Write: 8 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
4798080
Time taken: 35.687 seconds

Reference: http://hatech.blog.51cto.com/8360868/1427748
