Common problems when scheduling Pig jobs with Oozie, and how to analyze them
guibin.beijing@gmail.com
1. Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code [7]
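This error is baffling at first. According to "Programming Pig" (2011 edition), exit code [7] means "ParseException thrown (can happen after parsing if variable substitution is being done)", which still does not explain much. For reference, all of Pig's return codes are listed here.
Pig return codes and their meanings: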
Return code | Meaning | Notes |
---|---|---|
0 | Success | |
1 | Retriable failure | |
2 | Failure | |
3 | Partial failure | Used with multiquery |
4 | Illegal arguments passed to Pig | |
5 | IOException thrown | Would usually be thrown by a UDF |
6 | PigException thrown | Usually means a Python UDF raised an exception |
7 | ParseException thrown | Can happen after parsing if variable substitution is being done |
8 | Throwable thrown | An unexpected exception |
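When a launcher fails with a bare exit code, a small lookup can save a trip back to the table. The following is a minimal sketch in Python that only restates the codes listed above; the names `PIG_EXIT_CODES` and `describe_pig_exit_code` are made up for illustration and are not part of Pig or Oozie.

```python
# Exit codes as listed in the table above (source: "Programming Pig", 2011).
# The dictionary and function names are hypothetical helpers, not a Pig API.
PIG_EXIT_CODES = {
    0: "Success",
    1: "Retriable failure",
    2: "Failure",
    3: "Partial failure (used with multiquery)",
    4: "Illegal arguments passed to Pig",
    5: "IOException thrown (would usually be thrown by a UDF)",
    6: "PigException thrown (usually means a Python UDF raised an exception)",
    7: "ParseException thrown (can happen after parsing if variable "
       "substitution is being done)",
    8: "Throwable thrown (an unexpected exception)",
}


def describe_pig_exit_code(code):
    """Return the documented meaning of a Pig exit code."""
    return PIG_EXIT_CODES.get(code, "Unknown exit code")


# The code reported by the Oozie launcher in this article.
print(describe_pig_exit_code(7))
```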
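Back to the point: since that explanation does not help, the only option is to open the job log of the Pig launcher and read it carefully. The log turns out to hold the clue: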
Run pig script using PigRunner.run() for Pig version 0.8+
1332 [main] INFO org.apache.pig.Main - Logging error messages to: /hadoop/mapred/taskTracker/root/jobcache/job_201306261303_0109/attempt_201306261303_0109_m_000000_0/work/pig-job_201306261303_0109.log
Apache Pig version 0.8.1-cdh3u1 (rexported)
compiled Jul 18 2011, 08:29:40
USAGE: Pig [options] [-] : Run interactively in grunt shell.
Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
Pig [options] [-f[ile]] file : Run cmds found in file.
options include:
-4, -log4jconf - Log4j configuration file, overrides log conf
-b, -brief - Brief logging (no timestamps)
-c, -check - Syntax check
-d, -debug - Debug level, INFO is default
-e, -execute - Commands to execute (within quotes)
-f, -file - Path to the script to execute
-h, -help - Display this message. You can specify topic to get help for that topic.
properties is the only topic currently supported: -h properties.
-i, -version - Display version information
.......
Why would Pig's help text show up in the log? It turns out that when the Pig job action was invoked from the Oozie configuration, Oozie was probably passing invalid arguments to Pig.
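The same check can be done mechanically when there are many launcher logs to go through. The sketch below, in Python, only looks for the usage banner seen in the log excerpt above; the function name `launcher_log_shows_usage` is hypothetical, and a match merely suggests an argument problem rather than proving one.

```python
# First words of the help text that Pig printed in the log excerpt above.
USAGE_MARKER = "USAGE: Pig [options]"


def launcher_log_shows_usage(log_text):
    """Return True if the log contains Pig's usage banner.

    This is a heuristic: Pig printing its help text during a scheduled run
    suggests it did not accept the arguments it was given.
    """
    return any(
        line.lstrip().startswith(USAGE_MARKER)
        for line in log_text.splitlines()
    )


# A few lines copied from the launcher log shown in this article.
SAMPLE_LOG = """Run pig script using PigRunner.run() for Pig version 0.8+
Apache Pig version 0.8.1-cdh3u1 (rexported)
compiled Jul 18 2011, 08:29:40
USAGE: Pig [options] [-] : Run interactively in grunt shell.
Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
"""

print(launcher_log_shows_usage(SAMPLE_LOG))
```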
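So how do we fix it?
Check which version of Oozie you are using and write the pig action according to the schema that Oozie defines for that version, ideally starting from the example in the spec. Do not assume that element position is irrelevant just because the config is XML. Oozie is unforgiving here, so follow the example in the Oozie workflow spec exactly, with every element in the same order. In my case, placing configuration after script was enough for Oozie to stop recognizing it and report an error. The following Oozie workflow pig action is a working example for Oozie 2.3.2-cdh3u4: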
<workflow-app name="myAnalytic" xmlns="uri:oozie:workflow:0.2">
    <start to="cleanupFailure" />
    ......
    <action name="analytic_pig">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.child.java.opts</name>
                    <value>-Xmx2048m</value>
                </property>
                <property>
                    <name>pig.spill.extragc.size.threshold</name>
                    <value>100000000</value>
                </property>
                <property>
                    <name>mapred.child.java.opts</name>
                    <value>-Xmx2048m</value>
                </property>
                <property>
                    <name>mapred.user.jobconf.limit</name>
                    <value>100000000</value>
                </property>
            </configuration>
            <script>${script}</script>
            <param>logType=${logType}</param>
            <param>avro_schema=${avroSchema}</param>
            <param>startDate=${startDate}</param>
            <param>logsDirectory=${logsDirectory}</param>
            <param>outputDirectory=${outputDirectory}</param>
            <!-- common libraries, need to be abstracted out in future -->
            <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/json-simple-1.1.jar#json-simple-1.1.jar</file>
            <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/avro-1.4.1.jar#avro-1.4.1.jar</file>
            <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/pig-udfs.jar#pig-udfs.jar</file>
            <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/piggybank.jar#piggybank.jar</file>
        </pig>
        <ok to="unlock" />
        <error to="unlockOnError" />
    </action>
    ......
</workflow-app>
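Since the element order is what bit me, it can be worth checking it before submitting a workflow. The following Python sketch flags children of a pig action that appear out of order. The list `PIG_CHILD_ORDER` is an assumption: it follows the order of the working example above, with the optional prepare, job-xml, argument and archive elements placed where I believe the 0.2 schema puts them, so verify it against the XSD shipped with your Oozie version. The function name `check_pig_action_order` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: child order of <pig> under uri:oozie:workflow:0.2.
# Check this list against the schema of the Oozie version you run.
PIG_CHILD_ORDER = [
    "job-tracker", "name-node", "prepare", "job-xml", "configuration",
    "script", "param", "argument", "file", "archive",
]


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_pig_action_order(workflow_xml):
    """Return a list of ordering problems found in <pig> actions."""
    problems = []
    root = ET.fromstring(workflow_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "pig":
            continue
        highest, highest_name = -1, None  # latest position seen so far
        for child in elem:
            name = local_name(child.tag)
            if name not in PIG_CHILD_ORDER:
                problems.append(f"unexpected element <{name}>")
                continue
            pos = PIG_CHILD_ORDER.index(name)
            if pos < highest:
                problems.append(
                    f"<{name}> must come before <{highest_name}>")
            else:
                highest, highest_name = pos, name
    return problems


# The mistake described above: configuration placed after script.
BAD_WORKFLOW = """<workflow-app name="demo" xmlns="uri:oozie:workflow:0.2">
  <action name="analytic_pig">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>${script}</script>
      <configuration/>
    </pig>
  </action>
</workflow-app>"""

print(check_pig_action_order(BAD_WORKFLOW))
# ['<configuration> must come before <script>']
```

An empty list means the pig actions match the assumed order. It does not replace validation against the real schema.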