Common problems when scheduling Pig jobs with Oozie, and their analysis

Posted by an anonymous user, 2021-5-16 19:43


guibin.beijing@gmail.com

1. Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code [7]

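This error is baffling at first. Only after consulting "Programming Pig" (2011 edition) did I learn that exit code [7] means "ParseException thrown (can happen after parsing if variable substitution is being done)", an explanation that is still puzzling. While I am at it, here is the full list of Pig return values, for reference only.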

Pig return values and their meanings:

Return value  Meaning                          Notes
0             Success
1             Retriable failure
2             Failure
3             Partial failure                  Used with multiquery
4             Illegal arguments passed to Pig
5             IOException thrown               Would usually be thrown by a UDF
6             PigException thrown              Usually means a Python UDF raised an exception
7             ParseException thrown            Can happen after parsing if variable substitution is being done
8             Throwable thrown                 An unexpected exception
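
For quick triage, the list above can be turned into a small lookup. The following is only a sketch: the function name explain_launcher_error and the regular expression are illustrative assumptions, not part of Oozie or Pig; the codes and meanings are copied from the list above.

```python
import re

# Pig return codes and meanings, copied from the list above.
PIG_EXIT_CODES = {
    0: "Success",
    1: "Retriable failure",
    2: "Failure",
    3: "Partial failure",
    4: "Illegal arguments passed to Pig",
    5: "IOException thrown",
    6: "PigException thrown",
    7: "ParseException thrown",
    8: "Throwable thrown",
}

# Matches the "exit code [N]" fragment of the Oozie launcher error message.
_EXIT_CODE_RE = re.compile(r"exit code \[(\d+)\]")


def explain_launcher_error(message):
    """Return (code, meaning) for a launcher error line, or None if it has no exit code."""
    match = _EXIT_CODE_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, PIG_EXIT_CODES.get(code, "Unknown return code")


if __name__ == "__main__":
    line = ("Failing Oozie Launcher, Main class "
            "[org.apache.oozie.action.hadoop.PigMain], exit code [7]")
    print(explain_launcher_error(line))
```

The lookup only names the failure class. As this post goes on to show, the actual cause still has to be read from the launcher job log.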

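Back to the point: since the error code alone explains so little, the only option is to open the job log of the Pig launcher and read it carefully. The log turns out to hold the real clue: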

Run pig script using PigRunner.run() for Pig version 0.8+
1332 [main] INFO  org.apache.pig.Main  - Logging error messages to: /hadoop/mapred/taskTracker/root/jobcache/job_201306261303_0109/attempt_201306261303_0109_m_000000_0/work/pig-job_201306261303_0109.log

Apache Pig version 0.8.1-cdh3u1 (rexported) 
compiled Jul 18 2011, 08:29:40

USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
.......

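Very strange: why does Pig's help text show up in the log? It turns out that when the Pig job action is invoked from the Oozie configuration, Oozie may be passing the wrong arguments to Pig, which would explain why Pig prints its usage message.
So how do we fix this?
Check which version of Oozie you are using and rewrite the pig action according to the schema that version defines, ideally by following the example in the spec. Do not assume that element order is irrelevant just because the config is XML. Oozie is rigid about this, so follow the example in the Oozie workflow spec and keep every element in exactly the same order when you write the pig action. In my case, placing configuration after script made Oozie fail to recognize it and report an error. Below is a working Oozie workflow pig action for Oozie 2.3.2-cdh3u4: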

<workflow-app name="myAnalytic" xmlns="uri:oozie:workflow:0.2">

 <start to="cleanupFailure" />

 ......

 <action name="analytic_pig">
  <pig>
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
    <property>
     <name>oozie.launcher.mapred.child.java.opts</name>
     <value>-Xmx2048m</value>
    </property>
    <property>
     <name>pig.spill.extragc.size.threshold</name>
     <value>100000000</value>
    </property>
    <property>
     <name>mapred.child.java.opts</name>
     <value>-Xmx2048m</value>
    </property>
    <property>
     <name>mapred.user.jobconf.limit</name>
     <value>100000000</value>
    </property>
   </configuration>
   
   <script>${script}</script>
   <param>logType=${logType}</param>
   <param>avro_schema=${avroSchema}</param>
   <param>startDate=${startDate}</param>
   <param>logsDirectory=${logsDirectory}</param>
   <param>outputDirectory=${outputDirectory}</param>
   
   <!-- common libraries, need to be abstracted out in future -->
   <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/json-simple-1.1.jar#json-simple-1.1.jar
   </file>
   <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/avro-1.4.1.jar#avro-1.4.1.jar
   </file>
   
   <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/pig-udfs.jar#pig-udfs.jar
   </file>
   <file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/piggybank.jar#piggybank.jar
   </file>
   
  </pig>
  <ok to="unlock" />
  <error to="unlockOnError" />
 </action>
......
</workflow-app>
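
Because the ordering mistake is easy to make, it may be worth checking a workflow file before submitting it. The sketch below makes some assumptions: ASSUMED_PIG_CHILD_ORDER is taken from the working example above plus my reading of the workflow 0.2 schema, and the helper names are hypothetical. Verify the list against the XSD that ships with your Oozie version. It is not a replacement for real schema validation.

```python
import xml.etree.ElementTree as ET

# Assumed child order of a <pig> action (from the example above and the
# workflow 0.2 schema as I understand it); check the XSD for your version.
ASSUMED_PIG_CHILD_ORDER = [
    "job-tracker", "name-node", "prepare", "job-xml", "configuration",
    "script", "param", "argument", "file", "archive",
]


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_order_violations(workflow_xml):
    """Return (action_name, element, earlier_element) for out-of-order <pig> children."""
    rank = {name: i for i, name in enumerate(ASSUMED_PIG_CHILD_ORDER)}
    violations = []
    root = ET.fromstring(workflow_xml)
    for action in root.iter():
        if not isinstance(action.tag, str) or local_name(action.tag) != "action":
            continue
        for pig in action:
            if not isinstance(pig.tag, str) or local_name(pig.tag) != "pig":
                continue
            highest, highest_name = -1, None
            for child in pig:
                if not isinstance(child.tag, str):
                    continue  # skip comments and processing instructions
                name = local_name(child.tag)
                if name not in rank:
                    continue  # unknown element: leave it to schema validation
                if rank[name] < highest:
                    violations.append((action.get("name"), name, highest_name))
                else:
                    highest, highest_name = rank[name], name
    return violations


if __name__ == "__main__":
    # Reproduces the mistake described in this post: configuration after script.
    bad = """<workflow-app name="w" xmlns="uri:oozie:workflow:0.2">
      <action name="analytic_pig">
        <pig>
          <job-tracker>jt</job-tracker>
          <name-node>nn</name-node>
          <script>a.pig</script>
          <configuration/>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
      </action>
    </workflow-app>"""
    for action_name, element, earlier in find_order_violations(bad):
        print(f"action {action_name}: <{element}> appears after <{earlier}>")
```

The check only compares the relative position of known element names. It does not check required elements, occurrence counts, or element contents.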





