Installing and configuring pig-0.12.1 with hadoop-2.2.0

Download

wget http://mirror.bit.edu.cn/apache/pig/pig-0.12.1/pig-0.12.1.tar.gz

Extract

gunzip pig-0.12.1.tar.gz
tar xvf pig-0.12.1.tar -C /opt
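The two commands above can also be combined into a single tar invocation. The following is a minimal sketch, not the author's method: extract_tgz is a hypothetical helper name, and the demonstration runs on a throwaway archive in a temporary directory so that nothing under /opt is touched.

```shell
# extract_tgz is a hypothetical helper, not part of Pig.
# Usage: extract_tgz <archive.tar.gz> <destination-directory>
extract_tgz() {
  mkdir -p "$2"
  # -x extract, -z filter through gzip, -f archive file, -C change to destination first
  tar -xzf "$1" -C "$2"
}

# Demonstration on a throwaway archive (all paths below are scratch paths).
work="$(mktemp -d)"
mkdir -p "$work/src/pig-demo"
echo "hello" > "$work/src/pig-demo/README.txt"
tar -czf "$work/pig-demo.tar.gz" -C "$work/src" pig-demo
extract_tgz "$work/pig-demo.tar.gz" "$work/opt"
ls "$work/opt/pig-demo"
```

For the real archive the call would presumably be extract_tgz pig-0.12.1.tar.gz /opt, which needs write permission on /opt.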

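Note:

The commands above extract Pig into /opt/pig-0.12.1.

The official Pig release is built against Hadoop 1.x, so we need to compile our own build of Pig against hadoop-2.2.0. Otherwise, running Pig fails with the following error: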

ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs. java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at

http://stackoverflow.com/questions/21300612/error-in-pig-while-loading-data

Rebuilding Pig against hadoop-2.2.0

First, take a look at the directory layout:

scott@master:/opt/pig-0.12.1$ pwd
/opt/pig-0.12.1
scott@master:/opt/pig-0.12.1$ ll
总用量 25732
drwxr-xr-x 15 scott scott 4096 4月 5 16:44 ./
drwxr-xr-x 23 scott scott 4096 4月 17 12:49 ../
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 bin/
-rw-rw-r-- 1 scott scott 84778 4月 5 16:44 build.xml
-rw-rw-r-- 1 scott scott 148333 4月 5 16:44 CHANGES.txt
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 conf/
drwxr-xr-x 4 scott scott 4096 4月 17 12:49 contrib/
drwxr-xr-x 6 scott scott 4096 4月 17 12:49 docs/
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 ivy/
-rw-rw-r-- 1 scott scott 20846 4月 5 16:43 ivy.xml
drwxr-xr-x 3 scott scott 4096 4月 17 12:49 lib/
drwxr-xr-x 4 scott scott 4096 4月 5 16:44 lib-src/
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 license/
-rw-rw-r-- 1 scott scott 11358 4月 5 16:44 LICENSE.txt
-rw-rw-r-- 1 scott scott 2120 4月 5 16:44 NOTICE.txt
-rw-rw-r-- 1 scott scott 17444256 4月 5 16:43 pig-0.12.1.jar
-rw-rw-r-- 1 scott scott 8554354 4月 5 16:43 pig-0.12.1-withouthadoop.jar
-rw-rw-r-- 1 scott scott 1307 4月 5 16:44 README.txt
-rw-rw-r-- 1 scott scott 1959 4月 5 16:44 RELEASE_NOTES.txt
drwxr-xr-x 2 scott scott 4096 4月 5 16:43 scripts/
drwxr-xr-x 4 scott scott 4096 4月 5 16:44 shims/
drwxr-xr-x 8 scott scott 4096 4月 17 12:49 src/
drwxr-xr-x 9 scott scott 4096 4月 17 12:49 test/
drwxr-xr-x 5 scott scott 4096 4月 17 12:49 tutorial/

After searching online, I arrived at the following solution.

Run this in the root of the extracted Pig directory:

ant clean jar-all -Dhadoopversion=23
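The value 23 selects Pig's hadoop23 build configuration, which covers Hadoop 0.23.x and 2.x; the stock release corresponds to 20 (Hadoop 0.20.x and 1.x). As a rough sketch, the choice can be derived from a Hadoop version string. The helper name pig_hadoopversion_flag is hypothetical and not part of Pig or ant.

```shell
# pig_hadoopversion_flag is a hypothetical helper, not part of Pig.
# It maps a Hadoop version string to the value passed as -Dhadoopversion.
pig_hadoopversion_flag() {
  case "$1" in
    0.23.*|2.*) echo 23 ;;   # Hadoop 0.23.x and 2.x: hadoop23 configuration
    0.20.*|1.*) echo 20 ;;   # Hadoop 0.20.x and 1.x: hadoop20 configuration
    *) echo "unsupported Hadoop version: $1" >&2; return 1 ;;
  esac
}

flag="$(pig_hadoopversion_flag 2.2.0)"
echo "ant clean jar-all -Dhadoopversion=$flag"
```

On a machine with Hadoop installed, the version string could be taken from the first line of the hadoop version output, which has the form "Hadoop 2.2.0".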

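Before running it, we do some configuration first. Pig's build uses Apache Ivy, and Ivy ultimately resolves dependencies from Maven repositories. The default Maven repository that Ivy uses is very slow to reach from China. Fortunately, OSChina (osc) provides a Maven mirror, so here we configure Ivy to use it.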

Pig's Ivy configuration lives in /opt/pig-0.12.1/ivy, and the file to change is ivysettings.xml. All we need to do is add the address of the OSChina Maven mirror to ivysettings.xml.

scott@master:/opt/pig-0.12.1/ivy$ pwd
/opt/pig-0.12.1/ivy
scott@master:/opt/pig-0.12.1/ivy$ ll
总用量 964
drwxr-xr-x 2 scott scott 4096 4月 17 10:14 ./
drwxr-xr-x 16 scott scott 4096 4月 17 09:58 ../
-rw-rw-r-- 1 scott scott 947592 4月 5 16:43 ivy-2.2.0.jar
-rw-rw-r-- 1 scott scott 3708 4月 17 09:58 ivysettings.xml
-rw-rw-r-- 1 scott scott 2589 4月 5 16:43 libraries.properties
-rw-rw-r-- 1 scott scott 3511 4月 5 16:43 piggybank-template.xml
-rw-rw-r-- 1 scott scott 1685 4月 5 16:43 pigsmoke-template.xml
-rw-rw-r-- 1 scott scott 5013 4月 5 16:43 pig-template.xml
-rw-rw-r-- 1 scott scott 2115 4月 5 16:43 pigunit-template.xml

Add the OSChina Maven address

Inside the <resolvers></resolvers> element, add:
<ibiblio name="maven-osc" root="http://maven.oschina.net/content/groups/public/" pattern="${maven2.pattern.ext}" m2compatible="true"/>
Inside the <chain name="external" dual="true"></chain> element, add <resolver ref="maven-osc"/>, and put it first so that this address is tried before the others.

ivysettings.xml after the change:

<property name="repo.maven.org" value="${mvnrepo}" override="true"/>
<property name="repo.jboss.org" value="http://repository.jboss.com/nexus/content/groups/public/" override="false"/>
<property name="repo.apache.snapshots" value="http://repository.apache.org/content/groups/snapshots-group/" override="false"/>
<property name="repo.dir" value="${user.home}/.m2/repository" override="false"/>
<property name="maven2.pattern" value="[organisation]/[module]/[revision]/[module]-[revision](-[classifier])"/>
<property name="maven2.pattern.ext" value="${maven2.pattern}.[ext]"/>
<property name="snapshot.pattern" value="[organisation]/[module]/[revision]/[artifact]-[revision](-[classifier]).[ext]"/>
<property name="resolvers" value="default" override="false"/>
<property name="force-resolve" value="false" override="false"/>
<!-- pull in the local repository -->
<include url="${ivy.default.conf.dir}/ivyconf-local.xml"/>
<settings defaultResolver="${resolvers}"/>
<resolvers>
<ibiblio name="maven-osc" root="http://maven.oschina.net/content/groups/public/" pattern="${maven2.pattern.ext}" m2compatible="true"/>
<ibiblio name="maven2" root="${repo.maven.org}" pattern="${maven2.pattern.ext}" m2compatible="true"/>
<ibiblio name="jboss-maven2" root="${repo.jboss.org}" pattern="${maven2.pattern.ext}" m2compatible="true"/>
<ibiblio name="apache-snapshots" root="${repo.apache.snapshots}" pattern="${snapshot.pattern}"
checkmodified="true" changingPattern=".*SNAPSHOT" m2compatible="true"/>

<filesystem name="fs" m2compatible="true" checkconsistency="false" force="${force-resolve}"
checkmodified="true" changingPattern=".*SNAPSHOT">

<artifact pattern="${repo.dir}/${maven2.pattern.ext}"/>
<ivy pattern="${repo.dir}/[organisation]/[module]/[revision]/[module]-[revision].pom"/>
</filesystem>
<chain name="internal" checkmodified="true">
<resolver ref="fs"/>
</chain>
<chain name="external" dual="true">
<resolver ref="maven-osc"/>
<resolver ref="maven2"/>
<resolver ref="jboss-maven2"/>
<resolver ref="apache-snapshots"/>
</chain>
<chain name="default" dual="true" checkmodified="true">
<resolver ref="internal"/>
<resolver ref="external"/>
</chain>
</resolvers>
<modules>
<module organisation="org.apache.pig" name=".*" resolver="internal"/>
</modules>
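Since the order of the resolvers inside the external chain decides which repository is tried first, it can be worth checking the edited file before starting a build. The sketch below is only an illustration: first_external_resolver is a hypothetical helper, and it assumes the file keeps one XML element per line, as in the listing above. The demonstration uses a small sample file rather than the real ivysettings.xml.

```shell
# first_external_resolver is a hypothetical helper, not part of Ivy or Pig.
# It prints the ref of the first <resolver> inside <chain name="external">.
# Assumption: one XML element per line, as in the listing above.
first_external_resolver() {
  awk '
    /<chain name="external"/ { in_chain = 1; next }
    in_chain && /<resolver ref="/ {
      match($0, /ref="[^"]*"/)
      # Skip the 5 characters of ref=" and drop the closing quote
      print substr($0, RSTART + 5, RLENGTH - 6)
      exit
    }
    /<\/chain>/ { in_chain = 0 }
  ' "$1"
}

# Demonstration on a small sample file (not the real settings file).
settings="$(mktemp)"
cat > "$settings" <<'EOF'
<resolvers>
  <chain name="internal">
    <resolver ref="fs"/>
  </chain>
  <chain name="external" dual="true">
    <resolver ref="maven-osc"/>
    <resolver ref="maven2"/>
  </chain>
</resolvers>
EOF
first_external_resolver "$settings"
```

Against the real file the call would be first_external_resolver /opt/pig-0.12.1/ivy/ivysettings.xml; anything other than maven-osc suggests the mirror is not being tried first.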

Build

Run the following command in /opt/pig-0.12.1 (the root of the Pig installation):

ant clean jar-all -Dhadoopversion=23

Then comes the long wait:

scott@master:/opt/pig-0.12.1$ ant clean jar-all -Dhadoopversion=23
Buildfile: /opt/pig-0.12.1/build.xml

clean:
[delete] Deleting directory /opt/pig-0.12.1/src/docs/build
[delete] Deleting directory /opt/pig-0.12.1/test/org/apache/pig/test/utils/dotGraph/parser

clean:

clean:

ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
[get] To: /opt/pig-0.12.1/ivy/ivy-2.2.0.jar
[get] Not modified - so not downloaded

ivy-init-dirs:
[mkdir] Created dir: /opt/pig-0.12.1/build/ivy
[mkdir] Created dir: /opt/pig-0.12.1/build/ivy/lib
[mkdir] Created dir: /opt/pig-0.12.1/build/ivy/report
[mkdir] Created dir: /opt/pig-0.12.1/build/ivy/maven

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = /opt/pig-0.12.1/ivy/ivysettings.xml

ivy-resolve:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#pig;0.12.2-SNAPSHOT
[ivy:resolve] confs: [master, default, runtime, compile, test, javadoc, releaseaudit, jdiff, checkstyle, buildJar, hadoop20, hadoop23, hbase94, hbase95]
[ivy:resolve] found xmlenc#xmlenc;0.52 in fs
[ivy:resolve] found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:resolve] found com.sun.jersey#jersey-server;1.8 in fs
[ivy:resolve] found com.sun.jersey.contribs#jersey-guice;1.8 in fs
[ivy:resolve] found commons-codec#commons-codec;1.4 in fs
[ivy:resolve] found commons-httpclient#commons-httpclient;3.1 in fs
[ivy:resolve] found commons-configuration#commons-configuration;1.6 in fs
[ivy:resolve] found commons-collections#commons-collections;3.2.1 in fs
[ivy:resolve] found javax.servlet#servlet-api;2.5 in fs
[ivy:resolve] found javax.ws.rs#jsr311-api;1.1.1 in maven2
[ivy:resolve] found org.mortbay.jetty#jetty;6.1.26 in fs
[ivy:resolve] found com.google.protobuf#protobuf-java;2.4.0a in fs
[ivy:resolve] found org.mortbay.jetty#jetty-util;6.1.26 in fs
[ivy:resolve] found javax.inject#javax.inject;1 in fs
[ivy:resolve] found javax.xml.bind#jaxb-api;2.2.2 in fs
[ivy:resolve] found com.sun.xml.bind#jaxb-impl;2.2.3-1 in fs
[ivy:resolve] found com.google.inject#guice;3.0 in fs
[ivy:resolve] found com.google.inject.extensions#guice-servlet;3.0 in fs
[ivy:resolve] found aopalliance#aopalliance;1.0 in fs
[ivy:resolve] found org.mortbay.jetty#jsp-2.1;6.1.14 in fs
[ivy:resolve] found org.mortbay.jetty#jsp-api-2.1;6.1.14 in fs
[ivy:resolve] found org.apache.hadoop#hadoop-annotations;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-auth;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-common;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-hdfs;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-mapreduce-client-core;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-server-tests;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-mapreduce-client-app;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-mapreduce-client-shuffle;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-mapreduce-client-common;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-api;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-common;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-server;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-server-web-proxy;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-server-common;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-server-nodemanager;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-server-resourcemanager;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-yarn-client;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hadoop#hadoop-mapreduce-client-hs;2.0.3-alpha in maven2
[ivy:resolve] found org.apache.hbase#hbase;0.94.1 in maven2
[ivy:resolve] found commons-el#commons-el;1.0 in fs
[ivy:resolve] found commons-io#commons-io;2.3 in maven2
[ivy:resolve] found org.apache.httpcomponents#httpclient;4.1 in maven2
[ivy:resolve] found org.apache.httpcomponents#httpcore;4.1 in maven2
[ivy:resolve] found log4j#log4j;1.2.16 in fs
[ivy:resolve] found commons-logging#commons-logging;1.1.1 in fs
[ivy:resolve] found org.slf4j#slf4j-log4j12;1.6.1 in maven2
[ivy:resolve] found commons-cli#commons-cli;1.0 in fs
[ivy:resolve] found org.apache.avro#avro;1.7.4 in fs
[ivy:resolve] found org.codehaus.jackson#jackson-core-asl;1.8.8 in fs
[ivy:resolve] found org.codehaus.jackson#jackson-mapper-asl;1.8.8 in fs
[ivy:resolve] found com.thoughtworks.paranamer#paranamer;2.3 in fs
[ivy:resolve] found org.xerial.snappy#snappy-java;1.0.4.1 in fs
[ivy:resolve] found org.apache.commons#commons-compress;1.4.1 in fs
[ivy:resolve] found org.tukaani#xz;1.0 in fs
[ivy:resolve] found org.slf4j#slf4j-api;1.6.4 in fs
[ivy:resolve] found org.apache.avro#avro-mapred;1.7.4 in maven2
[ivy:resolve] found org.apache.avro#avro-ipc;1.7.4 in maven2
[ivy:resolve] found org.mortbay.jetty#servlet-api;2.5-20081211 in fs
[ivy:resolve] found io.netty#netty;3.4.0.Final in fs
[ivy:resolve] found org.apache.velocity#velocity;1.7 in fs
[ivy:resolve] found commons-lang#commons-lang;2.4 in fs
[ivy:resolve] found org.apache.avro#trevni-core;1.7.4 in maven2
[ivy:resolve] found org.apache.avro#trevni-avro;1.7.4 in maven2
[ivy:resolve] found org.xerial.snappy#snappy-java;1.1.0.1 in maven2
[ivy:resolve] found com.googlecode.json-simple#json-simple;1.1 in fs
[ivy:resolve] found com.jcraft#jsch;0.1.38 in fs
[ivy:resolve] found jline#jline;0.9.94 in fs
[ivy:resolve] found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve] found org.codehaus.groovy#groovy-all;1.8.6 in maven2
[ivy:resolve] found org.fusesource.jansi#jansi;1.9 in maven2
[ivy:resolve] found joda-time#joda-time;2.1 in maven2
[ivy:resolve] found com.google.guava#guava;11.0 in maven2
[ivy:resolve] found org.python#jython-standalone;2.5.3 in maven2
[ivy:resolve] found rhino#js;1.7R2 in maven2
[ivy:resolve] found org.antlr#antlr;3.4 in maven2
[ivy:resolve] found org.antlr#antlr-runtime;3.4 in fs
[ivy:resolve] found org.antlr#stringtemplate;3.2.1 in fs
[ivy:resolve] found antlr#antlr;2.7.7 in fs
[ivy:resolve] found org.antlr#ST4;4.0.4 in maven2
[ivy:resolve] found org.apache.zookeeper#zookeeper;3.4.4 in maven2
[ivy:resolve] found dk.brics.automaton#automaton;1.11-8 in maven2
[ivy:resolve] found org.jruby#jruby-complete;1.6.7 in maven2
[ivy:resolve] found asm#asm;3.3.1 in fs
[ivy:resolve] found org.vafer#jdeb;0.8 in maven2
[ivy:resolve] found org.mockito#mockito-all;1.8.4 in maven2
[ivy:resolve] found com.twitter#parquet-pig-bundle;1.2.3 in maven2
[ivy:resolve] found org.apache.avro#avro-tools;1.7.4 in maven2
[ivy:resolve] found xalan#xalan;2.7.1 in maven2
[ivy:resolve] found xalan#serializer;2.7.1 in maven2
[ivy:resolve] found xml-apis#xml-apis;1.3.04 in fs
[ivy:resolve] found xerces#xercesImpl;2.10.0 in fs
[ivy:resolve] found xml-apis#xml-apis;1.4.01 in fs
[ivy:resolve] found junit#junit;4.11 in fs
[ivy:resolve] found org.jboss.netty#netty;3.2.2.Final in fs
[ivy:resolve] found com.github.stephenc.high-scale-lib#high-scale-lib;1.1.1 in maven2
[ivy:resolve] found com.yammer.metrics#metrics-core;2.1.2 in maven2
[ivy:resolve] found hsqldb#hsqldb;1.8.0.10 in maven2
[ivy:resolve] found org.apache.hive#hive-exec;0.8.0 in maven2
[ivy:resolve] found com.google.code.p.arat#rat-lib;0.5.1 in maven2
[ivy:resolve] found commons-collections#commons-collections;3.2 in fs
[ivy:resolve] found commons-lang#commons-lang;2.1 in fs
[ivy:resolve] found jdiff#jdiff;1.0.9 in fs
[ivy:resolve] found checkstyle#checkstyle;4.2 in maven2
[ivy:resolve] found commons-beanutils#commons-beanutils-core;1.7.0 in maven2
[ivy:resolve] found com.sun.jersey#jersey-core;1.8 in fs
[ivy:resolve] found org.apache.hadoop#hadoop-core;1.0.0 in maven-osc
[ivy:resolve] found commons-cli#commons-cli;1.2 in fs
[ivy:resolve] found commons-httpclient#commons-httpclient;3.0.1 in maven-osc
[ivy:resolve] found junit#junit;3.8.1 in fs
[ivy:resolve] found commons-logging#commons-logging;1.0.3 in fs
[ivy:resolve] found org.apache.commons#commons-math;2.1 in fs
[ivy:resolve] found commons-digester#commons-digester;1.8 in fs
[ivy:resolve] found commons-beanutils#commons-beanutils;1.7.0 in fs
[ivy:resolve] found commons-beanutils#commons-beanutils-core;1.8.0 in fs
[ivy:resolve] found commons-net#commons-net;1.4.1 in fs
[ivy:resolve] found oro#oro;2.0.8 in fs
[ivy:resolve] found tomcat#jasper-runtime;5.5.12 in fs
[ivy:resolve] found tomcat#jasper-compiler;5.5.12 in fs
[ivy:resolve] found org.mortbay.jetty#servlet-api-2.5;6.1.14 in fs
[ivy:resolve] found org.eclipse.jdt#core;3.1.1 in fs
[ivy:resolve] found ant#ant;1.6.5 in fs
[ivy:resolve] found net.java.dev.jets3t#jets3t;0.7.1 in maven-osc
[ivy:resolve] found net.sf.kosmosfs#kfs;0.3 in fs
[ivy:resolve] found org.codehaus.jackson#jackson-mapper-asl;1.0.1 in maven-osc
[ivy:resolve] found org.codehaus.jackson#jackson-core-asl;1.0.1 in maven-osc
[ivy:resolve] found org.apache.hadoop#hadoop-test;1.0.0 in maven-osc
[ivy:resolve] found org.apache.ftpserver#ftplet-api;1.0.0 in fs
[ivy:resolve] found org.apache.mina#mina-core;2.0.0-M5 in fs
[ivy:resolve] found org.slf4j#slf4j-api;1.5.2 in maven-osc
[ivy:resolve] found org.apache.ftpserver#ftpserver-core;1.0.0 in fs
[ivy:resolve] found org.apache.ftpserver#ftpserver-deprecated;1.0.0-M2 in fs
[ivy:resolve] found org.apache.hbase#hbase-client;0.96.0-hadoop1 in maven-osc
[ivy:resolve] found org.apache.hbase#hbase-common;0.96.0-hadoop1 in maven-osc
[ivy:resolve] found org.apache.hbase#hbase-server;0.96.0-hadoop1 in maven-osc
[ivy:resolve] found org.apache.hbase#hbase-protocol;0.96.0-hadoop1 in maven-osc
[ivy:resolve] found org.apache.hbase#hbase-hadoop-compat;0.96.0-hadoop1 in maven-osc
[ivy:resolve] found org.apache.hbase#hbase-hadoop1-compat;0.96.0-hadoop1 in maven-osc
[ivy:resolve] found org.cloudera.htrace#htrace-core;2.00 in maven-osc
[ivy:resolve] :: resolution report :: resolve 18354ms :: artifacts dl 343ms
[ivy:resolve] :: evicted modules:
[ivy:resolve] org.xerial.snappy#snappy-java;1.0.4.1 by [org.xerial.snappy#snappy-java;1.1.0.1] in [default, test, compile, runtime, javadoc, buildJar]
[ivy:resolve] org.antlr#antlr-runtime;3.3 by [org.antlr#antlr-runtime;3.4] in [default, test, compile, runtime, javadoc, buildJar]
[ivy:resolve] xml-apis#xml-apis;1.3.04 by [xml-apis#xml-apis;1.4.01] in [default, test, runtime, javadoc, buildJar]
[ivy:resolve] commons-logging#commons-logging;1.0.3 by [commons-logging#commons-logging;1.1.1] in [hadoop20]
[ivy:resolve] commons-codec#commons-codec;1.2 by [commons-codec#commons-codec;1.4] in [hadoop20]
[ivy:resolve] commons-logging#commons-logging;1.1 by [commons-logging#commons-logging;1.1.1] in [hadoop20]
[ivy:resolve] commons-codec#commons-codec;1.3 by [commons-codec#commons-codec;1.4] in [hadoop20]
[ivy:resolve] commons-httpclient#commons-httpclient;3.1 by [commons-httpclient#commons-httpclient;3.0.1] in [hadoop20]
[ivy:resolve] org.apache.mina#mina-core;2.0.0-M4 by [org.apache.mina#mina-core;2.0.0-M5] in [hadoop20]
[ivy:resolve] org.apache.ftpserver#ftplet-api;1.0.0-M2 by [org.apache.ftpserver#ftplet-api;1.0.0] in [hadoop20]
[ivy:resolve] org.apache.ftpserver#ftpserver-core;1.0.0-M2 by [org.apache.ftpserver#ftpserver-core;1.0.0] in [hadoop20]
[ivy:resolve] org.apache.mina#mina-core;2.0.0-M2 by [org.apache.mina#mina-core;2.0.0-M5] in [hadoop20]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| master | 0 | 0 | 0 | 0 || 0 | 0 |
| default | 101 | 47 | 0 | 3 || 101 | 0 |
| runtime | 101 | 47 | 0 | 3 || 101 | 0 |
| compile | 89 | 42 | 0 | 2 || 90 | 0 |
| test | 101 | 47 | 0 | 3 || 101 | 0 |
| javadoc | 101 | 47 | 0 | 3 || 101 | 0 |
| releaseaudit | 3 | 2 | 0 | 0 || 3 | 0 |
| jdiff | 3 | 3 | 0 | 0 || 3 | 0 |
| checkstyle | 10 | 4 | 0 | 0 || 10 | 0 |
| buildJar | 101 | 47 | 0 | 3 || 101 | 0 |
| hadoop20 | 48 | 33 | 0 | 9 || 39 | 0 |
| hadoop23 | 40 | 19 | 0 | 0 || 42 | 0 |
| hbase94 | 1 | 0 | 0 | 0 || 2 | 0 |
| hbase95 | 7 | 0 | 0 | 0 || 13 | 0 |
---------------------------------------------------------------------

ivy-compile:
[ivy:retrieve] :: retrieving :: org.apache.pig#pig
[ivy:retrieve] confs: [compile]
[ivy:retrieve] 90 artifacts copied, 0 already retrieved (80170kB/2196ms)
[ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
[ivy:cachepath] :: loading settings :: file = /opt/pig-0.12.1/ivy/ivysettings.xml

init:
[mkdir] Created dir: /opt/pig-0.12.1/src-gen/org/apache/pig/impl/logicalLayer/parser
[mkdir] Created dir: /opt/pig-0.12.1/src-gen/org/apache/pig/tools/pigscript/parser
[mkdir] Created dir: /opt/pig-0.12.1/src-gen/org/apache/pig/tools/parameters
[mkdir] Created dir: /opt/pig-0.12.1/build/classes
[mkdir] Created dir: /opt/pig-0.12.1/build/test/classes
[mkdir] Created dir: /opt/pig-0.12.1/test/org/apache/pig/test/utils/dotGraph/parser
[mkdir] Created dir: /opt/pig-0.12.1/src-gen/org/apache/pig/data/parser
[move] Moving 1 file to /opt/pig-0.12.1/build/ivy/lib/Pig
[exec] Execute failed: java.io.IOException: Cannot run program "svnversion": error=2, No such file or directory

cc-compile:
[javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
[javacc] (type "javacc" with no arguments for help)
[javacc] Reading from file /opt/pig-0.12.1/src/org/apache/pig/tools/pigscript/parser/PigScriptParser.jj . . .
[javacc] File "TokenMgrError.java" does not exist. Will create one.
[javacc] File "ParseException.java" does not exist. Will create one.
[javacc] File "Token.java" does not exist. Will create one.
[javacc] File "JavaCharStream.java" does not exist. Will create one.
[javacc] Parser generated successfully.
[javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
[javacc] (type "javacc" with no arguments for help)
[javacc] Reading from file /opt/pig-0.12.1/src/org/apache/pig/tools/parameters/PigFileParser.jj . . .
[javacc] Warning: Lookahead adequacy checking not being performed since option LOOKAHEAD is more than 1. Set option FORCE_LA_CHECK to true to force checking.
[javacc] File "TokenMgrError.java" does not exist. Will create one.
[javacc] File "ParseException.java" does not exist. Will create one.
[javacc] File "Token.java" does not exist. Will create one.
[javacc] File "JavaCharStream.java" does not exist. Will create one.
[javacc] Parser generated with 0 errors and 1 warnings.
[javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
[javacc] (type "javacc" with no arguments for help)
[javacc] Reading from file /opt/pig-0.12.1/src/org/apache/pig/tools/parameters/ParamLoader.jj . . .
[javacc] File "TokenMgrError.java" is being rebuilt.
[javacc] File "ParseException.java" is being rebuilt.
[javacc] File "Token.java" is being rebuilt.
[javacc] File "JavaCharStream.java" is being rebuilt.
[javacc] Parser generated successfully.
[jjtree] Java Compiler Compiler Version 4.2 (Tree Builder)
[jjtree] (type "jjtree" with no arguments for help)
[jjtree] Reading from file /opt/pig-0.12.1/test/org/apache/pig/test/utils/dotGraph/DOTParser.jjt . . .
[jjtree] File "Node.java" does not exist. Will create one.
[jjtree] File "SimpleNode.java" does not exist. Will create one.
[jjtree] File "DOTParserTreeConstants.java" does not exist. Will create one.
[jjtree] File "JJTDOTParserState.java" does not exist. Will create one.
[jjtree] Annotated grammar generated successfully in /opt/pig-0.12.1/test/org/apache/pig/test/utils/dotGraph/parser/DOTParser.jj
[javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
[javacc] (type "javacc" with no arguments for help)
[javacc] Reading from file /opt/pig-0.12.1/test/org/apache/pig/test/utils/dotGraph/parser/DOTParser.jj . . .
[javacc] File "TokenMgrError.java" does not exist. Will create one.
[javacc] File "ParseException.java" does not exist. Will create one.
[javacc] File "Token.java" does not exist. Will create one.
[javacc] File "SimpleCharStream.java" does not exist. Will create one.
[javacc] Parser generated successfully.

prepare:
[mkdir] Created dir: /opt/pig-0.12.1/src-gen/org/apache/pig/parser

genLexer:

genParser:

genTreeParser:

gen:

compile:
[echo] *** Building Main Sources ***
[echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
[echo] *** Else, you will only be warned about deprecations ***
[javac] Compiling 844 source files to /opt/pig-0.12.1/build/classes
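[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 warning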
[copy] Copying 1 file to /opt/pig-0.12.1/build/classes/org/apache/pig/tools/grunt
[copy] Copying 1 file to /opt/pig-0.12.1/build/classes/org/apache/pig/tools/grunt
[copy] Copying 2 files to /opt/pig-0.12.1/build/classes/python

ivy-buildJar:
[ivy:retrieve] :: retrieving :: org.apache.pig#pig
[ivy:retrieve] confs: [buildJar]
[ivy:retrieve] 12 artifacts copied, 89 already retrieved (22086kB/465ms)

jar:
[echo] svnString : unknown
[jar] Building jar: /opt/pig-0.12.1/build/pig-0.12.2-SNAPSHOT.jar
[echo] svnString : unknown
[jar] Building jar: /opt/pig-0.12.1/build/pig-0.12.2-SNAPSHOT-withdependencies.jar
[copy] Copying 1 file to /opt/pig-0.12.1

include-meta:
[copy] Copying 1 file to /opt/pig-0.12.1/build/classes/META-INF
[move] Moving 1 file to /opt/pig-0.12.1/build
[jar] Building jar: /opt/pig-0.12.1/build/pig-0.12.2-SNAPSHOT-withdependencies.jar

jar-withouthadoop:
[echo] svnString : unknown
[jar] Building jar: /opt/pig-0.12.1/build/pig-0.12.2-SNAPSHOT.jar
[echo] svnString : unknown
[jar] Building jar: /opt/pig-0.12.1/build/pig-0.12.2-SNAPSHOT-withouthadoop.jar
[copy] Copying 1 file to /opt/pig-0.12.1

jar-all:

BUILD SUCCESSFUL
Total time: 2 minutes 9 seconds

Once the build finishes, the compiled Pig is in /opt/pig-0.12.1/build.

The directory contains:

scott@master:/opt/pig-0.12.1$ cd build/
scott@master:/opt/pig-0.12.1/build$ pwd
/opt/pig-0.12.1/build
scott@master:/opt/pig-0.12.1/build$ ll
总用量 51776
drwxrwxr-x 5 scott scott 4096 4月 17 13:02 ./
drwxr-xr-x 17 scott scott 4096 4月 17 13:02 ../
drwxrwxr-x 5 scott scott 4096 4月 17 13:01 classes/
drwxrwxr-x 5 scott scott 4096 4月 17 13:00 ivy/
-rw-rw-r-- 1 scott scott 3658637 4月 17 13:02 pig-0.12.2-SNAPSHOT.jar
-rw-rw-r-- 1 scott scott 20346853 4月 17 13:02 pig-0.12.2-SNAPSHOT-withdependencies.jar
-rw-rw-r-- 1 scott scott 20418810 4月 17 13:01 pig-0.12.2-SNAPSHOT-withdependencies.stage.jar
-rw-rw-r-- 1 scott scott 8563748 4月 17 13:02 pig-0.12.2-SNAPSHOT-withouthadoop.jar
drwxrwxr-x 3 scott scott 4096 4月 17 13:00 test/

Two new jar files have also appeared in the Pig root directory:

-rw-rw-r--  1 scott scott 20418810  4月 17 13:01 pig.jar
-rw-rw-r-- 1 scott scott 8563748 4月 17 13:02 pig-withouthadoop.jar

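Remove or back up the original pig-0.12.1.jar and pig-0.12.1-withouthadoop.jar

Here I back up the old jars rather than deleting them, so that they remain available for hadoop-1.x. The reason for deleting or renaming the old jars will be familiar to anyone who works with Java: it is mainly to avoid jar conflicts.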

scott@master:/opt/pig-0.12.1$ mv pig-0.12.1.jar pig-0.12.1.jar.default
scott@master:/opt/pig-0.12.1$ mv pig-0.12.1-withouthadoop.jar pig-0.12.1-withouthadoop.jar.default
scott@master:/opt/pig-0.12.1$ ll
总用量 54048
drwxr-xr-x 17 scott scott 4096 4月 17 13:08 ./
drwxr-xr-x 23 scott scott 4096 4月 17 12:49 ../
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 bin/
drwxrwxr-x 5 scott scott 4096 4月 17 13:02 build/
-rw-rw-r-- 1 scott scott 84778 4月 5 16:44 build.xml
-rw-rw-r-- 1 scott scott 148333 4月 5 16:44 CHANGES.txt
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 conf/
drwxr-xr-x 4 scott scott 4096 4月 17 12:49 contrib/
drwxr-xr-x 6 scott scott 4096 4月 17 12:49 docs/
drwxr-xr-x 2 scott scott 4096 4月 17 12:59 ivy/
-rw-rw-r-- 1 scott scott 20846 4月 5 16:43 ivy.xml
drwxr-xr-x 3 scott scott 4096 4月 17 12:49 lib/
drwxr-xr-x 4 scott scott 4096 4月 5 16:44 lib-src/
drwxr-xr-x 2 scott scott 4096 4月 17 12:49 license/
-rw-rw-r-- 1 scott scott 11358 4月 5 16:44 LICENSE.txt
-rw-rw-r-- 1 scott scott 2120 4月 5 16:44 NOTICE.txt
-rw-rw-r-- 1 scott scott 17444256 4月 5 16:43 pig-0.12.1.jar.default
-rw-rw-r-- 1 scott scott 8554354 4月 5 16:43 pig-0.12.1-withouthadoop.jar.default
-rw-rw-r-- 1 scott scott 20418810 4月 17 13:01 pig.jar
-rw-rw-r-- 1 scott scott 8563748 4月 17 13:02 pig-withouthadoop.jar
-rw-rw-r-- 1 scott scott 1307 4月 5 16:44 README.txt
-rw-rw-r-- 1 scott scott 1959 4月 5 16:44 RELEASE_NOTES.txt
drwxr-xr-x 2 scott scott 4096 4月 5 16:43 scripts/
drwxr-xr-x 4 scott scott 4096 4月 5 16:44 shims/
drwxr-xr-x 8 scott scott 4096 4月 17 12:49 src/
drwxrwxr-x 3 scott scott 4096 4月 17 13:00 src-gen/
drwxr-xr-x 9 scott scott 4096 4月 17 12:49 test/
drwxr-xr-x 5 scott scott 4096 4月 17 12:49 tutorial/

Rename the pig.jar and pig-withouthadoop.jar files that the ant build copied into the root directory:

scott@master:/opt/pig-0.12.1$ mv pig.jar pig-0.12.1.jar
scott@master:/opt/pig-0.12.1$ mv pig-withouthadoop.jar pig-0.12.1-withouthadoop.jar
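The backup and rename steps can be wrapped in one small function so that they are repeatable after a later rebuild. This is a sketch under assumptions, not the author's procedure: swap_pig_jars is a hypothetical helper, and the demonstration works on placeholder files in a scratch directory instead of the real /opt/pig-0.12.1.

```shell
# swap_pig_jars is a hypothetical helper, not part of Pig.
# Usage: swap_pig_jars <pig_home> <version>
swap_pig_jars() {
  pig_home="$1"
  version="$2"
  # Refuse to touch anything unless both freshly built jars are present.
  for suffix in "" "-withouthadoop"; do
    if [ ! -f "$pig_home/pig$suffix.jar" ]; then
      echo "missing $pig_home/pig$suffix.jar" >&2
      return 1
    fi
  done
  for suffix in "" "-withouthadoop"; do
    stock="$pig_home/pig-$version$suffix.jar"
    # Keep the stock Hadoop 1.x jar under a .default suffix, but only once.
    if [ -f "$stock" ] && [ ! -e "$stock.default" ]; then
      mv "$stock" "$stock.default"
    fi
    # Give the rebuilt jar the versioned name.
    mv "$pig_home/pig$suffix.jar" "$stock"
  done
}

# Demonstration in a scratch directory with placeholder files.
demo="$(mktemp -d)"
echo stock > "$demo/pig-0.12.1.jar"
echo stock > "$demo/pig-0.12.1-withouthadoop.jar"
echo built > "$demo/pig.jar"
echo built > "$demo/pig-withouthadoop.jar"
swap_pig_jars "$demo" 0.12.1
ls "$demo"
```

Checking for both built jars before moving anything avoids leaving the directory half renamed if the build did not produce one of them.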

Set environment variables

Add the following to /etc/profile:

export PIG_HOME=/opt/pig-0.12.1
export PATH=$PIG_HOME/bin:$PATH
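Changes to /etc/profile only take effect in new login shells, or after sourcing the file with ". /etc/profile". Sourcing it repeatedly prepends the same directory to PATH each time. A possible variant that guards against this is sketched below; prepend_path_once is a hypothetical helper name, not something Pig provides.

```shell
# prepend_path_once is a hypothetical helper: it puts a directory at the front
# of PATH unless PATH already contains it.
prepend_path_once() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

export PIG_HOME=/opt/pig-0.12.1
prepend_path_once "$PIG_HOME/bin"
prepend_path_once "$PIG_HOME/bin"   # second call changes nothing

# Count how many PATH entries are exactly $PIG_HOME/bin
echo "$PATH" | tr ':' '\n' | grep -c -x "$PIG_HOME/bin"
```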

Run Pig

pig -help

scott@master:/opt/pig-0.12.1$ pig -help

Apache Pig version 0.12.2-SNAPSHOT (r: unknown)
compiled 四月 17 2014, 11:10:12

USAGE: Pig [options] [-] : Run interactively in grunt shell.
Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
Pig [options] [-f[ile]] file : Run cmds found in file.
options include:
-4, -log4jconf - Log4j configuration file, overrides log conf
-b, -brief - Brief logging (no timestamps)
-c, -check - Syntax check
-d, -debug - Debug level, INFO is default
-e, -execute - Commands to execute (within quotes)
-f, -file - Path to the script to execute
-g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
-h, -help - Display this message. You can specify topic to get help for that topic.
properties is the only topic currently supported: -h properties.
-i, -version - Display version information
-l, -logfile - Path to client side log file; default is current working directory.
-m, -param_file - Path to the parameter file
-p, -param - Key value pair of the form param=val
-r, -dryrun - Produces script with substituted parameters. Script is not executed.
-t, -optimizer_off - Turn optimizations off. The following values are supported:
SplitFilter - Split filter conditions
PushUpFilter - Filter as early as possible
MergeFilter - Merge filter conditions
PushDownForeachFlatten - Join or explode as late as possible
LimitOptimizer - Limit as early as possible
ColumnMapKeyPrune - Remove unused data
AddForEach - Add ForEach to remove unneeded columns
MergeForEach - Merge adjacent ForEach
GroupByConstParallelSetter - Force parallel 1 for "group all" statement
All - Disable all optimizations
All optimizations listed here are enabled by default. Optimization values are case insensitive.
-v, -verbose - Print all error messages to screen
-w, -warning - Turn warning logging on; also turns warning aggregation off
-x, -exectype - Set execution mode: local|mapreduce, default is mapreduce.
-F, -stop_on_failure - Aborts execution on the first failed job; default is off
-M, -no_multiquery - Turn multiquery optimization off; default is on
-P, -propertyFile - Path to property file
-printCmdDebug - Overrides anything else and prints the actual command used to run Pig, including
any environment variables that are set by the pig command.

Run Pig in local mode with pig -x local

scott@master:/opt/pig-0.12.1$ pig -x local
2014-04-17 11:47:26,879 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.2-SNAPSHOT (r: unknown) compiled 四月 17 2014, 11:10:12
2014-04-17 11:47:26,881 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/pig-0.12.1/pig_1397706446877.log
2014-04-17 11:47:27,011 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/scott/.pigbootup not found
2014-04-17 11:47:27,534 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 11:47:27,536 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-04-17 11:47:27,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-04-17 11:47:27,577 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.1/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-04-17 11:47:29,823 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 11:47:29,829 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
  • help
grunt> help
Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
describe <alias>[::<alias] - Show the schema for the alias. Inner aliases can be described as A::B.
explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]
[-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
-script - Explain the entire script.
-out - Store the output into directory rather than print to stdout.
-brief - Don't expand nested plans (presenting a smaller graph for overview).
-dot - Generate the output in .dot format. Default is text format.
-xml - Generate the output in .xml format. Default is text format.
-param <param_name - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
alias - Alias to explain.
dump <alias> - Compute the alias and writes the results to stdout.
Utility Commands:
exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment including aliases.
-param <param_name - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment.
-param <param_name - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the hadoop job specified by the hadoop job id.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
The following keys are supported:
default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
debug - Set debug on or off. Default is off.
job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
stream.skippath - String that contains the path. This is used by streaming.
any hadoop property.
help - Display this message.
history [-n] - Display the list statements in cache.
-n Hide line numbers.
quit - Quit the grunt shell.

  • Run in mapreduce mode: pig -x mapreduce
pig -x mapreduce
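
Mapreduce mode has to reach the Hadoop cluster, so Pig needs to find the Hadoop configuration files. The following is a minimal preflight sketch, not something Pig ships: the function name hadoop_conf_ready is hypothetical, and the default directory /opt/hadoop-2.2.0/etc/hadoop is an assumption based on the usual Hadoop 2.x layout and the install path seen in the logs.

```shell
#!/bin/sh
# Hypothetical preflight check: succeed only if the given directory holds
# a readable core-site.xml, the file where fs.defaultFS is normally set.
hadoop_conf_ready() {
    conf_dir="$1"
    if [ -r "$conf_dir/core-site.xml" ]; then
        echo "ok: $conf_dir/core-site.xml"
    else
        echo "missing: $conf_dir/core-site.xml" >&2
        return 1
    fi
}

# Usage (the fallback path is an assumption, adjust it to your install):
#   hadoop_conf_ready "${HADOOP_CONF_DIR:-/opt/hadoop-2.2.0/etc/hadoop}" && pig -x mapreduce
```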

Running the examples bundled with Pig

Create the pigtutorial.tar.gz

Run ant in the /opt/pig-0.12.1/tutorial directory:

scott@master:/opt/pig-0.12.1/tutorial$ ant
Buildfile: /opt/pig-0.12.1/tutorial/build.xml

init:
[mkdir] Created dir: /opt/pig-0.12.1/tutorial/build
[mkdir] Created dir: /opt/pig-0.12.1/tutorial/build/classes
[mkdir] Created dir: /opt/pig-0.12.1/tutorial/build/output
[mkdir] Created dir: /opt/pig-0.12.1/tutorial/build/output/pigtmp

compile:
[echo] *** Compiling Tutorial files ***
[javac] /opt/pig-0.12.1/tutorial/build.xml:69: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 7 source files to /opt/pig-0.12.1/tutorial/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5
[javac] 1 warning

jar:
[echo] *** Creating tutorial.jar ***
[jar] Building jar: /opt/pig-0.12.1/tutorial/build/output/pigtmp/tutorial.jar

cp:
[echo] *** Preparing tar creation ***
[copy] Copying 7 files to /opt/pig-0.12.1/tutorial/build/output/pigtmp

tar:
[echo] *** Creating tutorial.jar ***
[tar] Building tar: /opt/pig-0.12.1/tutorial/build/pigtutorial.tar
[gzip] Building: /opt/pig-0.12.1/tutorial/pigtutorial.tar.gz

BUILD SUCCESSFUL
Total time: 6 seconds

When the command finishes, it generates pigtutorial.tar.gz in the /opt/pig-0.12.1/tutorial directory:

scott@master:/opt/pig-0.12.1/tutorial$ ll
total 28320
drwxr-xr-x 6 scott scott 4096 Apr 17 13:13 ./
drwxr-xr-x 17 scott scott 4096 Apr 17 13:10 ../
drwxrwxr-x 4 scott scott 4096 Apr 17 13:13 build/
-rw-rw-r-- 1 scott scott 3422 Apr 5 16:44 build.xml
drwxr-xr-x 2 scott scott 4096 Apr 17 12:49 data/
-rw-rw-r-- 1 scott scott 28968660 Apr 17 13:13 pigtutorial.tar.gz
drwxr-xr-x 2 scott scott 4096 Apr 17 12:49 scripts/
drwxr-xr-x 3 scott scott 4096 Apr 5 16:44 src/

Copy /opt/pig-0.12.1/tutorial/pigtutorial.tar.gz to the /opt/pig-0.12.1/ directory:

scott@master:/opt/pig-0.12.1/tutorial$ cp pigtutorial.tar.gz ../

Extract /opt/pig-0.12.1/pigtutorial.tar.gz into the current directory:

scott@master:/opt/pig-0.12.1$ tar zxvf pigtutorial.tar.gz 
pigtmp/
pigtmp/excite-small.log
pigtmp/excite.log.bz2
pigtmp/pig-0.12.1.jar
pigtmp/script1-hadoop.pig
pigtmp/script1-local.pig
pigtmp/script2-hadoop.pig
pigtmp/script2-local.pig
pigtmp/tutorial.jar

After extraction, a pigtmp directory is created under the Pig installation root.
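
The build, copy, and extract steps above can be combined into one script. This is only a sketch under stated assumptions: PIG_HOME defaults to the install path used in this article, the helper names run and prepare_tutorial are made up for illustration, and DRY_RUN=1 is a hypothetical switch that prints the commands instead of running them. It uses ant -f and tar -C instead of changing directories, which should behave the same as the manual steps.

```shell
#!/bin/sh
# Sketch: build the tutorial, copy the tarball to the Pig root, extract it.
# PIG_HOME defaults to the install path used in this article.
PIG_HOME="${PIG_HOME:-/opt/pig-0.12.1}"

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stops at the first failing step because the commands are chained with &&.
prepare_tutorial() {
    run ant -f "$PIG_HOME/tutorial/build.xml" &&
    run cp "$PIG_HOME/tutorial/pigtutorial.tar.gz" "$PIG_HOME/" &&
    run tar zxvf "$PIG_HOME/pigtutorial.tar.gz" -C "$PIG_HOME"
}

# Usage:
#   DRY_RUN=1 prepare_tutorial   # preview the commands
#   prepare_tutorial             # actually build and extract
```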

Running the bundled example in local mode

Change into the pigtmp directory and run the following:
scott@master:/opt/pig-0.12.1$ cd pigtmp/
scott@master:/opt/pig-0.12.1/pigtmp$ ll
total 30352
drwxr-xr-x 2 scott scott 4096 Apr 17 13:13 ./
drwxr-xr-x 18 scott scott 4096 Apr 17 13:17 ../
-rw-r--r-- 1 scott scott 10408717 Apr 17 13:13 excite.log.bz2
-rw-r--r-- 1 scott scott 208348 Apr 17 13:13 excite-small.log
-rw-r--r-- 1 scott scott 20418810 Apr 17 13:13 pig-0.12.1.jar
-rw-r--r-- 1 scott scott 3835 Apr 17 13:13 script1-hadoop.pig
-rw-r--r-- 1 scott scott 3820 Apr 17 13:13 script1-local.pig
-rw-r--r-- 1 scott scott 3489 Apr 17 13:13 script2-hadoop.pig
-rw-r--r-- 1 scott scott 3480 Apr 17 13:13 script2-local.pig
-rw-r--r-- 1 scott scott 10720 Apr 17 13:13 tutorial.jar
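
Before launching the script, it can help to confirm that the files it relies on are present. The sketch below assumes script1-local.pig needs excite-small.log as input and tutorial.jar for its UDFs, as in the standard Pig tutorial; the function name check_tutorial_files is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the files needed for the local-mode run exist.
# The file names come from the pigtmp directory listing; which ones the
# script actually needs is an assumption based on the standard tutorial.
check_tutorial_files() {
    dir="${1:-.}"
    missing=0
    for f in script1-local.pig excite-small.log tutorial.jar; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from inside the pigtmp directory:
#   check_tutorial_files . && pig -x local script1-local.pig
```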
pig -x local script1-local.pig
scott@master:/opt/pig-0.12.1/pigtmp$ pig -x local script1-local.pig
2014-04-17 13:19:56,757 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.2-SNAPSHOT (r: unknown) compiled Apr 17 2014, 13:00:24
2014-04-17 13:19:56,758 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/pig-0.12.1/pigtmp/pig_1397711996755.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.1/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-04-17 13:19:59,425 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/scott/.pigbootup not found
2014-04-17 13:19:59,732 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 13:19:59,735 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-04-17 13:19:59,749 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-04-17 13:19:59,779 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-04-17 13:20:01,000 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.interval is deprecated. Instead, use mapreduce.job.end-notification.retry.interval
2014-04-17 13:20:01,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 13:20:01,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.retiredjobs.cache.size is deprecated. Instead, use mapreduce.jobtracker.retiredjobs.cache.size
2014-04-17 13:20:01,002 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.reduces is deprecated. Instead, use mapreduce.task.profile.reduces
2014-04-17 13:20:01,004 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reuse.jvm.num.tasks is deprecated. Instead, use mapreduce.job.jvm.numtasks
2014-04-17 13:20:01,005 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2014-04-17 13:20:01,005 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.http.address is deprecated. Instead, use mapreduce.tasktracker.http.address
2014-04-17 13:20:01,006 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.system.dir is deprecated. Instead, use mapreduce.jobtracker.system.dir
2014-04-17 13:20:01,006 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.report.address is deprecated. Instead, use mapreduce.tasktracker.report.address
2014-04-17 13:20:01,006 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.interval is deprecated. Instead, use mapreduce.tasktracker.healthchecker.interval
2014-04-17 13:20:01,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.child.tmp is deprecated. Instead, use mapreduce.task.tmp.dir
2014-04-17 13:20:01,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.taskmemorymanager.monitoring-interval is deprecated. Instead, use mapreduce.tasktracker.taskmemorymanager.monitoringinterval
2014-04-17 13:20:01,007 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.connect.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.connect.timeout
2014-04-17 13:20:01,008 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.speculativeCap is deprecated. Instead, use mapreduce.job.speculative.speculativecap
2014-04-17 13:20:01,008 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.spill.percent is deprecated. Instead, use mapreduce.map.sort.spill.percent
2014-04-17 13:20:01,009 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.shuffle.input.buffer.percent
2014-04-17 13:20:01,009 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.map.max.skip.records is deprecated. Instead, use mapreduce.map.skip.maxrecords
2014-04-17 13:20:01,009 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.maps is deprecated. Instead, use mapreduce.task.profile.maps
2014-04-17 13:20:01,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2014-04-17 13:20:01,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.merge.recordsBeforeProgress is deprecated. Instead, use mapreduce.task.merge.progress.records
2014-04-17 13:20:01,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-04-17 13:20:01,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
2014-04-17 13:20:01,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.retain.hours is deprecated. Instead, use mapreduce.job.userlog.retain.hours
2014-04-17 13:20:01,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowNodeThreshold is deprecated. Instead, use mapreduce.job.speculative.slownodethreshold
2014-04-17 13:20:01,013 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.reduce.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.reduce.tasks.maximum
2014-04-17 13:20:01,014 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.restart.recover is deprecated. Instead, use mapreduce.jobtracker.restart.recover
2014-04-17 13:20:01,015 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.child.log.level is deprecated. Instead, use mapreduce.reduce.log.level
2014-04-17 13:20:01,016 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.inmem.merge.threshold is deprecated. Instead, use mapreduce.reduce.merge.inmem.threshold
2014-04-17 13:20:01,016 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-04-17 13:20:01,017 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.acls.enabled is deprecated. Instead, use mapreduce.cluster.acls.enabled
2014-04-17 13:20:01,018 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.nameserver is deprecated. Instead, use mapreduce.tasktracker.dns.nameserver
2014-04-17 13:20:01,018 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2014-04-17 13:20:01,021 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2014-04-17 13:20:01,021 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.child.log.level is deprecated. Instead, use mapreduce.map.log.level
2014-04-17 13:20:01,021 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2014-04-17 13:20:01,021 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.merge.percent is deprecated. Instead, use mapreduce.reduce.shuffle.merge.percent
2014-04-17 13:20:01,021 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.jobhistory.lru.cache.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.lru.cache.size
2014-04-17 13:20:01,022 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
2014-04-17 13:20:01,022 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.hours is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.hours
2014-04-17 13:20:01,022 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2014-04-17 13:20:01,022 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.map.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.map.tasks.maximum
2014-04-17 13:20:01,022 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-04-17 13:20:01,022 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.completion.poll.interval is deprecated. Instead, use mapreduce.client.completion.pollinterval
2014-04-17 13:20:01,023 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.dir is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.dir
2014-04-17 13:20:01,023 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.slowstart.completed.maps is deprecated. Instead, use mapreduce.job.reduce.slowstart.completedmaps
2014-04-17 13:20:01,023 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2014-04-17 13:20:01,024 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
2014-04-17 13:20:01,029 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.instrumentation is deprecated. Instead, use mapreduce.jobtracker.instrumentation
2014-04-17 13:20:01,041 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
2014-04-17 13:20:01,042 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2014-04-17 13:20:01,043 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.attempts.to.start.skipping is deprecated. Instead, use mapreduce.task.skip.start.attempts
2014-04-17 13:20:01,044 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.task-controller is deprecated. Instead, use mapreduce.tasktracker.taskcontroller
2014-04-17 13:20:01,044 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.limit.kb is deprecated. Instead, use mapreduce.task.userlog.limit.kb
2014-04-17 13:20:01,045 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2014-04-17 13:20:01,046 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-04-17 13:20:01,046 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacekill is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacekill
2014-04-17 13:20:01,047 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.jobtracker.split.metainfo.maxsize is deprecated. Instead, use mapreduce.job.split.metainfo.maxsize
2014-04-17 13:20:01,048 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.progress.monitor.poll.interval is deprecated. Instead, use mapreduce.client.progressmonitor.pollinterval
2014-04-17 13:20:01,049 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-04-17 13:20:01,049 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2014-04-17 13:20:01,050 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile is deprecated. Instead, use mapreduce.task.profile
2014-04-17 13:20:01,050 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.parallel.copies is deprecated. Instead, use mapreduce.reduce.shuffle.parallelcopies
2014-04-17 13:20:01,051 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2014-04-17 13:20:01,052 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
2014-04-17 13:20:01,057 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.heartbeats.in.second is deprecated. Instead, use mapreduce.jobtracker.heartbeats.in.second
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.cache.levels is deprecated. Instead, use mapreduce.jobtracker.taskcache.levels
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.interface is deprecated. Instead, use mapreduce.tasktracker.dns.interface
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.failures is deprecated. Instead, use mapreduce.job.maxtaskfailures.per.tracker
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.df.interval is deprecated. Instead, use fs.df.interval
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.tasks.sleeptime-before-sigkill is deprecated. Instead, use mapreduce.tasktracker.tasks.sleeptimebeforesigkill
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.blacklists is deprecated. Instead, use mapreduce.jobtracker.tasktracker.maxblacklists
2014-04-17 13:20:01,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.taskScheduler is deprecated. Instead, use mapreduce.jobtracker.taskscheduler
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.attempts is deprecated. Instead, use mapreduce.job.end-notification.retry.attempts
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowTaskThreshold is deprecated. Instead, use mapreduce.job.speculative.slowtaskthreshold
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.indexcache.mb is deprecated. Instead, use mapreduce.tasktracker.indexcache.mb
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - tasktracker.http.threads is deprecated. Instead, use mapreduce.tasktracker.http.threads
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.handler.count is deprecated. Instead, use mapreduce.jobtracker.handler.count
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - keep.failed.task.files is deprecated. Instead, use mapreduce.task.files.preserve.failedtasks
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-04-17 13:20:01,059 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.job.history.block.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.block.size
2014-04-17 13:20:01,060 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.reduce.max.skip.groups is deprecated. Instead, use mapreduce.reduce.skip.maxgroups
2014-04-17 13:20:01,060 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
2014-04-17 13:20:01,060 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacestart is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacestart
2014-04-17 13:20:01,060 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.maxtasks.per.job is deprecated. Instead, use mapreduce.jobtracker.maxtasks.perjob
2014-04-17 13:20:01,060 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.max.attempts is deprecated. Instead, use mapreduce.reduce.maxattempts
2014-04-17 13:20:01,063 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.read.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.read.timeout
2014-04-17 13:20:01,063 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.instrumentation is deprecated. Instead, use mapreduce.tasktracker.instrumentation
2014-04-17 13:20:01,063 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.expiry.interval is deprecated. Instead, use mapreduce.jobtracker.expire.trackers.interval
2014-04-17 13:20:01,064 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.active is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.active
2014-04-17 13:20:01,065 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
2014-04-17 13:20:01,074 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-04-17 13:20:01,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 13:20:01,075 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.input.buffer.percent
2014-04-17 13:20:01,644 [main] WARN org.apache.pig.PigServer - Encountered Warning USING_OVERLOADED_FUNCTION 3 time(s).
2014-04-17 13:20:01,645 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 3 time(s).
2014-04-17 13:20:01,793 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY,DISTINCT,FILTER
2014-04-17 13:20:01,933 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-04-17 13:20:02,031 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2014-04-17 13:20:02,299 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-04-17 13:20:02,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-04-17 13:20:02,473 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 5
2014-04-17 13:20:02,475 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 5
2014-04-17 13:20:02,602 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-04-17 13:20:02,607 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-04-17 13:20:02,779 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-17 13:20:02,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-04-17 13:20:02,820 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-04-17 13:20:02,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-04-17 13:20:02,876 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=208348
2014-04-17 13:20:02,880 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-04-17 13:20:03,013 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-04-17 13:20:03,065 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-04-17 13:20:03,065 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-04-17 13:20:03,066 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1397712003064-0
2014-04-17 13:20:03,069 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting identity combiner class.
2014-04-17 13:20:03,283 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-04-17 13:20:03,372 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2014-04-17 13:20:03,416 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-04-17 13:20:03,418 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
2014-04-17 13:20:03,419 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2014-04-17 13:20:03,420 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-04-17 13:20:03,420 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-04-17 13:20:03,421 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.value.groupfn.class is deprecated. Instead, use mapreduce.job.output.group.comparator.class
2014-04-17 13:20:03,421 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2014-04-17 13:20:03,422 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
2014-04-17 13:20:03,426 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-04-17 13:20:03,426 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-04-17 13:20:03,427 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
2014-04-17 13:20:03,427 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-04-17 13:20:04,839 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2014-04-17 13:20:04,984 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-17 13:20:04,985 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-04-17 13:20:05,085 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-04-17 13:20:05,199 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-04-17 13:20:05,267 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-04-17 13:20:05,272 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 13:20:05,288 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 13:20:05,305 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 13:20:05,306 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-04-17 13:20:06,199 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1500759417_0001
2014-04-17 13:20:06,404 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-scott/mapred/staging/scott1500759417/.staging/job_local1500759417_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-04-17 13:20:06,448 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-scott/mapred/staging/scott1500759417/.staging/job_local1500759417_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-04-17 13:20:06,457 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-scott/mapred/staging/scott1500759417/.staging/job_local1500759417_0001/job.xml:an attempt to override final parameter: mapreduce.framework.name; Ignoring.
2014-04-17 13:20:07,092 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-scott/mapred/local/localRunner/scott/job_local1500759417_0001/job_local1500759417_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-04-17 13:20:07,135 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-scott/mapred/local/localRunner/scott/job_local1500759417_0001/job_local1500759417_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-04-17 13:20:07,150 [JobControl] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-scott/mapred/local/localRunner/scott/job_local1500759417_0001/job_local1500759417_0001.xml:an attempt to override final parameter: mapreduce.framework.name; Ignoring.
2014-04-17 13:20:07,207 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2014-04-17 13:20:07,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1500759417_0001
2014-04-17 13:20:07,220 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases clean1,clean2,houred,ngramed1,raw
2014-04-17 13:20:07,223 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: raw[28,6],clean1[31,9],clean2[34,9],houred[39,9],ngramed1[42,11] C: R:
2014-04-17 13:20:07,242 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-04-17 13:20:07,252 [Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2014-04-17 13:20:07,349 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.df.interval is deprecated. Instead, use fs.df.interval
2014-04-17 13:20:07,356 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.http.address is deprecated. Instead, use mapreduce.tasktracker.http.address
2014-04-17 13:20:07,358 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.retain.hours is deprecated. Instead, use mapreduce.job.userlog.retain.hours
2014-04-17 13:20:07,359 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-04-17 13:20:07,366 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacestart is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacestart
2014-04-17 13:20:07,368 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.read.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.read.timeout
2014-04-17 13:20:07,369 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.spill.percent is deprecated. Instead, use mapreduce.map.sort.spill.percent
2014-04-17 13:20:07,369 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.parallel.copies is deprecated. Instead, use mapreduce.reduce.shuffle.parallelcopies
2014-04-17 13:20:07,370 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2014-04-17 13:20:07,370 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacekill is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacekill
2014-04-17 13:20:07,371 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile is deprecated. Instead, use mapreduce.task.profile
2014-04-17 13:20:07,371 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.heartbeats.in.second is deprecated. Instead, use mapreduce.jobtracker.heartbeats.in.second
2014-04-17 13:20:07,372 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-04-17 13:20:07,372 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.interval is deprecated. Instead, use mapreduce.tasktracker.healthchecker.interval
2014-04-17 13:20:07,372 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
2014-04-17 13:20:07,372 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
2014-04-17 13:20:07,373 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.completion.poll.interval is deprecated. Instead, use mapreduce.client.completion.pollinterval
2014-04-17 13:20:07,373 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.active is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.active
2014-04-17 13:20:07,373 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2014-04-17 13:20:07,374 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.merge.percent is deprecated. Instead, use mapreduce.reduce.shuffle.merge.percent
2014-04-17 13:20:07,374 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2014-04-17 13:20:07,374 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.input.buffer.percent
2014-04-17 13:20:07,375 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.cache.levels is deprecated. Instead, use mapreduce.jobtracker.taskcache.levels
2014-04-17 13:20:07,375 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2014-04-17 13:20:07,375 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.instrumentation is deprecated. Instead, use mapreduce.jobtracker.instrumentation
2014-04-17 13:20:07,376 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.limit.kb is deprecated. Instead, use mapreduce.task.userlog.limit.kb
2014-04-17 13:20:07,380 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 13:20:07,381 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowNodeThreshold is deprecated. Instead, use mapreduce.job.speculative.slownodethreshold
2014-04-17 13:20:07,382 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.map.max.skip.records is deprecated. Instead, use mapreduce.map.skip.maxrecords
2014-04-17 13:20:07,384 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.jobhistory.lru.cache.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.lru.cache.size
2014-04-17 13:20:07,388 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.hours is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.hours
2014-04-17 13:20:07,389 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.handler.count is deprecated. Instead, use mapreduce.jobtracker.handler.count
2014-04-17 13:20:07,390 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-04-17 13:20:07,391 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2014-04-17 13:20:07,391 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.maps is deprecated. Instead, use mapreduce.task.profile.maps
2014-04-17 13:20:07,392 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2014-04-17 13:20:07,393 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-04-17 13:20:07,393 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-04-17 13:20:07,395 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.nameserver is deprecated. Instead, use mapreduce.tasktracker.dns.nameserver
2014-04-17 13:20:07,404 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.taskmemorymanager.monitoring-interval is deprecated. Instead, use mapreduce.tasktracker.taskmemorymanager.monitoringinterval
2014-04-17 13:20:07,405 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.expiry.interval is deprecated. Instead, use mapreduce.jobtracker.expire.trackers.interval
2014-04-17 13:20:07,408 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.failures is deprecated. Instead, use mapreduce.job.maxtaskfailures.per.tracker
2014-04-17 13:20:07,409 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-04-17 13:20:07,410 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.jobtracker.split.metainfo.maxsize is deprecated. Instead, use mapreduce.job.split.metainfo.maxsize
2014-04-17 13:20:07,410 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.dir is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.dir
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.attempts is deprecated. Instead, use mapreduce.job.end-notification.retry.attempts
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.task-controller is deprecated. Instead, use mapreduce.tasktracker.taskcontroller
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.maxtasks.per.job is deprecated. Instead, use mapreduce.jobtracker.maxtasks.perjob
2014-04-17 13:20:07,416 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.child.log.level is deprecated. Instead, use mapreduce.reduce.log.level
2014-04-17 13:20:07,417 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.max.attempts is deprecated. Instead, use mapreduce.reduce.maxattempts
2014-04-17 13:20:07,417 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
2014-04-17 13:20:07,417 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.shuffle.input.buffer.percent
2014-04-17 13:20:07,422 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.report.address is deprecated. Instead, use mapreduce.tasktracker.report.address
2014-04-17 13:20:07,424 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - keep.failed.task.files is deprecated. Instead, use mapreduce.task.files.preserve.failedtasks
2014-04-17 13:20:07,424 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2014-04-17 13:20:07,424 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-04-17 13:20:07,424 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - tasktracker.http.threads is deprecated. Instead, use mapreduce.tasktracker.http.threads
2014-04-17 13:20:07,424 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowTaskThreshold is deprecated. Instead, use mapreduce.job.speculative.slowtaskthreshold
2014-04-17 13:20:07,424 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.acls.enabled is deprecated. Instead, use mapreduce.cluster.acls.enabled
2014-04-17 13:20:07,425 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.blacklists is deprecated. Instead, use mapreduce.jobtracker.tasktracker.maxblacklists
2014-04-17 13:20:07,425 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.indexcache.mb is deprecated. Instead, use mapreduce.tasktracker.indexcache.mb
2014-04-17 13:20:07,425 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.attempts.to.start.skipping is deprecated. Instead, use mapreduce.task.skip.start.attempts
2014-04-17 13:20:07,425 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-04-17 13:20:07,425 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.reduce.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.reduce.tasks.maximum
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.restart.recover is deprecated. Instead, use mapreduce.jobtracker.restart.recover
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.speculativeCap is deprecated. Instead, use mapreduce.job.speculative.speculativecap
2014-04-17 13:20:07,426 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.value.groupfn.class is deprecated. Instead, use mapreduce.job.output.group.comparator.class
2014-04-17 13:20:07,427 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.progress.monitor.poll.interval is deprecated. Instead, use mapreduce.client.progressmonitor.pollinterval
2014-04-17 13:20:07,433 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2014-04-17 13:20:07,434 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.child.log.level is deprecated. Instead, use mapreduce.map.log.level
2014-04-17 13:20:07,436 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
2014-04-17 13:20:07,440 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2014-04-17 13:20:07,440 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.retiredjobs.cache.size is deprecated. Instead, use mapreduce.jobtracker.retiredjobs.cache.size
2014-04-17 13:20:07,440 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.interface is deprecated. Instead, use mapreduce.tasktracker.dns.interface
2014-04-17 13:20:07,441 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.reduces is deprecated. Instead, use mapreduce.task.profile.reduces
2014-04-17 13:20:07,441 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-04-17 13:20:07,442 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.interval is deprecated. Instead, use mapreduce.job.end-notification.retry.interval
2014-04-17 13:20:07,442 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-04-17 13:20:07,442 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.job.history.block.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.block.size
2014-04-17 13:20:07,443 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
2014-04-17 13:20:07,443 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.child.tmp is deprecated. Instead, use mapreduce.task.tmp.dir
2014-04-17 13:20:07,443 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-04-17 13:20:07,444 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.map.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.map.tasks.maximum
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.taskScheduler is deprecated. Instead, use mapreduce.jobtracker.taskscheduler
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.reduce.max.skip.groups is deprecated. Instead, use mapreduce.reduce.skip.maxgroups
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2014-04-17 13:20:07,449 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.instrumentation is deprecated. Instead, use mapreduce.tasktracker.instrumentation
2014-04-17 13:20:07,450 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-04-17 13:20:07,450 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.system.dir is deprecated. Instead, use mapreduce.jobtracker.system.dir
2014-04-17 13:20:07,452 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reuse.jvm.num.tasks is deprecated. Instead, use mapreduce.job.jvm.numtasks
2014-04-17 13:20:07,453 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.inmem.merge.threshold is deprecated. Instead, use mapreduce.reduce.merge.inmem.threshold
2014-04-17 13:20:07,454 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
2014-04-17 13:20:07,458 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.slowstart.completed.maps is deprecated. Instead, use mapreduce.job.reduce.slowstart.completedmaps
2014-04-17 13:20:07,458 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
2014-04-17 13:20:07,460 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
2014-04-17 13:20:07,461 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.tasks.sleeptime-before-sigkill is deprecated. Instead, use mapreduce.tasktracker.tasks.sleeptimebeforesigkill
2014-04-17 13:20:07,472 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
2014-04-17 13:20:07,473 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.merge.recordsBeforeProgress is deprecated. Instead, use mapreduce.task.merge.progress.records
2014-04-17 13:20:07,474 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.connect.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.connect.timeout
2014-04-17 13:20:07,475 [Thread-4] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 13:20:07,502 [Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2014-04-17 13:20:07,876 [Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2014-04-17 13:20:07,878 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1500759417_0001_m_000000_0
2014-04-17 13:20:08,086 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-04-17 13:20:08,087 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-04-17 13:20:08,088 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2014-04-17 13:20:08,089 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2014-04-17 13:20:08,090 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2014-04-17 13:20:08,092 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.id is deprecated. Instead, use mapreduce.job.id
2014-04-17 13:20:08,259 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2014-04-17 13:20:08,289 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 208348
Input split[0]:
Length = 208348
Locations:

-----------------------

2014-04-17 13:20:08,376 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2014-04-17 13:20:08,392 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/opt/pig-0.12.1/pigtmp/excite-small.log:0+208348
2014-04-17 13:20:08,455 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-04-17 13:20:09,388 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
2014-04-17 13:20:09,389 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100
2014-04-17 13:20:09,390 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - soft limit at 83886080
2014-04-17 13:20:09,391 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
2014-04-17 13:20:09,393 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
2014-04-17 13:20:09,450 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
2014-04-17 13:20:09,452 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
2014-04-17 13:20:09,453 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2014-04-17 13:20:09,453 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2014-04-17 13:20:09,454 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
2014-04-17 13:20:09,454 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
2014-04-17 13:20:09,454 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 13:20:09,455 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
2014-04-17 13:20:09,455 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
2014-04-17 13:20:09,455 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
2014-04-17 13:20:09,456 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
2014-04-17 13:20:09,456 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
2014-04-17 13:20:09,457 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
2014-04-17 13:20:09,459 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2014-04-17 13:20:09,459 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
2014-04-17 13:20:09,460 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
2014-04-17 13:20:09,461 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
2014-04-17 13:20:09,461 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 13:20:09,462 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-04-17 13:20:09,462 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
2014-04-17 13:20:09,463 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 13:20:09,466 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
2014-04-17 13:20:09,466 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.block.size is deprecated. Instead, use dfs.blocksize
2014-04-17 13:20:09,466 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
2014-04-17 13:20:09,468 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
2014-04-17 13:20:09,468 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
2014-04-17 13:20:09,468 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
2014-04-17 13:20:09,469 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
2014-04-17 13:20:09,469 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
2014-04-17 13:20:09,470 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
2014-04-17 13:20:09,471 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
2014-04-17 13:20:09,471 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
2014-04-17 13:20:09,532 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2014-04-17 13:20:09,584 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: raw[28,6],clean1[31,9],clean2[34,9],houred[39,9],ngramed1[42,11] C: R:
2014-04-17 13:20:11,820 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2014-04-17 13:20:11,821 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2014-04-17 13:20:11,822 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Spilling map output
2014-04-17 13:20:11,822 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufend = 632684; bufvoid = 104857600
2014-04-17 13:20:11,823 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396(104857584); kvend = 26152000(104608000); length = 62397/6553600
2014-04-17 13:20:13,028 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2014-04-17 13:20:13,033 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1500759417_0001_m_000000_0 is done. And is in the process of committing
2014-04-17 13:20:13,117 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
2014-04-17 13:20:13,118 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1500759417_0001_m_000000_0' done.
2014-04-17 13:20:13,120 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1500759417_0001_m_000000_0
2014-04-17 13:20:13,121 [Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2014-04-17 13:20:13,200 [Thread-4] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2014-04-17 13:20:13,225 [Thread-4] INFO org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@8358eb5
2014-04-17 13:20:13,287 [Thread-4] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2014-04-17 13:20:13,311 [EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local1500759417_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2014-04-17 13:20:19,197 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy
2014-04-17 13:20:22,198 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > copy

I don't know why local mode stays stuck in the org.apache.hadoop.mapred.LocalJobRunner - reduce > copy phase here. If you know the cause, please let me know.

Running the bundled Pig example in mapreduce mode

Change into the pigtmp directory and perform the following steps
scott@master:/opt/pig-0.12.1$ cd pigtmp/
scott@master:/opt/pig-0.12.1/pigtmp$ ll
total 30352
drwxr-xr-x 2 scott scott 4096 Apr 17 13:13 ./
drwxr-xr-x 18 scott scott 4096 Apr 17 13:17 ../
-rw-r--r-- 1 scott scott 10408717 Apr 17 13:13 excite.log.bz2
-rw-r--r-- 1 scott scott 208348 Apr 17 13:13 excite-small.log
-rw-r--r-- 1 scott scott 20418810 Apr 17 13:13 pig-0.12.1.jar
-rw-r--r-- 1 scott scott 3835 Apr 17 13:13 script1-hadoop.pig
-rw-r--r-- 1 scott scott 3820 Apr 17 13:13 script1-local.pig
-rw-r--r-- 1 scott scott 3489 Apr 17 13:13 script2-hadoop.pig
-rw-r--r-- 1 scott scott 3480 Apr 17 13:13 script2-local.pig
-rw-r--r-- 1 scott scott 10720 Apr 17 13:13 tutorial.jar

Start the Hadoop cluster
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

The commands above start the NameNode, DataNode, JobHistory server, and other daemons.
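
A quick way to confirm the daemons came up is to look at the output of jps. The sketch below is only an illustration: the helper name missing_daemons and the list of expected daemons are assumptions (a single-node Hadoop 2.x setup where every daemon runs on the same machine), so adjust the list for a real multi-node cluster.

```shell
#!/bin/sh
# Assumed daemon list for a single-node Hadoop 2.x setup (not from the original post).
EXPECTED_DAEMONS="NameNode DataNode ResourceManager NodeManager JobHistoryServer"

# Hypothetical helper: reads jps-style lines ("<pid> <ClassName>") on stdin
# and prints the name of every expected daemon that does not appear.
missing_daemons() {
    running=$(awk '{print $2}')
    for d in $EXPECTED_DAEMONS; do
        # -x matches the whole line, so SecondaryNameNode does not count as NameNode
        printf '%s\n' "$running" | grep -qx "$d" || printf '%s\n' "$d"
    done
}

# Typical use on the cluster node:
#   jps | missing_daemons
```

If the helper prints nothing, every daemon in the assumed list was found.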

Copy excite.log.bz2 to HDFS
scott@master:/opt/pig-0.12.1/pigtmp$ hdfs dfs -copyFromLocal excite.log.bz2 .

After this command completes, excite.log.bz2 is on HDFS at /user/scott/excite.log.bz2.
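
The destination "." ends up under /user/scott because HDFS resolves relative paths against the current user's home directory, /user/<username>. The sketch below only mimics that rule locally; hdfs_resolve is a hypothetical helper written for illustration, not part of the Hadoop CLI.

```shell
#!/bin/sh
# Hypothetical helper: show where a path given to `hdfs dfs` would resolve,
# assuming the default HDFS home directory /user/<username>.
hdfs_resolve() {
    user=$1
    path=$2
    case "$path" in
        /*)   printf '%s\n' "$path" ;;               # absolute paths are used as-is
        .|"") printf '/user/%s\n' "$user" ;;          # "." is the home directory
        *)    printf '/user/%s/%s\n' "$user" "$path" ;;
    esac
}

# On a live cluster the copy itself can be checked with the real CLI:
#   hdfs dfs -test -e excite.log.bz2 && echo present
```

For example, hdfs_resolve scott excite.log.bz2 prints /user/scott/excite.log.bz2, which matches the location stated above.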

List the files on HDFS

scott@master:/opt/pig-0.12.1/pigtmp$ hdfs dfs -ls .
Found 4 items
drwxr-xr-x - scott supergroup 0 2014-04-14 17:18 examples
-rw-r--r-- 1 scott supergroup 10408717 2014-04-17 13:52 excite.log.bz2
drwxr-xr-x - scott supergroup 0 2014-04-14 18:01 oozie-scot
drwxr-xr-x - scott supergroup 0 2014-04-14 16:55 share

Set the PIG_CLASSPATH environment variable

Add the following line to /etc/profile

export PIG_CLASSPATH=/opt/hadoop-2.2.0/etc/hadoop

Note:

The value of PIG_CLASSPATH is simply the directory that holds the Hadoop configuration files (it should contain core-site.xml, hdfs-site.xml, and mapred-site.xml).
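
Before running Pig it can be worth checking that the directory really contains those three files. This is a minimal sketch; missing_conf_files is a hypothetical helper name, and the file list is taken from the note above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected Hadoop config file that is
# missing from the directory given as the first argument.
missing_conf_files() {
    dir=$1
    for f in core-site.xml hdfs-site.xml mapred-site.xml; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Typical use once the variable is set:
#   missing_conf_files "$PIG_CLASSPATH"
```

No output means all three files are present.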

Set the HADOOP_CONF_DIR environment variable

Add the following line to /etc/profile

export HADOOP_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop

Note:

The value of HADOOP_CONF_DIR is the directory that holds the Hadoop cluster configuration files (/opt/hadoop-2.2.0/etc/hadoop).

source /etc/profile

This makes the environment variables take effect immediately in the current shell.
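
If the setup is repeated, the same export lines can pile up in /etc/profile. The sketch below shows one way to add a line only when it is not already there; append_once is a hypothetical helper name, and writing to /etc/profile requires root privileges.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (-x whole line, -F fixed string, -q quiet).
append_once() {
    line=$1
    file=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Typical use (as root):
#   append_once 'export PIG_CLASSPATH=/opt/hadoop-2.2.0/etc/hadoop' /etc/profile
#   append_once 'export HADOOP_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop' /etc/profile
```

Running the helper twice with the same arguments leaves a single copy of the line in the file.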

Run the Pig example program in mapreduce mode
scott@master:/opt/pig-0.12.1/pigtmp$ pig -x mapreduce script1-hadoop.pig
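The screen output is shown below. The captured session omits -x mapreduce; since mapreduce is Pig's default execution mode, the two invocations are equivalent.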
scott@master:/opt/pig-0.12.1/pigtmp$ pig script1-hadoop.pig
2014-04-17 14:35:11,010 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.2-SNAPSHOT (r: unknown) compiled April 17 2014, 13:00:24
2014-04-17 14:35:11,011 [main] INFO org.apache.pig.Main - Logging error messages to: /opt/pig-0.12.1/pigtmp/pig_1397716511008.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.1/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-04-17 14:35:12,851 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/scott/.pigbootup not found
2014-04-17 14:35:13,404 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-04-17 14:35:13,410 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 14:35:13,411 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2014-04-17 14:35:13,435 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-04-17 14:35:15,931 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.maxtasks.per.job is deprecated. Instead, use mapreduce.jobtracker.maxtasks.perjob
2014-04-17 14:35:15,932 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.system.dir is deprecated. Instead, use mapreduce.jobtracker.system.dir
2014-04-17 14:35:15,933 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.max.attempts is deprecated. Instead, use mapreduce.reduce.maxattempts
2014-04-17 14:35:15,934 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.map.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.map.tasks.maximum
2014-04-17 14:35:15,935 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacekill is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacekill
2014-04-17 14:35:15,936 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.job.history.block.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.block.size
2014-04-17 14:35:15,936 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
2014-04-17 14:35:15,937 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
2014-04-17 14:35:15,938 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.timeout is deprecated. Instead, use mapreduce.task.timeout
2014-04-17 14:35:15,938 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.task-controller is deprecated. Instead, use mapreduce.tasktracker.taskcontroller
2014-04-17 14:35:15,939 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir.minspacestart is deprecated. Instead, use mapreduce.tasktracker.local.dir.minspacestart
2014-04-17 14:35:15,940 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.progress.monitor.poll.interval is deprecated. Instead, use mapreduce.client.progressmonitor.pollinterval
2014-04-17 14:35:15,940 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.merge.percent is deprecated. Instead, use mapreduce.reduce.shuffle.merge.percent
2014-04-17 14:35:15,941 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.block.size is deprecated. Instead, use dfs.blocksize
2014-04-17 14:35:15,941 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.df.interval is deprecated. Instead, use fs.df.interval
2014-04-17 14:35:15,942 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.reduce.max.skip.groups is deprecated. Instead, use mapreduce.reduce.skip.maxgroups
2014-04-17 14:35:15,942 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
2014-04-17 14:35:15,942 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile is deprecated. Instead, use mapreduce.task.profile
2014-04-17 14:35:15,942 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 14:35:15,943 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.child.tmp is deprecated. Instead, use mapreduce.task.tmp.dir
2014-04-17 14:35:15,943 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2014-04-17 14:35:15,951 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
2014-04-17 14:35:15,951 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
2014-04-17 14:35:15,951 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.reduce.tasks.maximum is deprecated. Instead, use mapreduce.tasktracker.reduce.tasks.maximum
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.limit.kb is deprecated. Instead, use mapreduce.task.userlog.limit.kb
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.parallel.copies is deprecated. Instead, use mapreduce.reduce.shuffle.parallelcopies
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.attempts.to.start.skipping is deprecated. Instead, use mapreduce.task.skip.start.attempts
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.completion.poll.interval is deprecated. Instead, use mapreduce.client.completion.pollinterval
2014-04-17 14:35:15,952 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - jobclient.output.filter is deprecated. Instead, use mapreduce.client.output.filter
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.expiry.interval is deprecated. Instead, use mapreduce.jobtracker.expire.trackers.interval
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2014-04-17 14:35:15,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.taskScheduler is deprecated. Instead, use mapreduce.jobtracker.taskscheduler
2014-04-17 14:35:15,954 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.temp.dir is deprecated. Instead, use mapreduce.cluster.temp.dir
2014-04-17 14:35:15,954 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.taskmemorymanager.monitoring-interval is deprecated. Instead, use mapreduce.tasktracker.taskmemorymanager.monitoringinterval
2014-04-17 14:35:15,954 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-04-17 14:35:15,954 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2014-04-17 14:35:15,954 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.userlog.retain.hours is deprecated. Instead, use mapreduce.job.userlog.retain.hours
2014-04-17 14:35:15,954 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.interval is deprecated. Instead, use mapreduce.tasktracker.healthchecker.interval
2014-04-17 14:35:15,955 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.retiredjobs.cache.size is deprecated. Instead, use mapreduce.jobtracker.retiredjobs.cache.size
2014-04-17 14:35:15,955 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.failures is deprecated. Instead, use mapreduce.job.maxtaskfailures.per.tracker
2014-04-17 14:35:15,961 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
2014-04-17 14:35:15,961 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-04-17 14:35:15,965 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.acls.enabled is deprecated. Instead, use mapreduce.cluster.acls.enabled
2014-04-17 14:35:15,965 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.slowstart.completed.maps is deprecated. Instead, use mapreduce.job.reduce.slowstart.completedmaps
2014-04-17 14:35:15,965 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.handler.count is deprecated. Instead, use mapreduce.jobtracker.handler.count
2014-04-17 14:35:15,966 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.attempts is deprecated. Instead, use mapreduce.job.end-notification.retry.attempts
2014-04-17 14:35:15,966 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
2014-04-17 14:35:15,967 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.dir is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.dir
2014-04-17 14:35:15,967 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
2014-04-17 14:35:15,968 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2014-04-17 14:35:15,968 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2014-04-17 14:35:15,969 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.connect.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.connect.timeout
2014-04-17 14:35:15,969 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
2014-04-17 14:35:15,970 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
2014-04-17 14:35:15,970 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
2014-04-17 14:35:15,971 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2014-04-17 14:35:15,971 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
2014-04-17 14:35:15,972 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
2014-04-17 14:35:15,975 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.indexcache.mb is deprecated. Instead, use mapreduce.tasktracker.indexcache.mb
2014-04-17 14:35:15,976 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - keep.failed.task.files is deprecated. Instead, use mapreduce.task.files.preserve.failedtasks
2014-04-17 14:35:15,976 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.heartbeats.in.second is deprecated. Instead, use mapreduce.jobtracker.heartbeats.in.second
2014-04-17 14:35:15,976 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
2014-04-17 14:35:15,977 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowTaskThreshold is deprecated. Instead, use mapreduce.job.speculative.slowtaskthreshold
2014-04-17 14:35:15,978 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
2014-04-17 14:35:15,978 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.interface is deprecated. Instead, use mapreduce.tasktracker.dns.interface
2014-04-17 14:35:15,979 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.slowNodeThreshold is deprecated. Instead, use mapreduce.job.speculative.slownodethreshold
2014-04-17 14:35:15,980 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2014-04-17 14:35:15,980 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
2014-04-17 14:35:15,988 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
2014-04-17 14:35:15,988 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.hours is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.hours
2014-04-17 14:35:15,991 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reuse.jvm.num.tasks is deprecated. Instead, use mapreduce.job.jvm.numtasks
2014-04-17 14:35:15,992 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
2014-04-17 14:35:15,992 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.cache.levels is deprecated. Instead, use mapreduce.jobtracker.taskcache.levels
2014-04-17 14:35:15,993 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.instrumentation is deprecated. Instead, use mapreduce.tasktracker.instrumentation
2014-04-17 14:35:15,993 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
2014-04-17 14:35:15,993 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
2014-04-17 14:35:15,994 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.child.log.level is deprecated. Instead, use mapreduce.reduce.log.level
2014-04-17 14:35:15,994 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
2014-04-17 14:35:15,995 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.persist.jobstatus.active is deprecated. Instead, use mapreduce.jobtracker.persist.jobstatus.active
2014-04-17 14:35:15,995 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
2014-04-17 14:35:15,995 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.http.address is deprecated. Instead, use mapreduce.tasktracker.http.address
2014-04-17 14:35:15,996 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.jobtracker.split.metainfo.maxsize is deprecated. Instead, use mapreduce.job.split.metainfo.maxsize
2014-04-17 14:35:15,996 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.reduces is deprecated. Instead, use mapreduce.task.profile.reduces
2014-04-17 14:35:15,997 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.inmem.merge.threshold is deprecated. Instead, use mapreduce.reduce.merge.inmem.threshold
2014-04-17 14:35:15,997 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
2014-04-17 14:35:15,997 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
2014-04-17 14:35:15,998 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 14:35:15,998 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
2014-04-17 14:35:15,999 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2014-04-17 14:35:15,999 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - job.end.retry.interval is deprecated. Instead, use mapreduce.job.end-notification.retry.interval
2014-04-17 14:35:16,000 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
2014-04-17 14:35:16,000 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.sort.spill.percent is deprecated. Instead, use mapreduce.map.sort.spill.percent
2014-04-17 14:35:16,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
2014-04-17 14:35:16,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-04-17 14:35:16,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - tasktracker.http.threads is deprecated. Instead, use mapreduce.tasktracker.http.threads
2014-04-17 14:35:16,002 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
2014-04-17 14:35:16,002 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.input.buffer.percent
2014-04-17 14:35:16,002 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.tasks.sleeptime-before-sigkill is deprecated. Instead, use mapreduce.tasktracker.tasks.sleeptimebeforesigkill
2014-04-17 14:35:16,003 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.tasktracker.dns.nameserver is deprecated. Instead, use mapreduce.tasktracker.dns.nameserver
2014-04-17 14:35:16,003 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.shuffle.read.timeout is deprecated. Instead, use mapreduce.reduce.shuffle.read.timeout
2014-04-17 14:35:16,004 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.tracker.blacklists is deprecated. Instead, use mapreduce.jobtracker.tasktracker.maxblacklists
2014-04-17 14:35:16,004 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
2014-04-17 14:35:16,004 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.shuffle.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.shuffle.input.buffer.percent
2014-04-17 14:35:16,005 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.merge.recordsBeforeProgress is deprecated. Instead, use mapreduce.task.merge.progress.records
2014-04-17 14:35:16,005 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
2014-04-17 14:35:16,005 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.restart.recover is deprecated. Instead, use mapreduce.jobtracker.restart.recover
2014-04-17 14:35:16,006 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2014-04-17 14:35:16,006 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.jobhistory.lru.cache.size is deprecated. Instead, use mapreduce.jobtracker.jobhistory.lru.cache.size
2014-04-17 14:35:16,008 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.child.log.level is deprecated. Instead, use mapreduce.map.log.level
2014-04-17 14:35:16,009 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.tracker.report.address is deprecated. Instead, use mapreduce.tasktracker.report.address
2014-04-17 14:35:16,009 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.speculative.execution.speculativeCap is deprecated. Instead, use mapreduce.job.speculative.speculativecap
2014-04-17 14:35:16,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.map.max.skip.records is deprecated. Instead, use mapreduce.map.skip.maxrecords
2014-04-17 14:35:16,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.profile.maps is deprecated. Instead, use mapreduce.task.profile.maps
2014-04-17 14:35:16,010 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jobtracker.instrumentation is deprecated. Instead, use mapreduce.jobtracker.instrumentation
2014-04-17 14:35:16,595 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 3 time(s).
2014-04-17 14:35:16,598 [main] WARN org.apache.pig.PigServer - Encountered Warning USING_OVERLOADED_FUNCTION 3 time(s).
2014-04-17 14:35:16,749 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY,DISTINCT,FILTER
2014-04-17 14:35:16,859 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-04-17 14:35:16,953 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2014-04-17 14:35:17,265 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-04-17 14:35:17,469 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-04-17 14:35:17,599 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 5
2014-04-17 14:35:17,605 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 5
2014-04-17 14:35:18,112 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:35:18,356 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-17 14:35:18,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-04-17 14:35:18,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-04-17 14:35:18,396 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-04-17 14:35:18,461 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=10408717
2014-04-17 14:35:18,463 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-04-17 14:35:19,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job3428626336995985635.jar
2014-04-17 14:35:24,521 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job3428626336995985635.jar created
2014-04-17 14:35:24,522 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-04-17 14:35:24,716 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-04-17 14:35:24,764 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-04-17 14:35:24,783 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-04-17 14:35:24,784 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-04-17 14:35:24,787 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting identity combiner class.
2014-04-17 14:35:25,156 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-04-17 14:35:25,224 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:35:25,264 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
2014-04-17 14:35:25,266 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 14:35:25,277 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
2014-04-17 14:35:25,278 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-04-17 14:35:25,279 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 14:35:25,280 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
2014-04-17 14:35:25,282 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2014-04-17 14:35:25,283 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-04-17 14:35:25,294 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-04-17 14:35:25,295 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.value.groupfn.class is deprecated. Instead, use mapreduce.job.output.group.comparator.class
2014-04-17 14:35:25,296 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2014-04-17 14:35:25,297 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
2014-04-17 14:35:25,297 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-04-17 14:35:25,298 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-04-17 14:35:25,298 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
2014-04-17 14:35:25,299 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-04-17 14:35:25,301 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 14:35:27,374 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-17 14:35:27,375 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-04-17 14:35:27,413 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-04-17 14:35:27,633 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-04-17 14:35:27,665 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
2014-04-17 14:35:27,667 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-04-17 14:35:27,669 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-04-17 14:35:27,670 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
2014-04-17 14:35:28,633 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1397713701605_0002
2014-04-17 14:35:31,577 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1397713701605_0002 to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:35:31,713 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master:8088/proxy/application_1397713701605_0002/
2014-04-17 14:35:31,715 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1397713701605_0002
2014-04-17 14:35:31,715 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases clean1,clean2,houred,ngramed1,raw
2014-04-17 14:35:31,716 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: raw[29,6],clean1[33,9],clean2[36,9],houred[41,9],ngramed1[44,11] C: R:
2014-04-17 14:35:31,880 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-04-17 14:41:48,801 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 6% complete
2014-04-17 14:42:55,327 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2014-04-17 14:43:17,250 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-17 14:43:17,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-04-17 14:43:17,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-04-17 14:43:17,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-04-17 14:43:17,306 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=56234670
2014-04-17 14:43:17,309 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-04-17 14:43:17,495 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job1614511730973324322.jar
2014-04-17 14:43:21,947 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job1614511730973324322.jar created
2014-04-17 14:43:21,996 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-04-17 14:43:21,998 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-04-17 14:43:21,999 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-04-17 14:43:22,000 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-04-17 14:43:22,072 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-04-17 14:43:22,119 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:43:22,135 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 14:43:22,143 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 14:43:22,147 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 14:43:24,973 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-17 14:43:24,974 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-04-17 14:43:24,975 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-04-17 14:43:25,275 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-04-17 14:43:25,463 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1397713701605_0003
2014-04-17 14:43:25,532 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1397713701605_0003 to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:43:25,571 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master:8088/proxy/application_1397713701605_0003/
2014-04-17 14:43:25,572 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1397713701605_0003
2014-04-17 14:43:25,572 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases hour_frequency1,hour_frequency2
2014-04-17 14:43:25,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: hour_frequency2[53,18],hour_frequency1[50,18] C: hour_frequency2[53,18],hour_frequency1[50,18] R: hour_frequency2[53,18]
2014-04-17 14:44:19,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 21% complete
2014-04-17 14:44:26,157 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 26% complete
2014-04-17 14:45:09,141 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 36% complete
2014-04-17 14:45:19,572 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-17 14:45:19,574 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-04-17 14:45:19,575 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-04-17 14:45:19,579 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-04-17 14:45:19,615 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=19824881
2014-04-17 14:45:19,616 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-04-17 14:45:19,732 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job6028257307805976030.jar
2014-04-17 14:45:25,079 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job6028257307805976030.jar created
2014-04-17 14:45:25,180 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-04-17 14:45:25,182 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-04-17 14:45:25,186 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-04-17 14:45:25,187 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-04-17 14:45:25,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-04-17 14:45:25,310 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:45:25,342 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 14:45:25,349 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 14:45:25,359 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 14:45:26,640 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-17 14:45:26,641 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-04-17 14:45:26,642 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-04-17 14:45:26,774 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-04-17 14:45:27,031 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1397713701605_0004
2014-04-17 14:45:27,210 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1397713701605_0004 to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:45:27,218 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master:8088/proxy/application_1397713701605_0004/
2014-04-17 14:45:27,228 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1397713701605_0004
2014-04-17 14:45:27,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases filtered_uniq_frequency,uniq_frequency1,uniq_frequency2,uniq_frequency3
2014-04-17 14:45:27,230 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: uniq_frequency1[57,18] C: R: uniq_frequency2[61,18],filtered_uniq_frequency[67,26],uniq_frequency3[64,18]
2014-04-17 14:46:20,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 42% complete
2014-04-17 14:46:24,652 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-04-17 14:46:56,452 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 56% complete
2014-04-17 14:47:11,699 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2014-04-17 14:47:13,946 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-17 14:47:13,953 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-04-17 14:47:13,957 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-04-17 14:47:13,975 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-04-17 14:47:14,048 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=542362
2014-04-17 14:47:14,051 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-04-17 14:47:14,423 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7400333801906397044.jar
2014-04-17 14:47:20,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7400333801906397044.jar created
2014-04-17 14:47:20,272 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-04-17 14:47:20,276 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-04-17 14:47:20,277 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-04-17 14:47:20,278 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-04-17 14:47:20,355 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-04-17 14:47:20,381 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:47:20,397 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 14:47:20,400 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 14:47:20,401 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 14:47:20,402 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
2014-04-17 14:47:21,172 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-17 14:47:21,173 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-04-17 14:47:21,174 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-04-17 14:47:21,321 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-04-17 14:47:21,419 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1397713701605_0005
2014-04-17 14:47:21,481 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1397713701605_0005 to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:47:21,496 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master:8088/proxy/application_1397713701605_0005/
2014-04-17 14:47:21,497 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1397713701605_0005
2014-04-17 14:47:21,498 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases ordered_uniq_frequency
2014-04-17 14:47:21,498 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ordered_uniq_frequency[70,25] C: R:
2014-04-17 14:48:04,061 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 70% complete
2014-04-17 14:48:33,152 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 80% complete
2014-04-17 14:48:38,720 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-17 14:48:38,724 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-04-17 14:48:38,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-04-17 14:48:38,726 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-04-17 14:48:38,814 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4106380232689934014.jar
2014-04-17 14:48:42,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4106380232689934014.jar created
2014-04-17 14:48:42,405 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-04-17 14:48:42,407 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-04-17 14:48:42,408 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-04-17 14:48:42,409 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-04-17 14:48:42,471 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-04-17 14:48:42,477 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:48:42,487 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-04-17 14:48:42,489 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-17 14:48:42,490 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-17 14:48:43,094 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-04-17 14:48:43,095 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-04-17 14:48:43,096 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-04-17 14:48:43,252 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-04-17 14:48:43,427 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1397713701605_0006
2014-04-17 14:48:43,507 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1397713701605_0006 to ResourceManager at master/192.168.171.132:8032
2014-04-17 14:48:43,512 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master:8088/proxy/application_1397713701605_0006/
2014-04-17 14:48:43,513 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1397713701605_0006
2014-04-17 14:48:43,513 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases ordered_uniq_frequency
2014-04-17 14:48:43,514 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ordered_uniq_frequency[70,25] C: R:
2014-04-17 14:49:38,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 90% complete
2014-04-17 14:50:08,676 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-04-17 14:50:08,727 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.2.0 0.12.2-SNAPSHOT scott 2014-04-17 14:35:18 2014-04-17 14:50:08 GROUP_BY,ORDER_BY,DISTINCT,FILTER

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias FeatureOutputs
job_1397713701605_0002 1 1 387 387 387 387 36 36 36 36 clean1,clean2,houred,ngramed1,raw DISTINCT
job_1397713701605_0003 1 1 56 56 56 56 29 29 29 29 hour_frequency1,hour_frequency2 GROUP_BY,COMBINER
job_1397713701605_0004 1 1 39 39 39 39 30 30 30 30 filtered_uniq_frequency,uniq_frequency1,uniq_frequency2,uniq_frequency3 GROUP_BY
job_1397713701605_0005 1 1 27 27 27 27 24 24 24 24 ordered_uniq_frequency SAMPLER
job_1397713701605_0006 1 1 39 39 39 39 20 20 20 20 ordered_uniq_frequency ORDER_BY hdfs://master:9000/user/scott/script1-hadoop-results,

Input(s):
Successfully read 944954 records (10409080 bytes) from: "hdfs://master:9000/user/scott/excite.log.bz2"

Output(s):
Successfully stored 13530 records (659954 bytes) in: "hdfs://master:9000/user/scott/script1-hadoop-results"

Counters:
Total records written : 13530
Total bytes written : 659954
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1397713701605_0002 -> job_1397713701605_0003,
job_1397713701605_0003 -> job_1397713701605_0004,
job_1397713701605_0004 -> job_1397713701605_0005,
job_1397713701605_0005 -> job_1397713701605_0006,
job_1397713701605_0006


2014-04-17 14:50:09,739 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:51352. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:10,742 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:51352. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:11,747 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:51352. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:11,859 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2014-04-17 14:50:14,166 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:40980. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:15,168 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:40980. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:16,171 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:40980. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:16,277 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2014-04-17 14:50:17,680 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:46125. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:18,684 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:46125. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:19,687 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:46125. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:19,793 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2014-04-17 14:50:21,065 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:56551. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:22,068 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:56551. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:23,071 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: slave/192.168.171.131:56551. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2014-04-17 14:50:23,175 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2014-04-17 14:50:23,585 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
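
A log this long is easier to follow once it is reduced to its progress lines. The following is a minimal sketch, assuming the console output above was saved to a file; the function name pig_progress and the file name pig-run.log are hypothetical and not part of Pig.

```shell
# Read Pig console output on stdin and print "time percent" for every
# MapReduceLauncher progress line. All other log lines are dropped.
pig_progress() {
  grep -E 'MapReduceLauncher - [0-9]+% complete' |
    awk '{ print $2, $(NF-1) }'   # $2 = time of day, $(NF-1) = "NN%"
}
```

For example, `pig_progress < pig-run.log` would list one line per progress update, which makes the long gap between 0% and 6% in the first job easy to spot.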

View the job status in a browser

http://master:8088/cluster
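
The same information can also be read from the command line through the ResourceManager REST API, which serves application details under /ws/v1/cluster/apps/. The sketch below only builds the URL for a given job id; the helper name rm_app_url is hypothetical, and it assumes the ResourceManager web address is master:8088 as in the log above.

```shell
# Map a MapReduce job id to the ResourceManager REST URL of its YARN application.
# $1: ResourceManager web address (host:port), $2: job id as printed by Pig.
rm_app_url() {
  rm_addr="$1"                     # e.g. master:8088
  app_id="application_${2#job_}"   # job_X_N -> application_X_N
  printf 'http://%s/ws/v1/cluster/apps/%s\n' "$rm_addr" "$app_id"
}
```

With the cluster running, `curl -s "$(rm_app_url master:8088 job_1397713701605_0006)"` should return a JSON description of the application, including its state and finalStatus fields.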

Results

scott@master:/opt/pig-0.12.1/pigtmp$ hdfs dfs -ls script1-hadoop-results
Found 2 items
-rw-r--r-- 1 scott supergroup 0 2014-04-17 14:50 script1-hadoop-results/_SUCCESS
-rw-r--r-- 1 scott supergroup 659954 2014-04-17 14:50 script1-hadoop-results/part-r-00000
scott@master:/opt/pig-0.12.1/pigtmp$ hdfs dfs -cat script1-hadoop-results/* | more
00 and shareware 2.112885636821291 3 1.5294117647058825
00 vcd 2.1213203435596424 3 1.5
00 bluebird 2.1555530241167826 4 1.9285714285714282
00 cute 2.1650635094610955 4 1.8571428571428574
00 chested 2.182820625326997 4 1.75
00 diablo cheats 2.197401062294143 3 1.4705882352941178
00 psygnosis 2.23606797749979 2 1.1666666666666667
00 gif s 2.2360679774997902 2 1.1666666666666665
00 vacancy 2.2360679774997902 2 1.1666666666666665
00 free mpegs 2.2360679774997902 2 1.1666666666666665
00 video camera 2.2360679774997902 2 1.1666666666666665
00 morisette 2.2360679774997902 2 1.1666666666666665
00 labyrinth 2.2360679774997902 2 1.1666666666666665
00 lax 2.2360679774997902 2 1.1666666666666665
00 and go 2.2360679774997902 2 1.1666666666666665
00 universidade 2.2360679774997902 2 1.1666666666666665
00 ecg 2.2360679774997902 2 1.1666666666666665
00 ywam 2.2360679774997902 2 1.1666666666666665
00 pennywise 2.2360679774997902 2 1.1666666666666665
00 depp 2.236067977499791 2 1.1666666666666665
00 foot fetish 2.250167329788653 6 2.272727272727272
00 hong 2.260186147283002 15 8.541666666666666
00 mib 2.279211529192758 4 1.7777777777777781
00 heathrow 2.291287847477921 3 1.5999999999999999
00 of hawaii 2.291287847477921 3 1.5999999999999999
00 migration 2.2941573387056184 3 1.571428571428571
00 virgin atlantic 2.3237900077244507 3 1.5
00 cop 2.3452078799117158 2 1.1538461538461535
00 klse 2.353393621658208 4 1.6000000000000003
00 cuteftp 2.357475833957224 3 1.4545454545454544
00 alamak 2.36524958395633 8 2.782608695652174
00 gauge 2.4494897427831788 2 1.1428571428571426
00 alexandra 2.4494897427831788 2 1.1428571428571426
00 puma 2.4494897427831788 2 1.1428571428571426
00 and systems 2.4494897427831788 2 1.1428571428571426
00 and roses 2.4494897427831788 2 1.1428571428571426
00 next door 2.4494897427831788 2 1.1428571428571426
00 kong 2.462172837549634 16 8.833333333333332
00 hong kong 2.4890695896451582 15 8.25
00 holes 2.5649458802128855 3 1.4375
00 spells 2.618614682831909 3 1.4
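
Paging through 13530 records is tedious, so it can help to filter the output first. The sketch below assumes the result file is tab-delimited (the PigStorage default) and that its columns are hour, n-gram, score, count and mean, as the listing above suggests; the function name top_ngrams is hypothetical.

```shell
# Read tab-separated result rows on stdin and print "ngram<TAB>score" for rows
# whose hour (column 1) equals $1 and whose count (column 4) is at least $2.
top_ngrams() {
  awk -F '\t' -v hour="$1" -v min="$2" \
    '$1 == hour && $4 + 0 >= min + 0 { print $2 "\t" $3 }'
}
```

A possible use is `hdfs dfs -cat script1-hadoop-results/part-r-00000 | top_ngrams 00 10`, which would keep only the hour-00 phrases seen at least 10 times, such as "hong kong" in the listing above.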

References

http://pig.apache.org/docs/r0.12.1/start.html