Apache Pig can only register JARs using a local path on the head node
4/3/2013 4:17:28 AM
User(s) can reproduce this bug
When attempting to register a JAR residing on HDFS or ASV, Pig throws an error, e.g.:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: Local file 'hdfs://namenodehost:9000/Jar/piggybank-jd.jar' does not exist.
Steps to Reproduce
From a grunt prompt, attempt to register a JAR that resides on HDFS or ASV:
grunt> REGISTER asv://email@example.com/path/to.jar
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: Local file 'asv://firstname.lastname@example.org/path/to.jar' does not exist.
Expected result: the JAR is registered successfully.
on 5/17/2013 at 11:48 AM
Hi John, I was unable to reproduce this on an HDInsight cluster that I created. Here's what I did:
1. Copy a local JAR into ASV using the hadoop command line:
hadoop fs -copyFromLocal D:\temp\myudfs.jar asv://<container name>@<account name>.blob.core.windows.net/myudfs.jar
2. Register the jar in pig:
REGISTER 'asv://<container name>@<account name>.blob.core.windows.net/myudfs.jar'
3. Run a pig query using the jar:
A = LOAD '/user/test.log' USING PigStorage('\t') AS (name: chararray, number: int);
B = FOREACH A GENERATE myudfs.UPPER(name);
Notice that I put the JAR in ASV, not HDFS. When Pig starts, the log shows that the file system it connects to is ASV, not HDFS:
2013-05-17 18:31:45,730 [main] INFO org.apache.pig.Main - Logging error messages to: C:\apps\dist\hadoop-1.1.0-SNAPSHOT\logs\pig_1368815505728.log
2013-05-17 18:31:46,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: asv://<container name>@<account name>.blob.core.windows.net
Because Azure VMs can be reimaged or migrated, HDFS is best treated as temporary scratch space used internally by Hadoop components, and ASV should be used for persistent storage. Please let me know if you have any other questions.
on 4/3/2013 at 4:22 AM
Use fs -copyToLocal to copy the JAR from the remote file system to local storage on the head node, then use that local path in your script.
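As a rough sketch of this workaround from a grunt prompt: the ASV URI and the JAR name myudfs.jar are borrowed from the comment above, and <local dir> is a hypothetical placeholder for any writable directory on the head node. None of these values come from the original report, so adjust them to your environment.

```pig
-- Copy the JAR from remote storage (ASV in this sketch) to the head node's local disk.
-- <local dir> is a placeholder, not a real path.
fs -copyToLocal asv://<container name>@<account name>.blob.core.windows.net/myudfs.jar <local dir>/myudfs.jar

-- Register the local copy instead of the remote URI.
REGISTER '<local dir>/myudfs.jar';
```

The same two steps should apply to a JAR stored on HDFS by substituting an hdfs:// URI as the source of the copy.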
© 2013 Microsoft