Working with a remote cluster from Eclipse!~

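First, download Eclipse (http://www.eclipse.org/downloads/); I chose Eclipse IDE for Java EE Developers. I installed the Windows version of Eclipse on Windows 7, and also installed the Linux version on one machine in the cluster. Without further ado, follow the steps below to set things up.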

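(1). Build hadoop-eclipse-plugin-1.0.3.jar and put it in the plugins folder of your Eclipse installation on Windows, for example: C:\eclipse\plugins\

The plugin can be built as follows:

a. Jazz has a tutorial at http://forum.hadoop.tw/viewtopic.php?f=4&t=36087.

b. In the end, though, I followed http://rritw.com/a/bianchengyuyan/C__/20120708/182732.html and built the plugin on CentOS.

Here is the important part: never build this plugin on a node of a cluster that is already running, because the build compiles Hadoop. That upgrades the Hadoop version on that machine, which then no longer matches the other nodes, and that causes a lot of trouble (personal experience, orz). So please build from a fresh copy of the Hadoop distribution. (By the way, since we are building the 1.0.3 jar, download Hadoop 1.0.3; the same goes for other versions!)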

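(2). Once hadoop-eclipse-plugin-1.0.3.jar is in place, start HDFS on the remote cluster first, then start Eclipse on Windows, and configure it as described on page 2 of http://developer.51cto.com/art/201207/345690_1.htm. That article includes an FAQ of common problems; I list a few of them here because you will certainly run into them (^.^~)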

a. error: failure to login

b. Permission denied

c. Failed to set permissions of path

d. Permission problems on the Hadoop mapred working directory

e. ClassNotFound when running MapReduce (this one is trickier; see the fix below)

==== First, add the EJob class ====

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class EJob {

    // Extra classpath entries (e.g. the Hadoop conf directory) used by getClassLoader()
    private static List<URL> classPath = new ArrayList<URL>();

    // Packs everything under root (e.g. Eclipse's "bin" output folder) into a
    // temporary jar that is deleted when the JVM exits.
    // Returns null if root does not exist.
    public static File createTempJar(String root) throws IOException {
        if (!new File(root).exists()) {
            return null;
        }
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        final File jarFile = File.createTempFile("EJob-", ".jar", new File(
                System.getProperty("java.io.tmpdir")));
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                jarFile.delete();
            }
        });
        JarOutputStream out = new JarOutputStream(
                new FileOutputStream(jarFile), manifest);
        try {
            createTempJarInner(out, new File(root), "");
            out.flush();
        } finally {
            out.close();
        }
        return jarFile;
    }

    // Recursively adds f to the jar; base is the entry name built so far
    private static void createTempJarInner(JarOutputStream out, File f,
            String base) throws IOException {
        if (f.isDirectory()) {
            File[] fl = f.listFiles();
            if (fl == null) {
                return; // directory could not be read
            }
            if (base.length() > 0) {
                base = base + "/";
            }
            for (int i = 0; i < fl.length; i++) {
                createTempJarInner(out, fl[i], base + fl[i].getName());
            }
        } else {
            out.putNextEntry(new JarEntry(base));
            FileInputStream in = new FileInputStream(f);
            try {
                byte[] buffer = new byte[1024];
                int n = in.read(buffer);
                while (n != -1) {
                    out.write(buffer, 0, n);
                    n = in.read(buffer);
                }
            } finally {
                in.close();
            }
            out.closeEntry();
        }
    }

    // Returns a class loader that searches the added classpath entries,
    // delegating first to the current context class loader
    public static ClassLoader getClassLoader() {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        if (parent == null) {
            parent = EJob.class.getClassLoader();
        }
        if (parent == null) {
            parent = ClassLoader.getSystemClassLoader();
        }
        return new URLClassLoader(classPath.toArray(new URL[0]), parent);
    }

    // Registers an existing file or directory as an extra classpath entry
    public static void addClasspath(String component) {
        if ((component != null) && (component.length() > 0)) {
            try {
                File f = new File(component);
                if (f.exists()) {
                    // File.toURL() is deprecated; go through toURI() instead
                    URL key = f.getCanonicalFile().toURI().toURL();
                    if (!classPath.contains(key)) {
                        classPath.add(key);
                    }
                }
            } catch (IOException e) {
                // Ignored: a path that cannot be resolved is simply not added
            }
        }
    }
}
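To see what the jar-packing half of EJob does without involving a cluster, here is a minimal standalone sketch using only the JDK. It is not the EJob class itself: the class name JarPackingSketch, the "demo" package folder, and the fake TokenizerMapper.class file are all made up for illustration. It packs a directory into a temporary jar the same way createTempJar does, then lists the entries so you can check that class files end up under their package paths.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class, not part of the original post
final class JarPackingSketch {

    /** Packs every file under root into a new temporary jar and returns it. */
    static File packDirectory(File root) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        File jar = File.createTempFile("sketch-", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out =
                new JarOutputStream(new FileOutputStream(jar), manifest)) {
            addEntries(out, root, "");
        }
        return jar;
    }

    /** Recursively adds f under the entry name built so far. */
    private static void addEntries(JarOutputStream out, File f, String name)
            throws IOException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return; // unreadable directory: skip it
            }
            String prefix = name.isEmpty() ? "" : name + "/";
            for (File child : children) {
                addEntries(out, child, prefix + child.getName());
            }
        } else {
            out.putNextEntry(new JarEntry(name));
            Files.copy(f.toPath(), out);
            out.closeEntry();
        }
    }

    /** Returns the entry names of a jar in sorted order. */
    static List<String> entryNames(File jar) throws IOException {
        List<String> names = new ArrayList<String>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for Eclipse's "bin" folder with one fake class file
        File root = Files.createTempDirectory("bin-").toFile();
        File pkg = new File(root, "demo");
        pkg.mkdirs();
        Files.write(new File(pkg, "TokenizerMapper.class").toPath(),
                new byte[] {1, 2, 3});
        System.out.println(entryNames(packDirectory(root)));
    }
}
```

Entry names always use "/" as the separator, whatever the operating system, which is why the sketch (like EJob) builds them by hand instead of using the file's path.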

==== In the main function of your MapReduce job, add the lines marked "// added for EJob"; everything else is just the usual MapReduce setup ====

public static void main(String[] args) throws Exception {
    File jarFile = EJob.createTempJar("bin");                        // added for EJob
    EJob.addClasspath("/usr/hadoop/conf");                           // added for EJob
    ClassLoader classLoader = EJob.getClassLoader();                 // added for EJob
    Thread.currentThread().setContextClassLoader(classLoader);       // added for EJob

    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "word count");
    // JobConf here is org.apache.hadoop.mapred.JobConf
    ((JobConf) job.getConfiguration()).setJar(jarFile.toString());   // added for EJob
    job.setJarByClass(Practice.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

If the formatting gets mangled when you paste the code, Eclipse can reformat it for you (Ctrl + Shift + F).

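The ClassNotFound error occurs because the MapReduce job is launched from Eclipse on Windows, so the jar containing the job's classes never reaches the cluster. (Usually it is the Mapper class that cannot be found, but only because the map phase runs first!) If you instead copy the built jar to the cluster and run it there with "hadoop jar xxxx.jar", the problem does not occur. So what EJob does is package the classes into a jar and ship it to the remote cluster!~ Really cool.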
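The other half of EJob, addClasspath plus setContextClassLoader, has a different job: it makes an extra directory such as the Hadoop conf folder visible to anything that looks resources up through the thread's context class loader (as far as I know, Hadoop's Configuration finds its *-site.xml files that way). Here is a rough JDK-only sketch of that mechanism. The class name ContextClassLoaderSketch and the file name demo-site.xml are invented for the example, and a temporary directory stands in for /usr/hadoop/conf.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;

// Hypothetical demo class, not part of the original post
final class ContextClassLoaderSketch {

    /** Returns a loader that searches dir after delegating to the current context loader. */
    static ClassLoader loaderWith(File dir) throws IOException {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        if (parent == null) {
            parent = ClassLoader.getSystemClassLoader();
        }
        // For an existing directory, toURI() yields a URL ending in "/",
        // which URLClassLoader treats as a directory of classes and resources
        URL url = dir.getCanonicalFile().toURI().toURL();
        return new URLClassLoader(new URL[] {url}, parent);
    }

    /** Looks a resource up the way context-loader-based libraries do. */
    static URL find(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(name);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Hadoop conf directory
        File confDir = Files.createTempDirectory("conf-").toFile();
        Files.write(new File(confDir, "demo-site.xml").toPath(),
                "<configuration/>".getBytes("UTF-8"));

        System.out.println("before: " + find("demo-site.xml"));
        Thread.currentThread().setContextClassLoader(loaderWith(confDir));
        System.out.println("after:  " + find("demo-site.xml"));
    }
}
```

The first lookup should come back empty and the second should point into the temporary directory. That is the same reason the EJob lines have to run before new Configuration() is created in main.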

Posted by TonyMoMo on PIXNET

3 comments

哈囉袁

Awesome! Teach me sometime when you're free so I can play with it!

2013-01-06 14:35

Haha, sure ^_^~

2013-01-06 22:55

danielgrant

thank you very much! this bug has tormented me for a whole day! i corrected it through your method!

2013-05-11 15:38

you're welcome ^^

2013-05-12 12:22

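Impressive!

Thanks, I also solved my own problem with this method! If I may ask, how long have you been learning Java? Writing the EJob class must have taken quite a while, right?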

2013-07-08 21:44

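About EJob: there are already quite a few versions of it on the web, and I have since adapted it into my own version. The class has probably been circulating for a long time and its origin can no longer be traced, so I did not cite a source. We are all standing on the shoulders of giants ^^~. If I were doing this today, though, I would upload the required jar files to HDFS first and then use caching.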

2013-07-09 10:23
