Working with a Remote Cluster from Eclipse!~ - TonyMoMo's Blog

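First, download Eclipse (http://www.eclipse.org/downloads/); I chose Eclipse IDE for Java EE Developers. I installed Eclipse on Windows 7 (the Windows build), and also installed Eclipse (the Linux build) on one node of the cluster. Without further ado, follow the steps below to set things up.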

(1). Build hadoop-eclipse-plugin-1.0.3.jar and put it on the Windows machine, inside the plugins folder of your Eclipse installation, e.g. C:\eclipse\plugins\

Here is how to build it:

a. Jazz has a tutorial at http://forum.hadoop.tw/viewtopic.php?f=4&t=36087 .

b. In the end, though, I followed the approach at http://rritw.com/a/bianchengyuyan/C__/20120708/182732.html and built the plugin package on CentOS.

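Now the important part: never build this package on a node of a cluster that is already running, because the build compiles Hadoop itself. That upgrades the Hadoop version on that node so it no longer matches the other nodes, and sorting that out is a real headache (I learned this the hard way, orz...). So do the build from a fresh copy of the Hadoop distribution (btw, since we are building the 1.0.3 jar, download Hadoop 1.0.3; the same goes for other versions!!).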

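(2). Once hadoop-eclipse-plugin-1.0.3.jar is in place, first start HDFS on the remote cluster, then launch Eclipse on Windows, and then follow the setup described at http://developer.51cto.com/art/201207/345690_1.htm (on page 2). That page includes a FAQ of common problems; I will list a few of them here because you are certain to run into them (^.^~)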

a. error: failure to login

b. Permission denied

c. Failed to set permissions of path

d. Permission problems on the Hadoop mapred execution directory

e. ClassNotFound when running MapReduce (this one is trickier; see the fix below)

====First, add the EJob class====

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class EJob {

    // Extra classpath entries (e.g. the Hadoop conf directory) used by getClassLoader()
    private static List<URL> classPath = new ArrayList<URL>();

    // Pack everything under root (e.g. Eclipse's "bin" output folder) into a temporary jar
    public static File createTempJar(String root) throws IOException {
        if (!new File(root).exists()) {
            return null;
        }
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        final File jarFile = File.createTempFile("EJob-", ".jar", new File(
                System.getProperty("java.io.tmpdir")));

        // Delete the temporary jar when the JVM exits
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                jarFile.delete();
            }
        });

        JarOutputStream out = new JarOutputStream(
                new FileOutputStream(jarFile), manifest);
        createTempJarInner(out, new File(root), "");
        out.flush();
        out.close();
        return jarFile;
    }

    // Recursively add f to the jar; base is the entry name relative to the root
    private static void createTempJarInner(JarOutputStream out, File f,
            String base) throws IOException {
        if (f.isDirectory()) {
            File[] fl = f.listFiles();
            if (base.length() > 0) {
                base = base + "/";
            }
            for (int i = 0; i < fl.length; i++) {
                createTempJarInner(out, fl[i], base + fl[i].getName());
            }
        } else {
            out.putNextEntry(new JarEntry(base));
            FileInputStream in = new FileInputStream(f);
            byte[] buffer = new byte[1024];
            int n = in.read(buffer);
            while (n != -1) {
                out.write(buffer, 0, n);
                n = in.read(buffer);
            }
            in.close();
        }
    }

    // Build a class loader that also sees the entries registered via addClasspath()
    public static ClassLoader getClassLoader() {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        if (parent == null) {
            parent = EJob.class.getClassLoader();
        }
        if (parent == null) {
            parent = ClassLoader.getSystemClassLoader();
        }
        return new URLClassLoader(classPath.toArray(new URL[0]), parent);
    }

    // Register an existing file or directory as an extra classpath entry
    public static void addClasspath(String component) {
        if ((component != null) && (component.length() > 0)) {
            try {
                File f = new File(component);
                if (f.exists()) {
                    // toURI().toURL() replaces the deprecated File.toURL()
                    URL key = f.getCanonicalFile().toURI().toURL();
                    if (!classPath.contains(key)) {
                        classPath.add(key);
                    }
                }
            } catch (IOException e) {
                // Ignored: a path that cannot be resolved is simply not added
            }
        }
    }
}

====In the main function of your MapReduce program, add the lines wrapped in ***** *****; everything else is just the usual MapReduce setup. Remove the ***** markers before use====

public static void main(String[] args) throws Exception {

***** File jarFile = EJob.createTempJar("bin"); *****
***** EJob.addClasspath("/usr/hadoop/conf"); *****
***** ClassLoader classLoader = EJob.getClassLoader(); *****
***** Thread.currentThread().setContextClassLoader(classLoader); *****

Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
***** ((JobConf) job.getConfiguration()).setJar(jarFile.toString()); *****
job.setJarByClass(Practice.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Please excuse the formatting; you can reformat the code in Eclipse with Ctrl + Shift + F.
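
The starred lines at the top of main add /usr/hadoop/conf to a URLClassLoader and install it as the thread's context class loader. A likely reason this matters is that Hadoop's Configuration resolves files such as core-site.xml as classpath resources, so the cluster's conf directory has to be visible to that loader. The sketch below shows only the class loader mechanism, using the JDK alone and no Hadoop. The class name ContextLoaderSketch, the file name sketch-site.xml and the temporary directory are all made up for illustration.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;

public class ContextLoaderSketch {

    // Build a loader that sees dir in addition to everything the parent sees.
    static ClassLoader loaderWith(File dir, ClassLoader parent) throws IOException {
        // For an existing directory the URL ends with '/', which URLClassLoader
        // treats as a directory of classes and resources.
        URL url = dir.getCanonicalFile().toURI().toURL();
        return new URLClassLoader(new URL[] { url }, parent);
    }

    // Hypothetical stand-in for a conf directory: a temp dir holding one XML file.
    static File makeConfDir() throws IOException {
        File dir = Files.createTempDirectory("sketch-conf").toFile();
        dir.deleteOnExit();
        File site = new File(dir, "sketch-site.xml");
        FileWriter writer = new FileWriter(site);
        try {
            writer.write("<configuration/>");
        } finally {
            writer.close();
        }
        site.deleteOnExit();
        return dir;
    }

    // True if the current thread's context class loader can find the resource.
    static boolean visible(String resource) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        return loader != null && loader.getResource(resource) != null;
    }

    public static void main(String[] args) throws IOException {
        File conf = makeConfDir();
        ClassLoader original = Thread.currentThread().getContextClassLoader();

        System.out.println("before: " + visible("sketch-site.xml"));
        Thread.currentThread().setContextClassLoader(loaderWith(conf, original));
        System.out.println("after:  " + visible("sketch-site.xml"));

        // Restore the original loader so the rest of the program is unaffected.
        Thread.currentThread().setContextClassLoader(original);
    }
}
```

EJob.addClasspath plus EJob.getClassLoader follow the same pattern, only with a list of URLs instead of a single directory.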

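The ClassNotFound error above occurs because the MapReduce job is launched from Eclipse on Windows, so the jar the job needs is never available on the cluster (usually it is the Mapper class that cannot be found, but only because the map phase runs first!). If you instead copy the built jar to the cluster and run it there with "hadoop jar xxxx.jar", the problem does not occur. So what EJob does is package the classes into a jar and ship it to the remote cluster!~ Pretty cool.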
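
To see what kind of jar EJob.createTempJar("bin") would hand to the job, here is a small standalone sketch of the same packing idea: walk a directory tree, write each file as a jar entry named by its path relative to the root, then list the entries. It is an illustration under assumptions, not the EJob code itself. The class name TempJarSketch, the demo package and the placeholder TokenizerMapper.class bytes are invented for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class TempJarSketch {

    // Recursively add f; entry names are relative to the root and use '/'.
    static void addTree(JarOutputStream out, File f, String base) throws IOException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return; // unreadable directory: nothing to add
            }
            String prefix = base.isEmpty() ? "" : base + "/";
            for (File child : children) {
                addTree(out, child, prefix + child.getName());
            }
        } else {
            out.putNextEntry(new JarEntry(base));
            InputStream in = new FileInputStream(f);
            try {
                byte[] buffer = new byte[1024];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            } finally {
                in.close();
            }
            out.closeEntry();
        }
    }

    // Pack the whole directory into a temporary jar with a minimal manifest.
    static File packDirectory(File root) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        File jar = File.createTempFile("sketch-", ".jar");
        jar.deleteOnExit();
        JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest);
        try {
            addTree(out, root, "");
        } finally {
            out.close();
        }
        return jar;
    }

    // Read the jar back and return its entry names, sorted.
    static List<String> entryNames(File jar) throws IOException {
        List<String> names = new ArrayList<String>();
        JarFile jarFile = new JarFile(jar);
        try {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } finally {
            jarFile.close();
        }
        Collections.sort(names);
        return names;
    }

    // Hypothetical stand-in for Eclipse's "bin" folder: demo/TokenizerMapper.class.
    static File makeSampleTree() throws IOException {
        File root = Files.createTempDirectory("sketch-bin").toFile();
        root.deleteOnExit();
        File pkg = new File(root, "demo");
        if (!pkg.mkdirs()) {
            throw new IOException("could not create " + pkg);
        }
        pkg.deleteOnExit();
        File cls = new File(pkg, "TokenizerMapper.class");
        FileOutputStream fos = new FileOutputStream(cls);
        try {
            fos.write(new byte[] { 1, 2, 3 }); // placeholder bytes, not a real class file
        } finally {
            fos.close();
        }
        cls.deleteOnExit();
        return root;
    }

    public static void main(String[] args) throws IOException {
        File jar = packDirectory(makeSampleTree());
        System.out.println(entryNames(jar));
    }
}
```

The listing should include the class file under its package path (demo/TokenizerMapper.class) next to the manifest entry. The package path is what matters here: a task on the cluster can only load the Mapper if the jar stores the class under the directory that matches its package.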