scala - How to use Spark's .newAPIHadoopRDD() from Java
I am trying to port an example written in Scala (from the Apache Spark project) to Java, and I am running into issues.
The code
val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(), classOf[CqlPagingInputFormat], classOf[java.util.Map[String,ByteBuffer]], classOf[java.util.Map[String,ByteBuffer]])
from the original Scala example builds and runs fine, but
JavaPairRDD rdd = jsc.newAPIHadoopRDD(job.getConfiguration(), CqlPagingInputFormat.class, java.util.Map<String, ByteBuffer>.class, java.util.Map<String, ByteBuffer>.class);
is not allowed in Java (cannot select from a parameterized type).
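This is a Java language restriction rather than anything Spark-specific: class literals cannot carry type arguments, so only the raw Map.class form compiles. A tiny stand-alone illustration (the class name ClassLiteralDemo is made up for this example):

import java.util.Map;

public class ClassLiteralDemo {
    public static void main(String[] args) {
        // Class<?> kClass = Map<String, ByteBuffer>.class;  // does not compile:
        //                                                   // "cannot select from a parameterized type"
        Class<Map> kClass = Map.class;  // only the raw class literal exists (generics are erased)
        System.out.println(kClass.getName());  // prints "java.util.Map"
    }
}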
Changing
java.util.Map<String, ByteBuffer>.class
into
Class.forName("java.util.Map<String, ByteBuffer>")
yields a new error:
Error:(42, 30) java: method newAPIHadoopRDD in class org.apache.spark.api.java.JavaSparkContext cannot be applied to given types;
  required: org.apache.hadoop.conf.Configuration,java.lang.Class<F>,java.lang.Class<K>,java.lang.Class<V>
  found: org.apache.hadoop.conf.Configuration,java.lang.Class<org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat>,java.lang.Class<capture#1 of ?>,java.lang.Class<capture#2 of ?>
  reason: inferred type does not conform to declared bound(s)
    inferred: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
    bound(s): org.apache.hadoop.mapreduce.InputFormat<capture#1 of ?,capture#2 of ?>
Changing it to java.util.Map.class
yields a similar error:
Error:(44, 30) java: method newAPIHadoopRDD in class org.apache.spark.api.java.JavaSparkContext cannot be applied to given types;
  required: org.apache.hadoop.conf.Configuration,java.lang.Class<F>,java.lang.Class<K>,java.lang.Class<V>
  found: org.apache.hadoop.conf.Configuration,java.lang.Class<org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat>,java.lang.Class<java.util.Map>,java.lang.Class<java.util.Map>
  reason: inferred type does not conform to declared bound(s)
    inferred: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
    bound(s): org.apache.hadoop.mapreduce.InputFormat<java.util.Map,java.util.Map>
So what is the correct translation? It is worth noting that newAPIHadoopRDD() has a different implementation for Scala and for Java. The documentation for the methods can be found here for Scala, and here for Java: http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD(org.apache.hadoop.conf.Configuration, java.lang.Class, java.lang.Class, java.lang.Class)
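For reference, the generic signature that javac is checking against looks roughly as follows (paraphrased from the JavaSparkContext documentation linked above; the wrapper interface name is made up just so the snippet stands alone). The F extends InputFormat<K, V> bound is exactly what the "does not conform to declared bound(s)" errors are complaining about:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.spark.api.java.JavaPairRDD;

// Paraphrase of JavaSparkContext.newAPIHadoopRDD(), written as a stand-alone interface
// so the generic bound is easy to see; K and V must match the InputFormat's type parameters.
interface NewAPIHadoopRDDSignature {
    <K, V, F extends InputFormat<K, V>> JavaPairRDD<K, V> newAPIHadoopRDD(
            Configuration conf, Class<F> fClass, Class<K> kClass, Class<V> vClass);
}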
The declaration of CqlPagingInputFormat
looks like this:
public class CqlPagingInputFormat extends org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat<java.util.Map<java.lang.String,java.nio.ByteBuffer>,java.util.Map<java.lang.String,java.nio.ByteBuffer>> {
I finally got this resolved after a long fight. The problem is that newAPIHadoopRDD requires a class that extends org.apache.hadoop.mapreduce.InputFormat, and org.apache.cassandra.hadoop.cql3.CqlInputFormat does not extend InputFormat directly; instead it extends org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat, which in turn extends InputFormat.
Eclipse uses the Groovy compiler, which is smart enough to resolve this, but Java's default compiler is not. The Groovy compiler resolves the K,V values that the Java compiler finds incompatible.
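For what it's worth, it should also be possible to push the call through plain javac by pinning K and V to Map<String, ByteBuffer> with an unchecked double cast of the class literal, so that the InputFormat bound lines up. This is only an untested sketch of that alternative, not the route taken below, and the class/method names (CqlRddSketch, cassandraRdd) are made up for illustration:

import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CqlRddSketch {
    @SuppressWarnings("unchecked")
    public static JavaPairRDD<Map<String, ByteBuffer>, Map<String, ByteBuffer>> cassandraRdd(
            JavaSparkContext jsc, Job job) {
        // The double cast fixes K and V to Map<String, ByteBuffer>, matching the type
        // parameters CqlPagingInputFormat inherits from AbstractColumnFamilyInputFormat,
        // so the F extends InputFormat<K, V> bound is satisfied. The cast is unchecked
        // but harmless here, since generics are erased at runtime anyway.
        Class<Map<String, ByteBuffer>> kvClass =
                (Class<Map<String, ByteBuffer>>) (Class<?>) Map.class;
        return jsc.newAPIHadoopRDD(job.getConfiguration(),
                CqlPagingInputFormat.class, kvClass, kvClass);
    }
}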
To use the Groovy compiler, you need to add the following changes to your pom.xml file:
<properties>
    <groovy-version>1.8.6</groovy-version>
    <maven-compiler-plugin-version>2.5.1</maven-compiler-plugin-version>
    <groovy-eclipse-compiler-version>2.7.0-01</groovy-eclipse-compiler-version>
    <maven-clover2-plugin-version>3.1.7</maven-clover2-plugin-version>
    <groovy-eclipse-batch-version>1.8.6-01</groovy-eclipse-batch-version>
</properties>
Add the Groovy dependency:
<dependencies>
    <dependency>
        <groupId>org.codehaus.groovy</groupId>
        <artifactId>groovy-all</artifactId>
        <version>${groovy-version}</version>
    </dependency>
</dependencies>
Add the Groovy plugin under build so it is used as the compiler for our code:
<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${maven-compiler-plugin-version}</version>
                <configuration>
                    <!-- Bind the Groovy Eclipse compiler -->
                    <compilerId>groovy-eclipse-compiler</compilerId>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                </configuration>
                <dependencies>
                    <!-- Define which Groovy version will be used for the build (default is 2.0) -->
                    <dependency>
                        <groupId>org.codehaus.groovy</groupId>
                        <artifactId>groovy-eclipse-batch</artifactId>
                        <version>${groovy-eclipse-batch-version}</version>
                    </dependency>
                    <!-- Define the dependency on the Groovy Eclipse compiler (as referred to in the compilerId) -->
                    <dependency>
                        <groupId>org.codehaus.groovy</groupId>
                        <artifactId>groovy-eclipse-compiler</artifactId>
                        <version>${groovy-eclipse-compiler-version}</version>
                    </dependency>
                </dependencies>
            </plugin>
            <!-- Define the Groovy Eclipse compiler again and set extensions=true. With this, the plugin -->
            <!-- enhances the default build life cycle with a phase that adds additional Groovy source folders. -->
            <!-- This works fine under Maven 3.x, but we've encountered problems with Maven 2.x -->
            <plugin>
                <groupId>org.codehaus.groovy</groupId>
                <artifactId>groovy-eclipse-compiler</artifactId>
                <version>${groovy-eclipse-compiler-version}</version>
                <extensions>true</extensions>
            </plugin>
            <!-- Configure the Clover Maven plug-in. Note that it's not bound to an execution phase, -->
            <!-- so you'll have to call the Clover goals from the command line. -->
            <plugin>
                <groupId>com.atlassian.maven.plugins</groupId>
                <artifactId>maven-clover2-plugin</artifactId>
                <version>${maven-clover2-plugin-version}</version>
                <configuration>
                    <generateHtml>true</generateHtml>
                    <historyDir>.cloverhistory</historyDir>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
</build>
This should solve it.