I have a file “userRating” with the following columns:
id , name , gender , rating
I need to create an RDD that contains only the user with id=1.
I am programming in Spark using Java.
A simpler option is to use the filter() method and keep only the lines of the CSV file whose first field (the id) equals 1.
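As a rough sketch of that idea, the predicate passed to filter() could look something like the following. The class name UserRatingFilter and the method hasId are hypothetical names made up for illustration, and the code assumes each line has the layout id,name,gender,rating with no quoted commas. It is plain Java, so it can be tried without a Spark installation.

```java
// Hypothetical helper: the class and method names are illustrative, not part of Spark.
final class UserRatingFilter {
    private UserRatingFilter() {}

    // Returns true when the first comma-separated field of the line equals the given id.
    // Assumes the layout "id,name,gender,rating"; header or malformed lines yield false.
    static boolean hasId(String line, int id) {
        if (line == null) {
            return false;
        }
        // Split off only the first field and ignore surrounding spaces.
        String first = line.split(",", 2)[0].trim();
        try {
            return Integer.parseInt(first) == id;
        } catch (NumberFormatException e) {
            // Non-numeric first field, for example the header row.
            return false;
        }
    }
}
```

Assuming sc is an existing JavaSparkContext, this would be used roughly as sc.textFile("userRating").filter(line -> UserRatingFilter.hasId(line, 1)), which gives a JavaRDD<String> holding only the matching lines.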
Assuming userRating is the name of a CSV file, you can first load the CSV into a DataFrame (using https://github.com/databricks/spark-csv) and then filter on the id you are looking for.
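import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

// sc is assumed to be an existing JavaSparkContext
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .option("delimiter", ",")
    .load("userRating.csv");
// Keep only the rows whose id column equals 1
JavaRDD<Row> rdd = df.where(df.col("id").equalTo(1)).javaRDD();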