Wednesday, May 6, 2015

Parsing a CSV file with a hyphen-separated column value

One thing I have learnt over the past year or so: if you want to learn Big Data analytics, the first thing you should learn is how to parse a CSV (Comma-Separated Values) file.

In this post, I will explain how to parse a CSV file in which one column value is itself separated by hyphens. Questions like this are actually asked in interviews and can sometimes be tricky.

I will use Java for the parsing.

Here is a file containing various attributes and statistics of cricket players from various countries: odi-batting.csv

The question is:

Find the total score of the Afghanistan players in the year 2010.

For convenience, I will show the first few lines of the CSV file here.

** Each paragraph below is one line of the CSV file.

Afghanistan,Mohammad Shahzad,118,97.52,16-02-2010,Tue,Sharjah CA Stadium,Canada,../Matches/MatchScorecard_ODI.asp?MatchCode=3087

Afghanistan,Mohammad Shahzad,110,99.09,01-09-2009,Tue,VRA Ground,Netherlands,../Matches/MatchScorecard_ODI.asp?MatchCode=3008

Afghanistan,Mohammad Shahzad,100,138.88,16-08-2010,Mon,Cambusdoon New Ground,Scotland,../Matches/MatchScorecard_ODI.asp?MatchCode=3164

Afghanistan,Mohammad Shahzad,82,75.92,10-07-2010,Sat,Hazelaarweg,Netherlands,../Matches/MatchScorecard_ODI.asp?MatchCode=3153

Afghanistan,Mohammad Shahzad,57,100,01-07-2010,Thu,Sportpark Westvliet,Canada,../Matches/MatchScorecard_ODI.asp?MatchCode=3135

Here you will find that the date column (MatchDate) of the CSV file is hyphen-separated. So the usual split on (",") alone is NOT enough here: it gives us the whole date (for example 16-02-2010), not the year we need.

What can we do then?

Split the columns with (",") and store them in separate variables, then split that particular MatchDate column with ("-"). The date is in day-month-year order, so the year is the third piece.
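To make the two-step split concrete, here is a minimal sketch that runs it on the first sample row shown above. The class name SplitDemo and the helper yearOf are names chosen only for this illustration, and the sketch assumes MatchDate is always the fifth column.

```java
public class SplitDemo {

    /* Hypothetical helper: returns the year part of the MatchDate column. */
    static String yearOf(String csvLine) {
        String[] columns = csvLine.split(",");        // step 1: split the row into columns
        String[] dateParts = columns[4].split("-");   // step 2: split MatchDate (fifth column) on hyphens
        return dateParts[2];                          // day, month, year: index 2 is the year
    }

    public static void main(String[] args) {
        String row = "Afghanistan,Mohammad Shahzad,118,97.52,16-02-2010,Tue,"
                + "Sharjah CA Stadium,Canada,../Matches/MatchScorecard_ODI.asp?MatchCode=3087";
        System.out.println(yearOf(row)); // prints 2010
    }
}
```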

Let's see the code here.
© Dipayan Dev


import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class A {

    public static void main(String[] args) {

        /* Put the file wherever you like and change this path accordingly */
        String csv = "C:\\Users\\Dipayan\\Desktop\\odi-batting.csv";

        String line;
        int sum = 0;
        int count = 0;
        int[] a = new int[10000]; /* Holds each matching score */

        /* try-with-resources closes the reader for us */
        try (BufferedReader br = new BufferedReader(new FileReader(csv))) {
            while ((line = br.readLine()) != null) {
                String[] f = line.split(","); /* Splitting each column and storing each of them in array f */
                if (f.length < 5) continue; /* Skip blank or short lines */
                String con = f[0];
                String[] date = f[4].split("-"); /* Split the fifth column (MatchDate) using hyphen */
                if (date.length < 3) continue; /* Skip rows without a dd-mm-yyyy date, e.g. a header row */
                String year = date[2];
                if (year.equals("2010") && con.equals("Afghanistan")) {
                    a[count] = Integer.parseInt(f[2]); /* Third column is the score */
                    sum += a[count];
                    count++;
                }
            }
            System.out.println("Total score: " + sum);
        } catch (NumberFormatException | IOException e) {
            e.printStackTrace();
        }
    }
}
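The same calculation can also be written against a list of lines instead of a file, which makes it easy to try on the sample rows. The sketch below is an alternative to the hyphen split, not the approach used in this post: it parses MatchDate with java.time.LocalDate, assuming the dd-MM-yyyy pattern seen in the sample rows. The names ScoreSum and totalFor are hypothetical, and the sample rows are trimmed to their first five columns.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

public class ScoreSum {

    // Assumption: MatchDate is always day-month-year, as in the sample rows.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    /* Hypothetical helper: sums the score column for one country and one year. */
    static int totalFor(List<String> lines, String country, int year) {
        int sum = 0;
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length < 5 || !f[0].equals(country)) continue;
            try {
                LocalDate date = LocalDate.parse(f[4], FORMAT);
                if (date.getYear() == year) sum += Integer.parseInt(f[2]);
            } catch (DateTimeParseException | NumberFormatException e) {
                // Skip rows whose date or score does not parse (for example a header row)
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Afghanistan,Mohammad Shahzad,118,97.52,16-02-2010",
                "Afghanistan,Mohammad Shahzad,110,99.09,01-09-2009",
                "Afghanistan,Mohammad Shahzad,100,138.88,16-08-2010",
                "Afghanistan,Mohammad Shahzad,82,75.92,10-07-2010",
                "Afghanistan,Mohammad Shahzad,57,100,01-07-2010");
        System.out.println(totalFor(sample, "Afghanistan", 2010)); // prints 357
    }
}
```

Parsing the date this way also rejects malformed values such as 32-13-2010, which a plain split on "-" would accept without complaint.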

