Class ToCSV
java.lang.Object
    org.apache.poi.examples.ss.ToCSV
public class ToCSV extends Object
Demonstrates one way to convert an Excel spreadsheet into a CSV file. This class makes the following assumptions:
1. Where the Excel workbook contains more than one worksheet, a single CSV file will contain the data from all of the worksheets.
2. The data matrix contained in the CSV file will be rectangular. This means that the number of fields in each record of the CSV file will match the number of cells in the longest row found in the Excel workbook. Any short records will be 'padded' with empty fields. An empty field is represented in the CSV file by nothing between two separators, like this: ,,
3. Empty fields will represent missing cells.
4. A record consisting of empty fields will be used to represent an empty row in the Excel workbook.
Therefore, if the worksheet looked like the following:

 ___________________________________________
    |       |       |       |       |       |
    |   A   |   B   |   C   |   D   |   E   |
 ___|_______|_______|_______|_______|_______|
    |       |       |       |       |       |
  1 |   1   |   2   |   3   |   4   |   5   |
 ___|_______|_______|_______|_______|_______|
    |       |       |       |       |       |
  2 |       |       |       |       |       |
 ___|_______|_______|_______|_______|_______|
    |       |       |       |       |       |
  3 |       |   A   |       |   B   |       |
 ___|_______|_______|_______|_______|_______|
    |       |       |       |       |       |
  4 |       |       |       |       |   Z   |
 ___|_______|_______|_______|_______|_______|
    |       |       |       |       |       |
  5 | 1,400 |       |  250  |       |       |
 ___|_______|_______|_______|_______|_______|

Then the resulting CSV file will contain the following lines (records):

    1,2,3,4,5
    ,,,,
    ,A,,B,
    ,,,,Z
    "1,400",,250,,
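The padding rule can be illustrated with a minimal sketch. This is not the code used by ToCSV; the class name PaddingSketch and the method toRecords are hypothetical, rows are assumed to arrive as lists of strings with null standing for a missing cell, and escaping is deliberately left out, so only the first four rows of the worksheet above are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PaddingSketch {

    /** Joins each row with the separator, padding short rows with empty fields. */
    static List<String> toRecords(List<List<String>> rows, String separator) {
        // The longest row decides how many fields every record will have.
        int width = 0;
        for (List<String> row : rows) {
            width = Math.max(width, row.size());
        }
        List<String> records = new ArrayList<>();
        for (List<String> row : rows) {
            StringBuilder record = new StringBuilder();
            for (int i = 0; i < width; i++) {
                if (i > 0) {
                    record.append(separator);
                }
                // A missing cell (null, or past the end of a short row) becomes an empty field.
                String cell = i < row.size() ? row.get(i) : null;
                if (cell != null) {
                    record.append(cell);
                }
            }
            records.add(record.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "2", "3", "4", "5"),
                Collections.<String>emptyList(),              // empty row
                Arrays.asList(null, "A", null, "B"),          // short row with missing cells
                Arrays.asList(null, null, null, null, "Z"));
        for (String record : toRecords(rows, ",")) {
            System.out.println(record);
        }
    }
}
```

Run as written, the sketch prints the first four records of the example: 1,2,3,4,5 then ,,,, then ,A,,B, and finally ,,,,Z.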
Typically, the comma is used to separate each of the fields that, together, constitute a single record or line within the CSV file. This is not however a hard and fast rule and so this class allows the user to determine which character is used as the field separator and assumes the comma if none other is specified.
If a field contains the separator then it will be escaped. If the file should obey Excel's CSV formatting rules, the whole field will be surrounded with speech marks; if it should obey UNIX conventions, each occurrence of the separator will be preceded by the backslash character.
If a field contains an end of line (EOL) character then it too will be escaped. If the file should obey Excel's CSV formatting rules then the field will again be surrounded by speech marks. On the other hand, if the file should follow UNIX conventions then a single backslash will precede the EOL character. There is no single applicable standard for UNIX; some applications replace the CR with \r and the LF with \n, but this class will not do so.
If the field contains double quotes then that character will be escaped. It seems as though UNIX does not define a standard for this whilst Excel does. Should the CSV file have to obey Excel's formatting rules, each speech mark within the field will be escaped with a second speech mark, and an enclosing set of speech marks will then surround the entire field. Thus, if the following line of text appeared in a cell - "Hello" he said - it would look like this when converted into a field within a CSV file - """Hello"" he said".
Finally, it is worth noting that talk of CSV 'standards' is really slightly misleading as there is no such thing. It may well be that the code in this class has to be modified to produce files to suit a specific application or requirement.
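The escaping rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the code inside ToCSV: the class EscapingSketch and both method names are hypothetical, the separator is assumed to be non-empty, and the line feed is assumed to be the only EOL character of interest.

```java
public class EscapingSketch {

    /** Excel style: double any embedded speech marks, then quote the whole field if needed. */
    static String escapeExcelStyle(String field, String separator) {
        boolean hasQuote = field.contains("\"");
        boolean needsQuoting = hasQuote
                || field.contains(separator)
                || field.contains("\n");
        if (!needsQuoting) {
            return field;
        }
        // Each embedded speech mark is escaped with a second speech mark.
        String doubled = field.replace("\"", "\"\"");
        return "\"" + doubled + "\"";
    }

    /** UNIX style: precede each separator and each EOL character with a backslash. */
    static String escapeUnixStyle(String field, String separator) {
        return field
                .replace(separator, "\\" + separator)
                .replace("\n", "\\\n");
    }

    public static void main(String[] args) {
        System.out.println(escapeExcelStyle("\"Hello\" he said", ","));
        System.out.println(escapeExcelStyle("1,400", ","));
        System.out.println(escapeUnixStyle("1,400", ","));
    }
}
```

With a comma separator, the first call yields """Hello"" he said", the second "1,400", and the third 1\,400. As the text notes, an application with different expectations would need different rules.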
Field Summary
Fields
static int EXCEL_STYLE_ESCAPING
    Identifies that the CSV file should obey Excel's formatting conventions with regard to escaping certain embedded characters - the field separator, speech mark and end of line (EOL) character.
static int UNIX_STYLE_ESCAPING
    Identifies that the CSV file should obey UNIX formatting conventions with regard to escaping certain embedded characters - the field separator and end of line (EOL) character.
Constructor Summary
Constructors
ToCSV()
Method Summary
Methods
void convertExcelToCSV(String strSource, String strDestination)
    Process the contents of a folder, convert the contents of each Excel workbook into CSV format and save the resulting file to the specified folder using the same name as the original workbook with the .xls or .xlsx extension replaced by .csv.
void convertExcelToCSV(String strSource, String strDestination, String separator)
    Process the contents of a folder, convert the contents of each Excel workbook into CSV format and save the resulting file to the specified folder using the same name as the original workbook with the .xls or .xlsx extension replaced by .csv.
void convertExcelToCSV(String strSource, String strDestination, String separator, int formattingConvention)
    Process the contents of a folder, convert the contents of each Excel workbook into CSV format and save the resulting file to the specified folder using the same name as the original workbook with the .xls or .xlsx extension replaced by .csv.
static void main(String[] args)
    The main() method contains code that demonstrates how to use the class.
Field Detail
EXCEL_STYLE_ESCAPING
public static final int EXCEL_STYLE_ESCAPING
Identifies that the CSV file should obey Excel's formatting conventions with regard to escaping certain embedded characters - the field separator, speech mark and end of line (EOL) character.
See Also:
    Constant Field Values
UNIX_STYLE_ESCAPING
public static final int UNIX_STYLE_ESCAPING
Identifies that the CSV file should obey UNIX formatting conventions with regard to escaping certain embedded characters - the field separator and end of line (EOL) character.
See Also:
    Constant Field Values
Method Detail
convertExcelToCSV
public void convertExcelToCSV(String strSource, String strDestination) throws FileNotFoundException, IOException, IllegalArgumentException
Process the contents of a folder, convert the contents of each Excel workbook into CSV format and save the resulting file to the specified folder using the same name as the original workbook with the .xls or .xlsx extension replaced by .csv. This method will ensure that the CSV file created contains the comma field separator and that embedded characters such as the field separator, the EOL and double quotes are escaped in accordance with Excel's convention.
Parameters:
    strSource - An instance of the String class that encapsulates the name of and path to either a folder containing the Excel workbook(s) to be converted or an individual Excel workbook to be converted.
    strDestination - An instance of the String class encapsulating the name of and path to a folder that will contain the resulting CSV files.
Throws:
    FileNotFoundException - Thrown if any file cannot be located on the filesystem during processing.
    IOException - Thrown if the filesystem encounters any problems during processing.
    IllegalArgumentException - Thrown if the value passed to the strSource parameter refers to a file or folder that does not exist, or if the value passed to the strDestination parameter refers to a folder that does not exist or does not refer to a folder at all.
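The naming rule for the output file can be sketched in a few lines. The class CsvNameSketch and the method csvNameFor are hypothetical, and two details are assumptions of this sketch rather than documented behaviour: the extension match ignores case, and a name with any other extension yields null.

```java
import java.util.Locale;

public class CsvNameSketch {

    /** Returns the CSV file name for a workbook name, or null if it is not .xls or .xlsx. */
    static String csvNameFor(String workbookName) {
        String lower = workbookName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".xls") || lower.endsWith(".xlsx")) {
            // Keep everything before the final dot and append the new extension.
            int dot = workbookName.lastIndexOf('.');
            return workbookName.substring(0, dot) + ".csv";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(csvNameFor("budget.xlsx"));
        System.out.println(csvNameFor("report.2023.xls"));
    }
}
```

Only the final extension is replaced, so a name such as report.2023.xls becomes report.2023.csv.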
convertExcelToCSV
public void convertExcelToCSV(String strSource, String strDestination, String separator) throws FileNotFoundException, IOException, IllegalArgumentException
Process the contents of a folder, convert the contents of each Excel workbook into CSV format and save the resulting file to the specified folder using the same name as the original workbook with the .xls or .xlsx extension replaced by .csv. This method allows the client to define the field separator but will ensure that embedded characters such as the field separator, the EOL and double quotes are escaped in accordance with Excel's convention.
Parameters:
    strSource - An instance of the String class that encapsulates the name of and path to either a folder containing the Excel workbook(s) to be converted or an individual Excel workbook to be converted.
    strDestination - An instance of the String class encapsulating the name of and path to a folder that will contain the resulting CSV files.
    separator - An instance of the String class that encapsulates the character or characters the client wishes to use as the field separator.
Throws:
    FileNotFoundException - Thrown if any file cannot be located on the filesystem during processing.
    IOException - Thrown if the filesystem encounters any problems during processing.
    IllegalArgumentException - Thrown if the value passed to the strSource parameter refers to a file or folder that does not exist, or if the value passed to the strDestination parameter refers to a folder that does not exist or does not refer to a folder at all.
convertExcelToCSV
public void convertExcelToCSV(String strSource, String strDestination, String separator, int formattingConvention) throws FileNotFoundException, IOException, IllegalArgumentException
Process the contents of a folder, convert the contents of each Excel workbook into CSV format and save the resulting file to the specified folder using the same name as the original workbook with the .xls or .xlsx extension replaced by .csv. This method allows the client to define both the field separator and the convention used to escape embedded characters.
Parameters:
    strSource - An instance of the String class that encapsulates the name of and path to either a folder containing the Excel workbook(s) to be converted or an individual Excel workbook to be converted.
    strDestination - An instance of the String class encapsulating the name of and path to a folder that will contain the resulting CSV files.
    separator - An instance of the String class encapsulating the character or characters that should be used to separate items on a line within the CSV file.
    formattingConvention - A primitive int whose value will determine whether certain embedded characters should be escaped in accordance with Excel's or UNIX formatting conventions. Two constants are defined to support this option: ToCSV.EXCEL_STYLE_ESCAPING and ToCSV.UNIX_STYLE_ESCAPING.
Throws:
    FileNotFoundException - Thrown if any file cannot be located on the filesystem during processing.
    IOException - Thrown if the filesystem encounters any problems during processing.
    IllegalArgumentException - Thrown if the value passed to the strSource parameter refers to a file or folder that does not exist, if the value passed to the strDestination parameter refers to a folder that does not exist, if the value passed to the strDestination parameter does not refer to a folder, or if the value passed to the formattingConvention parameter is other than one of the values defined by the constants ToCSV.EXCEL_STYLE_ESCAPING and ToCSV.UNIX_STYLE_ESCAPING.
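The four conditions listed for IllegalArgumentException could be checked along the following lines. This is a sketch only: the class ArgumentCheckSketch and the method checkArguments are hypothetical, and the constant values 0 and 1 are placeholders chosen for the example (the actual values are listed under Constant Field Values).

```java
import java.io.File;

public class ArgumentCheckSketch {

    // Placeholder values for illustration; not taken from the documentation above.
    static final int EXCEL_STYLE_ESCAPING = 0;
    static final int UNIX_STYLE_ESCAPING = 1;

    /** Throws IllegalArgumentException if any documented precondition is violated. */
    static void checkArguments(File source, File destination, int formattingConvention) {
        if (!source.exists()) {
            throw new IllegalArgumentException("The source file or folder does not exist.");
        }
        if (!destination.exists()) {
            throw new IllegalArgumentException("The destination folder does not exist.");
        }
        if (!destination.isDirectory()) {
            throw new IllegalArgumentException("The destination is not a folder.");
        }
        if (formattingConvention != EXCEL_STYLE_ESCAPING
                && formattingConvention != UNIX_STYLE_ESCAPING) {
            throw new IllegalArgumentException("Unknown formatting convention.");
        }
    }

    public static void main(String[] args) {
        // The system temporary folder is used here only because it is known to exist.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        checkArguments(tmp, tmp, UNIX_STYLE_ESCAPING);
        System.out.println("Arguments accepted");
    }
}
```

Checking everything before any workbook is opened means a bad argument is reported before any CSV output has been written.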
main
public static void main(String[] args)
The main() method contains code that demonstrates how to use the class.
Parameters:
    args - An array containing zero, one or more elements, all of type String. Each element will encapsulate an argument specified by the user when running the program from the command prompt.