Reads comma-delimited text (or text delimited by another character) from a file and returns it as a query for use in your ColdFusion templates. Designed to be more forgiving about missing delimiters and similar problems than an ODBC datasource. Intelligently handles quote marks as text qualifiers around individual values, including embedded quotes (or you can use another character as the qualifier). Column names can be read from the first line of the file, or you can supply them yourself. You can specify which line number to start at, how many lines to read at a time, and how "bad" rows should be handled.
Note: This tag was compiled using Delphi 4 and Vadim's DelphiCFX package. If you enjoy the tag's performance, consider trying these tools out yourself, buying me a little gift of some kind, or both! :) -nate
16-Oct-98: Fixed a bug in LONGROWSARE="BAD" handling. Added the COLCOUNTS parameter. Added DEBUG output. Also, the tag no longer keeps a lock on the file if an error is encountered during processing.
Parameters (all are optional except FILE)
FILE
The full filesystem-style path to the text file.
NAME
The name of the query result set to put the file's contents into. So if NAME="MyQuery", you could use QUERY="MyQuery" in a CFOUTPUT tag to display the records. If you don't provide a NAME, no query will be created; however, the four variables listed below will still be set. This allows you to "pre-scan" a file before processing it to see if it contains bad rows.
COLUMNS
If provided, a comma-separated list of column names. These will be the columns in the query result set named by the NAME parameter. If not provided, the column names will be read from the first line of the file. The column names will be forced to be legal ColdFusion identifiers by stripping illegal characters (spaces, hyphens, etc.) from the names. A column name made up of only illegal characters will be named "COL" followed by the column number. If a column name starts with a number, "COL" will be prepended to make it legal. If any column names are duplicated, underscores will be appended to make them unique. So a file with a first line like:
State,Sales,Sales This Month,1998,Q2,1999,Q2,$$$
would be treated as if it had been:
State,Sales,SalesThisMonth,COL1998,Q2,COL1999,Q2_,COL8
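The renaming rules above can be sketched in Python. This is a hypothetical reimplementation, not the tag's actual code; in particular, treating anything other than letters, digits, and underscores as "illegal" is an assumption:

```python
import re

def sanitize_columns(names):
    """Apply the COLUMNS renaming rules: strip illegal characters,
    prefix 'COL' for empty or digit-leading names, and append
    underscores to de-duplicate (case-insensitively, since CF
    identifiers are case-insensitive)."""
    seen = set()
    out = []
    for i, raw in enumerate(names, start=1):
        name = re.sub(r"[^A-Za-z0-9_]", "", raw)  # strip illegal characters
        if not name:
            name = f"COL{i}"          # name was only illegal characters
        elif name[0].isdigit():
            name = "COL" + name       # identifiers can't start with a digit
        while name.upper() in seen:   # duplicated name: append underscores
            name += "_"
        seen.add(name.upper())
        out.append(name)
    return out

header = "State,Sales,Sales This Month,1998,Q2,1999,Q2,$$$"
print(sanitize_columns(header.split(",")))
# ['State', 'Sales', 'SalesThisMonth', 'COL1998', 'Q2', 'COL1999', 'Q2_', 'COL8']
```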
DELIMITER
The character to be used as the delimiter. Defaults to a comma.
TEXTQUALIFIER
The character to be used as the text qualifier. Defaults to a double-quote mark.
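For reference, the usual CSV convention for embedding the qualifier character inside a qualified value is to double it. Python's csv module is used here purely to illustrate that convention, not the tag's internals:

```python
import csv
import io

# The second field contains the delimiter; the third contains
# embedded (doubled) text qualifiers.
line = 'Smith,"Portland, OR","He said ""hi"" twice"'
row = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))
print(row)  # ['Smith', 'Portland, OR', 'He said "hi" twice']
```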
LONGROWSARE
GOOD or BAD. Defaults to GOOD. As each line is read from the file, the number of columns (fields) in the line is counted, and a line with more fields than there are columns is considered "long." If LONGROWSARE="GOOD", long rows are kept and the excess fields are truncated; if LONGROWSARE="BAD", long rows are treated as "bad" rows and handled according to the ONBADROW parameter. A line with fewer fields than there are columns is treated as a bad row.
ONBADROW
ERROR, STOP, SKIP, or USEANYWAY. Defaults to ERROR. Determines what the tag does if a "bad" row is found (see LONGROWSARE above for the definition of a bad row). Note that no matter what this parameter is set to, the line number of the row in the text file is added to the list of bad row numbers in the #CSV_BadRows# variable that the tag sets automatically.
ERROR: All template execution stops and an error message is displayed. If you have previously defined a custom error template with the CFERROR tag, it will be used.
STOP: The tag stops reading the file at the bad row, returning only the rows read so far.
SKIP: The bad row is skipped and reading continues with the next line.
USEANYWAY: The bad row is included in the query anyway.
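The interaction between LONGROWSARE and ONBADROW can be modeled as follows. This is a simplified Python sketch, not the tag's actual code; padding short rows with empty strings under USEANYWAY is an assumption:

```python
def filter_rows(rows, ncols, onbadrow="SKIP", longrowsare="GOOD"):
    """Classify each row as good or bad by its field count, apply the
    ONBADROW policy, and return (good_rows, bad_line_numbers)."""
    good, bad = [], []
    for lineno, fields in enumerate(rows, start=1):
        too_short = len(fields) < ncols
        too_long = len(fields) > ncols and longrowsare == "BAD"
        if too_short or too_long:
            bad.append(lineno)  # always recorded, like #CSV_BadRows#
            if onbadrow == "ERROR":
                raise ValueError(f"bad row at line {lineno}")
            if onbadrow == "STOP":
                break
            if onbadrow == "SKIP":
                continue
            # USEANYWAY: fall through and keep the row
        # pad short rows / truncate long ones to the column count
        good.append((fields + [""] * ncols)[:ncols])
    return good, bad

rows = [["Nate", "Weiss"], ["badboy"], ["Mira", "Sorvino", "Babe"]]
good, bad = filter_rows(rows, 2, onbadrow="SKIP", longrowsare="BAD")
print(good, bad)  # [['Nate', 'Weiss']] [2, 3]
```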
After the tag executes, the following variables will be populated for you:
CSV_RowsProcessed - The number of rows that were read into the query.
CSV_BadRows - A comma-separated list of the line numbers of any "bad" rows that were encountered.
Advanced Parameters
The tag also supports several additional parameters that can be used to read in a certain number of rows at a time, or to start at a specific row in the file. If you just want to read the whole file in at once, you don't need to read about these parameters.
STARTROW
Defaults to 1. The line number to start at. So if STARTROW="10", the data in the 10th line of the file will be in the first row of the returned query. If the column names are being read from the first line of the file, that header line doesn't count (so the data in the 11th line of the file will be in the first row).
MAXROWS
If provided, the maximum number of lines that should be read in from the file. Useful if you have a very large file and don't want to read the whole thing at once because it would take up too much of your system's resources. See the example below.
ROWNUMBERS
If provided, an additional column will be added to each row of the query, indicating the row number the data occupied in the original file. So if ROWNUMBERS="LINENUM", there will be an extra column in the returned query named "LINENUM". This is also helpful if there are bad lines in the file and ONBADROW="SKIP": the skipped row numbers will not be present in this column, so #ValueList(MyQuery.LineNum)# would contain a comma-separated list of the row numbers that were "good".
COLCOUNTS
If provided, an additional column will be added to each row of the query, indicating the number of "columns" (fields) found in the corresponding line of the text file. Useful if you have instructed the tag to import all lines from the file, even those that contain too much data (by setting ONBADROW to USEANYWAY and LONGROWSARE to GOOD). So if you specified COLCOUNTS="NumFields", you could look at #MyQuery.NumFields# to determine whether any excess data had been truncated from that line of the text file.
DEBUG
If you add a DEBUG parameter to the tag, you will see output in your browser showing each line that the tag is working on and the number of fields (columns) in each line. Useful if you think the tag is getting "caught" on a particular line.
The following variables are also set for you when a MAXROWS parameter has been provided (actually, they are set even if it's not provided, but they aren't really relevant unless MAXROWS is being used). Together with the above parameters, they can be used to get a "chunking" effect, as shown in the example below.
CSV_ReachedEOF - "Yes" if the end of the file was reached during this read; "No" otherwise.
CSV_StartRow - The line number the next chunk should start at; pass it back in as STARTROW on the next invocation.
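The chunking protocol that STARTROW, MAXROWS, CSV_ReachedEOF, and CSV_StartRow enable looks roughly like this in Python. This is a simplified model that ignores bad-row skipping (with skips, the tag advances CSV_StartRow past the skipped lines as well):

```python
def read_chunk(lines, start_row, max_rows):
    """One simulated tag invocation. Returns the chunk plus bookkeeping
    values mirroring CSV_RowsProcessed, CSV_ReachedEOF, and CSV_StartRow
    (hypothetical names; not the tag's internals)."""
    chunk = lines[start_row - 1 : start_row - 1 + max_rows]
    reached_eof = start_row - 1 + max_rows >= len(lines)
    next_start = start_row + len(chunk)
    return chunk, len(chunk), reached_eof, next_start

lines = [f"row{i}" for i in range(1, 20)]  # a 19-line file
start, eof, total = 1, False, 0
while not eof:
    chunk, n, eof, start = read_chunk(lines, start, 5)
    total += n
print(total, start)  # 19 20
```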
Example
The following example demonstrates how to use STARTROW/MAXROWS and the related variables to create a "data chunking" loop. The file is read in 5-line chunks. Each chunk of lines can be processed or output however you want, and the loop executes as many times as needed until the end of the file is reached. Note that while this tag is compatible with ColdFusion 2.0, this particular template would not work under 2.0, because it creates multiple queries with the same name, which 2.0 did not allow.
<CFSET CSV_StartRow = 1>
<CFSET CSV_ReachedEOF = "No">

<CFLOOP CONDITION="NOT CSV_ReachedEOF">

  <CFX_CSVToQuery
    FILE="c:\whatever\Comma.txt"
    NAME="MyQuery"
    COLUMNS="FirstName,LastName"
    ROWNUMBERS="LineNum"
    MAXROWS="5"
    STARTROW="#CSV_StartRow#"
    LONGROWSARE="BAD"
    ONBADROW="SKIP">

  <CFOUTPUT>
    <P>Rows Read: #CSV_RowsProcessed#<BR>
    Bad Rows: #CSV_BadRows#<BR>
    ReachedEOF: #CSV_ReachedEOF#<BR>
    StartRow: #CSV_StartRow#</P>
  </CFOUTPUT>

  <CFOUTPUT QUERY="MyQuery">
    <LI>#LineNum#: #LastName#, #FirstName#
  </CFOUTPUT>

</CFLOOP>
This would cause something like the following to be displayed:

Rows Read: 5
Bad Rows:
ReachedEOF: NO
StartRow: 6

Rows Read: 5
Bad Rows: 9,11
ReachedEOF: NO
StartRow: 13

Rows Read: 5
Bad Rows:
ReachedEOF: NO
StartRow: 18

Rows Read: 2
Bad Rows:
ReachedEOF: YES
StartRow: 20
The file used to generate the above sample output was:

Nate,Weiss
Jed,Freeman
Jason,Stover
Stephanie,Hawkins
Mike,Dinowitz
Ned,Higgins
Erica,Keating
Ana,Pozo
badboy
Kate,Winslet
Mira,Sorvino,Babe
Albert,Brooks
Jack,Lemmon
Tony,Randal
Bill,Gates
Jeremy,Allaire
Sim,Simeonov
Paine,Webber
Gator,Ade
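As a sanity check, counting the fields in each line of that sample file confirms which lines the tag flagged: line 9 ("badboy") has too few fields and line 11 ("Mira,Sorvino,Babe") has too many, so with LONGROWSARE="BAD" both are bad rows:

```python
sample = """Nate,Weiss
Jed,Freeman
Jason,Stover
Stephanie,Hawkins
Mike,Dinowitz
Ned,Higgins
Erica,Keating
Ana,Pozo
badboy
Kate,Winslet
Mira,Sorvino,Babe
Albert,Brooks
Jack,Lemmon
Tony,Randal
Bill,Gates
Jeremy,Allaire
Sim,Simeonov
Paine,Webber
Gator,Ade"""

# Lines whose field count differs from the two expected columns
bad = [i for i, line in enumerate(sample.splitlines(), start=1)
       if len(line.split(",")) != 2]
print(bad)  # [9, 11] -- matches "Bad Rows: 9,11" in the sample output
```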