<CFX_CSVToQuery>

Reads comma-delimited text (or text delimited by some other character) from a file and returns it as a query for your use in ColdFusion templates. Designed to be somewhat more forgiving about missing delimiters, etc., than ODBC datasources, etc. Intelligently handles quote marks as text qualifiers around individual values, including embedded quotes (or you can use some other character as the qualifier). Will use column names from the first line of the file, or you can supply the column names yourself. You can specify what line number to start at, and how many lines to read at a time. You can specify how "bad rows" should be handled.

Note: This tag was compiled using Delphi 4 and Vadim's DelphiCFX package. If you enjoy the tag's performance, consider trying these tools out yourself, or buy me a little gift of some kind, or both! :) -nate

16-Oct-98
Fixed bug in BADROWSARE="LONG". Added COLCOUNTS parameter. Added DEBUG output. Also, tag should no longer keep a lock on a file if an error is encountered during processing.

Parameters (all are optional except FILE)

FILE

The full filesystem-style path to the text file.

NAME

The name of the query result set to put the file's contents into. So if NAME="MyQuery", then you could use QUERY= "MyQuery" in a CFOUTPUT tag to display the records.

If you don't provide a NAME, no query will be created; however, the four variables listed below will still be set.  This allowsyou to "pre-scan" a file before processing it to see if it contains bad rows.

COLUMNS

If provided, a comma-separated list of column names. These will be the columns in the query result set named by the NAME parameter. If not provided, the column names will be read from the first line of the file.

The column names will be forced to be legal ColdFusion identifiers, by stripping illegal characters such as spaces, hyphens, etc., from the names. Column names that are made up of only illegal characters will be named with "COL" then the column number. If any column names start with numbers, "COL" will be prepended to the name to make it legal. If any column names are duplicated, underscores will be appended to the names to make them unique. So a file that has a first line like:

State,Sales,Sales This Month,1998,Q2,1999,Q2,$$$

would be treated as if it had been:

State,Sales,SalesThisMonth,COL1998,Q2,COL1999,Q2_,COL8
Of course, you can refer to the #MyQuery.ColumnList# variable to find out what the final list of column names was, just as you can with any query result set.

DELIMITER

The character to be used as the delimiter. Defaults to a comma.

TEXTQUALIFIER

The character to be used as the text qualifier. Defaults to a double-quote mark.

LONGROWSARE

GOOD or BAD. Defaults to GOOD.

As each line is reach from the file, the number of columns (fields) in the line is counted.

  • If the number of fields is the same as the number of columns specified in COLUMNS (or the first line of the file), the line is considered "good", and it is added to the query specified in NAME.
  • If the number of fields is less than the number of columns (that is, the line doesn't have enough data in it), the line is considered "bad", and the tag reacts according to how you set the ONBADROW parameter, below.
  • If the number of fields is more than the number of columns (that is, the line has too much data in it), the line is still considered "good" unless you specify LOWROWSARE= "BAD".

ONBADROW

ERROR, STOP, SKIP, or USEANYWAY. Defaults to ERROR.

Determines what the tag does if a "bad" row is found (see above for definition of a bad row). Note that no matter what this parameter is set to, the line number of the row in the text file is added to the list of bad row numbers in the #CSV_BadRows# variable that gets set by the tag automatically.

ERROR: All template execution stops and an error message is displayed. If you have previously defined a custom error template with the CFERROR tag, that will be used.

  • STOP: The tag stops reading the file, but no ColdFusion-style error is thrown. Template execution continues normally. The bad line does not get added to the query.
  • SKIP: The tag ignores the bad line and moves on to the next one. This will allow an entire file to be read even if some of the data is not delimited / qualified properly.
  • USEANYWAY: The tag does its best to use the data anyway. It puts the first field into the first column and so on until it runs out of data or runs out of columns (whichever comes first). So lines that don't have enough data will generate empty cells in the query. Lines that have too much data will cause the extra data to be "lost".

After the tag executes, the following variables will be populated for you:

 

Advanced Parameters
The tag also supports several additional parameters that can be used to read in a certain number of rows at a time, or to start at a specific row in the file. If you just want to read the whole file in at once, you don't need to read about these parameters.

STARTROW

Defaults to 1.

The line number to start at. So if STARTROW= "10",the data in the 10th line of the file will be in the first row of the returned query. If the column names are being read in from the first line of the file, it doesn't count here (so the data in the 11th line of the file will be in the first row).

MAXROWS

If provided, provides the maximum number of lines that should be read in from the file. Useful if you have a very large file and don't want to read in the whole thing at once because it would take up too much of your system's resources. See example below.

ROWNUMBERS

If provided, an additional column will be added to each row of the query, which indicates the original row number that the data occupied in the file. So if ROWNUMBERS="LINENUM", there will be an extra column in the returned query, named "LINENUM".

Also helpful if there are bad lines in the file and ONBADROW= "SKIP".The skipped row numbers will not be present in this column. Thus, #ValueList(MyQuery.LineNum)# would contain a comma-separated list of the row numbers that were "good".

COLCOUNTS

If provided, an additional column will be added to each row of the query, which indicates the number of "columns" (fields) that were found in the corresponding line of the text file.  Useful if you have instructed the tag to imports all lines from the file, even if they contain too much data (by setting  ONBADROW to USEANYWAY and LONGROWSARE to GOOD).  So if you specified COLCOUNTS= "NumFields", you would be able to look at #MyQuery.NumFields# to determine if any excess data hadbeen truncated from what was contained in that line of the text file.

DEBUG

If you add a DEBUG parameter to the tag, you will see output in your browser that shows each line that the tag is working on and the number of fields (columns) in each line.  Usefulif you think that the tag is getting "caught" on a particular line.

The following variables are also set for you when a MAXROWS parameter has been provided (actually, they are set for you even if it's not provided, but they aren't really relevant unless MAXROWS is being used). Together with the above parameters, they can be used to get a "chunking" effect, as shown in the example below.

Example
The following example demonstrates how to use STARTROW/MAXROWS and related variables to create a "data chunking" loop. The file is read in 5-line chunks. Each chunk of lines can be processed or output in whatever way you want, and the loop will execute as many times as needed until the end of the file is reached. Note that while this tag is compatible with ColdFusion 2.0, this template wouldn't work because it would result in multiple queries with the same name, which was not allowed in 2.0.

<CFSET CSV_StartRow = 1>
<CFSET CSV_ReachedEOF = "No">

<CFLOOP CONDITION="NOT CSV_ReachedEOF">

  <CFX_CSVToQuery
    FILE="c:\whatever\Comma.txt"
    NAME="MyQuery"
    COLUMNS="FirstName,LastName"
    ROWNUMBERS="LineNum"
    MAXROWS="5"
    STARTROW="#CSV_StartRow#"
    LONGROWSARE="BAD"
    ONBADROW="SKIP">
    
  <CFOUTPUT>
    <P>Rows Read: #CSV_RowsProcessed#<BR>
    Bad Rows:     #CSV_BadRows#<BR>
    ReachedEOF:   #CSV_ReachedEOF#<BR>
    StartRow:     #CSV_StartRow#<BR>
  </CFOUTPUT>
    
  <CFOUTPUT QUERY="MyQuery">
    <LI> #LineNum#: #LastName#,#FirstName#
  </CFOUTPUT>
  
</CFLOOP>

This would cause something like the following to be displayed:

Rows Read: 5
Bad Rows:
ReachedEOF: NO
StartRow: 6

  • 1: Weiss,Nate
  • 2: Freeman,Jed
  • 3: Stover,Jason
  • 4: Hawkins,Stephanie
  • 5: Dinowitz,Mike

    Rows Read: 5
    Bad Rows: 9,11
    ReachedEOF: NO
    StartRow: 13

  • 6: Higgins,Ned
  • 7: Keating,Erica
  • 8: Pozo,Ana
  • 10: Winslet,Kate
  • 12: Brooks,Albert

    Rows Read: 5
    Bad Rows:
    ReachedEOF: NO
    StartRow: 18

  • 13: Lemmon,Jack
  • 14: Randal,Tony
  • 15: Gates,Bill
  • 16: Allaire,Jeremy
  • 17: Simeonov,Sim

    Rows Read: 2
    Bad Rows:
    ReachedEOF: YES
    StartRow: 20

  • 18: Webber,Paine
  • 19: Ade,Gator

    The file used to generate the above sample output was:

    Nate,Weiss
    Jed,Freeman
    Jason,Stover
    Stephanie,Hawkins
    Mike,Dinowitz
    Ned,Higgins
    Erica,Keating
    Ana,Pozo
    badboy
    Kate,Winslet
    Mira,Sorvino,Babe
    Albert,Brooks
    Jack,Lemmon
    Tony,Randal
    Bill,Gates
    Jeremy,Allaire
    Sim,Simeonov
    Paine,Webber
    Gator,Ade