CF8 PDF Manipulation: Pulling Text Out

So, this morning a friend called me up with a problem. She had received some PDF files from her insurance company, and she needed the data in Word or Excel for manipulation. Now, she could cut and paste the information, but that was time-consuming. She went to the Adobe site, trying to find info, and saw 'ColdFusion' on the homepage. This sparked her brain, because she immediately went, "Hey, Cutter does something with ColdFusion! Maybe he can help me!"

Lucky for her, we now have ColdFusion 8, with its built-in PDF support through the CFPDF tag. I had to do a tiny bit of research on this, because Adobe's CF LiveDocs weren't overly clear, but I eventually found out that I could extract text with some very simple DDX processing directives.

Ray did a series of posts recently about working with PDF documents. Although none of them answered my question directly, he had written one about using the DDX processing directives. This sent me searching the Adobe site for more information, which is where I came upon the Understanding DDX developer documentation. Basically, by rewriting Ray's simple example, I was able to extract all of the DocumentText from the PDF and dump it into an XML file. First I need the DDX, which is just some simple XML:

<cfsavecontent variable="myddx">
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
    <DocumentText result="OutXML">
        <PDF source="Title"/>
    </DocumentText>
</DDX>
</cfsavecontent>
<cfset myddx = trim(myddx)>

Then, I verify the validity:

<cfif isDDX(myddx)>
yes, it's ddx
<cfelse>
no, it's not
</cfif>

Now, a little explanation. Looking at the DDX, you'll notice I've defined a result and a source. I had tried to define my file names here directly, but ColdFusion didn't like that when I hit the CFPDF tag. Apparently, when using the processddx action of the tag, you are required to define your inputfiles and outputfiles. Further study of the LiveDocs shows that ColdFusion is expecting structures for these definitions. So, the DDX references certain structure keys (OutXML and Title), which you must define prior to processing your PDF.

<cfset inputStruct = StructNew() />
<cfset inputStruct.Title = "rptLauncher2.pdf" />

<cfset outputStruct = StructNew() />
<cfset outputStruct.OutXML = "words2.xml" />

You now have all of the necessary pieces. All that's required is your call to process your DDX directives.

<cfpdf action="processddx" ddxfile="#myddx#" name="VARIABLES.doc" inputfiles="#inputStruct#" outputfiles="#outputStruct#" />

I CFDump the VARIABLES.doc to see my success or failure, which comes out just fine. I now have a file, words2.xml, sitting in my server's folder, which contains all of the content of the PDF file. Simple and sweet.

CF8 Ajax Grid: Renderers and Events

So, I was doing a real quick, down and dirty form and results app for something internal. It was way temporary, with little scale-out, so I wrote a form and processor, then used the CF8 DataGrid for the results display. Problem was, two of the fields were textareas that could contain a lot of info, so I needed a quick way to show an expanded details set. Now, I could have reached for the RowExpander plugin, but this was just quick implementation prototyping type stuff.

What I needed was a column of icons that I could then link to a CFWindow with the total display. Now, I have to use a Cell Renderer to place the image in the empty column, but first I need the column.

<cfgridcolumn name="Details" header="" width="25" display="true" />

After that, I create a basic Cell Renderer:

setDetailButtonRenderer = function(grid,cm,col){
    cm.setRenderer(col,function(value,p,r,ind){
        var retVal = "<img src='/resources/images/icons/book_link.gif' width='16' height='16' alt='Details' />";
        return retVal;
    });
    grid.reconfigure(grid.getDataSource(),cm);
}

This didn't entirely work out, as it placed the image in every row, even if there wasn't a record. So, time to improvise. I adjusted the renderer to check whether there's a value for a cell in this row's 'record', to determine whether I need the image.

setDetailButtonRenderer = function(grid,cm,col){
    cm.setRenderer(col,function(value,p,r,ind){
        var ds = grid.getDataSource();
        var theRecord = ds.getAt(ind);
        if(theRecord.get('TS') != null){
            var retVal = "<img src='/resources/images/icons/book_link.gif' width='16' height='16' alt='Details' />";
            return retVal;
        }
    });
    grid.reconfigure(grid.getDataSource(),cm);
}

function showRecWin(){
    ColdFusion.Window.show('winDetails');
}
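The rule inside the renderer can also be pulled out into a plain function, which makes it easy to try without a grid. This is only a minimal sketch: it assumes a record object with a get() method like the grid records used above, and renderDetailCell and fakeRecord are hypothetical names for illustration.

```javascript
// Hypothetical helper: decides what the Details cell should contain.
// `record` is assumed to expose get(fieldName), like the grid's records.
function renderDetailCell(record) {
    // Filler rows have no record, or no value in the TS field
    if (!record || record.get('TS') == null) {
        return "";
    }
    return "<img src='/resources/images/icons/book_link.gif' width='16' height='16' alt='Details' />";
}

// Stand-in for a grid record, only for trying the helper outside the grid
function fakeRecord(data) {
    return { get: function (name) { return data[name]; } };
}

console.log(renderDetailCell(fakeRecord({ TS: '2007-11-20' })) !== ""); // true
console.log(renderDetailCell(fakeRecord({})) === "");                  // true
```

Inside the grid, the function handed to cm.setRenderer could then return renderDetailCell(r), where r is the record argument the renderer already receives.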

Alright, to call the renderer into play I have an init method that is fired by the CF ajaxOnLoad() method.

init = function(){
    var repGrid = ColdFusion.Grid.getGridObject('reportsGrid');
    var repCM = repGrid.getColumnModel();

    setDetailButtonRenderer(repGrid,repCM,8);
}

Now we're halfway there. Next I need to get a 'click' on the image cell. You do this by accessing the underlying Ext functions of the Grid object itself, for which you already have a reference (repGrid).

init = function(){
    var repGrid = ColdFusion.Grid.getGridObject('reportsGrid');
    var repCM = repGrid.getColumnModel();

    setDetailButtonRenderer(repGrid,repCM,8);

    repGrid.on('cellclick',function(grid,rowIndex,columnIndex,e){
        if(columnIndex==8){

        }
    });
}

We are configuring an on cellclick function here, which is really a listener on the grid itself. We further narrow it to only perform an action if the column that was clicked was our Details column, which is the 9th column of our grid, including hidden columns (remember that column indexes come from a JavaScript array, which starts at zero, so the index you reference is always the column's position minus one).
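Hardcoding 8 works until a column is added or moved. Here is a small sketch of the zero-based arithmetic, using a plain array of column names as a stand-in for the grid's columns. The names in columnNames and the helper columnIndexOf are made up for illustration; they are not part of the ColdFusion or Ext API.

```javascript
// Hypothetical column list, in grid order, hidden columns included
var columnNames = ['ID', 'TS', 'NAME', 'EMAIL', 'FEATURENAME',
                   'FEATUREPURPOSE', 'FEATUREFUNCTION', 'STATUS', 'DETAILS'];

// Returns the zero-based index of a column, or -1 when it is missing.
// Names are compared in uppercase, since ColdFusion uppercases them.
function columnIndexOf(names, name) {
    var wanted = name.toUpperCase();
    for (var i = 0; i < names.length; i++) {
        if (names[i] === wanted) {
            return i;
        }
    }
    return -1;
}

var detailsIndex = columnIndexOf(columnNames, 'Details');
console.log(detailsIndex); // 8: the 9th column, counting from zero
```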

Next thing we need is a quick modal pop-up for our 'Details.' CFWindow makes a great candidate for this.

<cfwindow name="winDetails" title="Details" draggable="false" resizable="false" initShow="false" height="600" width="600" />

It's invisible when initialized, because we only want to show it 'on click'. We need a quick method for 'showing' the window.

function showRecWin(){
    ColdFusion.Window.show('winDetails');
}

We can now reference this in our 'on click' function.

repGrid.on('cellclick',function(grid,rowIndex,columnIndex,e){
    if(columnIndex==8){
        showRecWin();
    }
});

OK, we get our window, but now we need some data. Now, I could do an ajax call for the data, but it's already in my cell. It's just too long to easily display in the grid. Rather than do another server call, I'll just query the grid's Data.Store for the information.

repGrid.on('cellclick',function(grid,rowIndex,columnIndex,e){
    if(columnIndex==8){
        showRecWin();
        // This empties out any previously displayed content
        document.getElementById("winDetails_body").innerHTML = "";
        var ds = grid.getDataSource();
        var theRecord = ds.getAt(rowIndex);
        var valPurpose = theRecord.get('FEATUREPURPOSE');
        var valFunction = theRecord.get('FEATUREFUNCTION');
        document.getElementById("winDetails_body").innerHTML = "<b>Purpose:</b><br />" + valPurpose + "<br /><br /><b>Function:</b><br />" + valFunction;
    }
});

Really simple, as long as you remember that ColdFusion's creation of the grid's ColumnModel will uppercase all of your cfgridcolumn's name attributes.
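One thing to watch: because the values go straight into innerHTML, any markup typed into the textareas would be rendered as HTML. Below is a minimal sketch of building the same string with the values escaped first. escapeHtml and buildDetailsHtml are hypothetical helpers, not ColdFusion or Ext functions, and the sample record only mimics the get() method used above.

```javascript
// Replaces the characters that would otherwise be read as markup
function escapeHtml(text) {
    return (text == null ? '' : String(text))
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Builds the window body from a record, with both values escaped
function buildDetailsHtml(record) {
    // Field names are uppercase because ColdFusion uppercases cfgridcolumn names
    var valPurpose = escapeHtml(record.get('FEATUREPURPOSE'));
    var valFunction = escapeHtml(record.get('FEATUREFUNCTION'));
    return "<b>Purpose:</b><br />" + valPurpose +
           "<br /><br /><b>Function:</b><br />" + valFunction;
}

// Made-up sample data standing in for a grid record
var sampleData = { FEATUREPURPOSE: 'Track <b>bugs</b>', FEATUREFUNCTION: 'Search & sort' };
var sample = { get: function (name) { return sampleData[name]; } };

console.log(buildDetailsHtml(sample));
```

In the click handler, the result of buildDetailsHtml(theRecord) would be assigned to the window body's innerHTML in place of the hand-built string.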

That's it. Really doesn't take a whole lot. A little digging in the documentation for the 1.1.1 version of the ExtJS library will give you a ton of information.

Enter The Holidays

Happy Thanksgiving to everyone, and welcome to the holiday season. My mother is in town now, and my family and I just completed our move to our new apartment. Great place, terrific neighborhood, excellent schools...really loving it, now that we're getting settled. Once we put the pictures on the walls and put the moving tubs into storage we'll be golden. And I am loving the telecom company here: with fiber to every unit, we now have great digital cable and a 15MB internet connection. Man, it is fast!

Next week I'm going to try and re-record my UG preso on CF8's Ajax Components And Beyond. The original preso went very well, but we had some glitches in the recording process. I hope to get that done and on UGTV by the end of next week. I am working on extending some RIAForge components for working with the Google Maps API. Hoping to make some of that publicly available in the very near future.

And, since we're entering the season of giving, I encourage everyone in the ColdFusion community to contribute to some open source project. CFCommerce is trying to get into the groove, there are hundreds of items on RIAForge, and any of the frameworks would probably appreciate a helping hand.

Happy Holidays to all!

ColdFusion OO Architecture: Get Out Of The Box

For the past two days a very interesting thread has been running on the Model-Glue mailing list. You have to read through the first few messages in the thread before you really start getting into the meat of the discussion, with some great comments from Sean Corfield and Peter Bell.

It goes back to the ongoing Design Patterns Debate, making its way around the ColdFusion community. Many have adopted the Table Row Pattern, used by popular ORMs, almost as a standard for development. But is this the right way to go with ColdFusion? Are we writing too much code to accomplish simple tasks?

Sean takes some responsibility for this thought process, believing that some of this has stemmed from his Mach-II Development Guidelines doc while at Adobe. While that may be somewhat true, I think that it probably stems more from the fact that OO is still fairly new to the ColdFusion world. While we've been capable of writing OO code since the introduction of CFCs in 6.1, adoption of the concept has been slow, and only truly picked up major steam over the last two years or more.

In the thread, Sean knocks the wide adoption of the 5:1 business object concept. He doesn't state that it's completely wrong, only that it may be overkill in most cases, and should not be the end-all-be-all. While the Bean-DAO-Gateway paradigm may be great for simple CRUD type operations, and simple table fillers, it's not well suited to complex objects. A Factory approach may be a better option. The primary point is that there is No One Way, and that we shouldn't pigeonhole ourselves into design patterns that are primarily designed for Java, when ColdFusion (being typeless) has more in common with languages like Ruby, Python or Groovy.

There is no One Way, or even Wrong Way, and maybe it's time for all of us to begin thinking outside the box again. We can build great, rapid, OO applications, if we just start doing it.

I've paraphrased some stuff here, so if I've gotten someone's comments wrong, or completely mixed up, I apologize now and please feel free to correct me.

Watch What You Write, Someone Is Reading

Today I received the following comment here, on an older post on Variables and Naming Conventions:

...I wish Adobe would publish and adopt some kind of official naming convention. Sometimes reading sample code written in some other convention can make things harder to follow...

It was almost funny that this comment had come in when it had. Recently I was doing a lot of research for a User Group presentation I just did on the new ColdFusion 8 Ajax Components (have to re-record it before public release). In the process, I spent a great deal of time going over documentation all over the internet, from LiveDocs to countless blogs, absorbing the wealth of information that is already out there. It was outstanding that there were so many resources out there for people to learn from. On the other hand, it was a little sad that so much of the sample code was written in ways that can really start new developers off with some bad habits.

I'm not perfect, by any means, but I try to pay careful attention to the code that I place on this blog for readers to use and learn from. One thing that I attempt to do is pay attention to basic Web Standards, like using XHTML (the current standard) instead of HTML, keeping styles in the stylesheet, and having unobtrusive JavaScript. I don't always do it, sometimes it doesn't make sense for a quick example, but I try, especially within code downloads. I also try to adhere to my own Coding Guidelines, so that code appears to be consistent and easy to read and understand.

Probably the one that bothers me the most, and that I see most prevalent in blogs, documentation, and books, is the lack of proper variable scoping. I know that, often, we're just publishing quick examples, but this can be an extremely detrimental practice. I have worked on some very large enterprise applications, with years of code written by half-a-dozen different developers, most of whom learned their ColdFusion (and development) skills through the docs or a book. Many had actually come up with some very creative and effective algorithms to fix some issue, or create some new whiz bang feature, but their code was so poorly scoped that, after time, it could take down the server. Why? How? Enterprise sites may contain several hundred (or thousand) templates, containing dozens of variables on each page, and can potentially be hit by hundreds (or thousands) of users simultaneously. Multiply the number of variables by the number of pages by the number of users, then imagine ColdFusion doing a ScopeCheck on each one, to figure out which scope each variable requested belongs in. Even if the variable is in the VARIABLES scope, it's still that many times ScopeCheck will be called while rendering a page.
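To make the cost concrete, here is a toy model of an unscoped lookup. It is only an illustration, not how ColdFusion is implemented: the scope list and its order are a simplified assumption, and lookupUnscoped and lookupScoped are made-up names. The point is just that an unscoped name forces a walk through several scopes, while a scoped one goes straight to its target.

```javascript
// Simplified, assumed search order for an unscoped variable
var searchOrder = ['arguments', 'variables', 'cgi', 'url', 'form', 'cookie', 'client'];

// Walks the scopes in order and counts how many had to be checked
function lookupUnscoped(scopes, name) {
    var checks = 0;
    for (var i = 0; i < searchOrder.length; i++) {
        checks++;
        var scope = scopes[searchOrder[i]];
        if (scope && Object.prototype.hasOwnProperty.call(scope, name)) {
            return { value: scope[name], checks: checks };
        }
    }
    return { value: undefined, checks: checks };
}

// A scoped reference names its scope, so exactly one scope is checked
function lookupScoped(scopes, scopeName, name) {
    return { value: scopes[scopeName][name], checks: 1 };
}

// Made-up request data for the illustration
var scopes = {
    arguments: {},
    variables: { userName: 'cutter' },
    url: {},
    form: { email: 'someone@example.com' }
};

console.log(lookupUnscoped(scopes, 'email').checks);       // 5
console.log(lookupScoped(scopes, 'form', 'email').checks); // 1
```

Multiply that difference by every variable on every template for every request and the argument for scoping becomes easier to picture.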

Still not convinced? Go download varScoper, and run it on your project root folder, including your subfolders, and see what it comes up with. Yeah, I'm still in shock. Cleanup on that is easier on a small subproject scale, but it's definitely forced me to think better when I'm writing my code, paying attention as I go, to minimize the performance impact of my applications, no matter how small it may be. I learned my bad habits from the docs, various books, and sample code slung around on the CF-Talk list. I've continued to realize that there are better ways of doing things (like OOP and frameworks), and to adjust my style and methods, and I think it's important to consider these 'best practices' when contributing. A little more code, but the right thing to do in the end, for you, your app, and your systems.

So, if you own a site of documentation, revise it. If you're writing a book, edit it. If you publish a CF blog, live it. The up-and-coming are reading us all of the time to find out how to use this wonderful language. Let's try to show 'em how to do it the right way. You might not follow any guidelines at all, within your development, but this scoping thing is way too important to gloss over, and will only help everyone in the long run.

Yes, I Am King!

Well, this was fun! Thanks Aaron.

NerdTests.com says I'm an Uber Cool Nerd King.  What are you?  Click here!

CFGrid Gotcha

So, I'm finally playing with some of the new Ajax controls built into ColdFusion 8. They're based on ExtJS (for the most part), and I thought it would be cool to dig in and see what I could do.

So, pulled up the documentation. First I built a basic CFC, with a remote access method that pulls all of the records from the Art table of the cfartgallery db. Then I built the display page, with the cfgrid and cfgridcolumn tags. I used the bind attribute to bind the grid to the cfc method. Tried it out and...error.

CFGRID: Response is empty [Enable debugging by adding 'cfdebug' to your URL parameters to see more information.]

OK. Fun. No response messages showing in Firebug, but the right parameters were getting passed through. Google is your friend, right? I could find only one reference, in the comments on a post at Ben's site, and it only pointed me towards the Application.cfc, with no explanation of what the problem was or how to fix it.

So, I changed the file name of my Application.cfc. I didn't need it this early in the game, so I took it out of play. Voila! It works. OK, so what's in the Application.cfc?

Well, I had already commented out the onError method (figuring out an issue with Coldspring). There wasn't any output in any of the methods. I went over all of my attributes and mappings...nothing. Then I noticed something.

I took the Application.cfc template from Ray's site, with very minor adjustments. I finally noticed that one function didn't have an 'output' attribute, onRequest.

<cffunction name="onRequest" returnType="void">
    <cfargument name="thePage" type="string" required="true" />
    <cfinclude template="#arguments.thePage#" />
</cffunction>

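Once I commented this function out, the call worked perfectly. The likely explanation is onRequest itself rather than the missing attribute: when Application.cfc defines onRequest, it intercepts every request, including the remote CFC call behind the grid's bind, and a plain cfinclude of the requested page doesn't hand back the method's result. Well, lessons learned...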

ColdFusion 8 Fun: Looping Files

OK, so I've been working on my mother's website for...well, too long. One of the reasons is I've been waiting on her to get approval to get a feed of listings, so we can put them directly on her site. Well, she finally got the approval, so I've been having fun this weekend, pulling in data and images, setting up database tables. The Works.

These feeds are tab-delimited text files. The first line is a listing of all of the columns, and the rest are the data. So, I set up a staging table, with column names that match those in the file (luckily they provide a listing of the columns, along with their data type and length, in a separate .log file). Next, I used the Illudium PU-36 Code Generator to quickly give me some data access objects, and then settled down to write a little code.

Now, my first file has 7,000+ records in it, so I go ahead and give myself a little time for the code to do its job.

<cfsetting enablecfoutputonly="true" requesttimeout="600" />

Next thing I wanted were a few variables and objects to work with.

<cfset VARIABLES.lineNum = 1 />
<cfset VARIABLES.filePath = expandPath(".") & "\myFile.txt" />
<cfset VARIABLES.Bean = CreateObject("component","feedRecord") />
<cfset VARIABLES.DAO = CreateObject("component","feedRecordDAO").init(APPLICATION.dsn) />

And then I set up the loop on the file.

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <!--- Code to go here --->
</cfloop>

OK, for those who don't know, the DAO object that is created by the code generator takes a bean object as the argument for the save() method. The bean object has an init() method with all of the column names as non-required arguments. So, how to best initialize my bean? Well, the data file's first row is a tab delimited list of the column names, so I decide to use it. First, I only want the first row to give me a data structure of the column names, in the order I'll need them. Hmmm? Ok, I decide to use an Array.

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <cfif VARIABLES.lineNum gt 1>
        <!--- This is for later --->
    <cfelse>
        <cfset VARIABLES.propOrder = ArrayNew(1) />
        <cfset VARIABLES.lineCount = 1 />
        <cfloop list="#VARIABLES.line#" index="VARIABLES.listItem" delimiters="#Chr(9)#">
            <cfset VARIABLES.propOrder[VARIABLES.lineCount] = VARIABLES.listItem />
            <cfset VARIABLES.lineCount++ />
        </cfloop>
        <cfset VARIABLES.lineCount = 0 />
    </cfif>
    <cfset VARIABLES.lineNum++ />
</cfloop>

Notice that the first part of my flow control is currently blank. I put this branch first because most lines will meet this criterion, and that's where the meat of the processing will be handled in the end. The Array, though very important, is only built on the first row of the file. That branch runs first, because the first row is the header, but it is bypassed throughout the rest of the process. BTW, I love the JS style operators;)

Now, I used an Array to maintain the order of the key names, but ultimately I'll need a Struct to pass into the bean's init() method, as an argumentCollection.

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <cfif VARIABLES.lineNum gt 1>
        <cfset VARIABLES.resProp = StructNew() />
    ....

Now, I was going to list loop through each line to set my Struct, but found out the hard way that <cfloop> still doesn't like empty items in a list. I was getting errors all over the place about truncated data and whatnot, before I noticed data wasn't in the right place. What to do? Take a different approach! Instead of looping a list, I'll loop an Array, and make my Array from the list, while using the new includeEmptyFields option.

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <cfif VARIABLES.lineNum gt 1>
        <cfset VARIABLES.resProp = StructNew() />
        <cfset VARIABLES.arrProps = ListToArray(VARIABLES.line,Chr(9),true) />
        <cfloop from="1" to="#ArrayLen(VARIABLES.propOrder)#" index="VARIABLES.itemCount">
            <cfset VARIABLES.resProp[VARIABLES.propOrder[VARIABLES.itemCount]] = VARIABLES.arrProps[VARIABLES.itemCount] />
        </cfloop>
    ....

Did you see it? Simple, eh? Now I have a Struct, where the data from each line matches up with the keys set from the first line of the file. All that's left is to set my bean and pass it to the save() method of the DAO.

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <cfif VARIABLES.lineNum gt 1>
        <cfset VARIABLES.resProp = StructNew() />
        <cfset VARIABLES.arrProps = ListToArray(VARIABLES.line,Chr(9),true) />
        <cfloop from="1" to="#ArrayLen(VARIABLES.propOrder)#" index="VARIABLES.itemCount">
            <cfset VARIABLES.resProp[VARIABLES.propOrder[VARIABLES.itemCount]] = VARIABLES.arrProps[VARIABLES.itemCount] />
        </cfloop>
        <cfoutput>Saving Record ## #VARIABLES.lineNum#. </cfoutput>
        <cfset VARIABLES.Bean.init(argumentCollection:VARIABLES.resProp) />
        <cftry>
            <cfif VARIABLES.DAO.save(VARIABLES.Bean)>
                <cfoutput>Record saved.<br /></cfoutput>
            <cfelse>
                <cfoutput>Error saving record.<br /></cfoutput>
                <!--- custom cfthrow here --->
            </cfif>
            <cfcatch type="any">
                <!--- and a custom error handler here --->
            </cfcatch>
        </cftry>
        <cfflush />
    <cfelse>
        <cfset VARIABLES.propOrder = ArrayNew(1) />
        <cfset VARIABLES.lineCount = 1 />
        <cfloop list="#VARIABLES.line#" index="VARIABLES.listItem" delimiters="#Chr(9)#">
            <cfset VARIABLES.propOrder[VARIABLES.lineCount] = VARIABLES.listItem />
            <cfset VARIABLES.lineCount++ />
        </cfloop>
        <cfset VARIABLES.lineCount = 0 />
    </cfif>
    <cfset VARIABLES.lineNum++ />
</cfloop>
<cfsetting enablecfoutputonly="false" />
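The misalignment caused by empty fields is easier to see outside ColdFusion. Here is a rough JavaScript sketch of the same idea, with made-up column names and data: dropping empty fields shifts every later value one column to the left, while keeping them lets each value line up with its header. splitSkippingEmpty only imitates the list-loop behaviour described above, and rowToStruct is a hypothetical helper.

```javascript
var TAB = '\t';

// Imitates a list loop that ignores empty items
function splitSkippingEmpty(line) {
    return line.split(TAB).filter(function (item) { return item !== ''; });
}

// Keeps empty items, in the spirit of ListToArray(line, Chr(9), true)
function splitKeepingEmpty(line) {
    return line.split(TAB);
}

// Pairs each header name with the value in the same position
function rowToStruct(propOrder, values) {
    var result = {};
    for (var i = 0; i < propOrder.length; i++) {
        result[propOrder[i]] = values[i];
    }
    return result;
}

var header = 'MLS\tPRICE\tCITY';   // made-up column names
var line = '12345\t\tOrlando';     // PRICE is empty in this row

var propOrder = splitKeepingEmpty(header);

// Empty field dropped: PRICE wrongly holds 'Orlando', CITY is undefined
console.log(rowToStruct(propOrder, splitSkippingEmpty(line)));

// Empty field kept: PRICE is '' and CITY is 'Orlando'
console.log(rowToStruct(propOrder, splitKeepingEmpty(line)));
```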

That's it! Nothing to it! Now, there are probably better ways, and half of this should be encapsulated even further, and it will break if the feed provider changes the column names. But, hey, it was fun! Right?

Example code included below with the Download link.

Up To The Latest And Greatest

OK, last night I upgraded to BlogCFC 5.9, which Ray released the other day. Pretty heavy update, with over 30 file changes in the readme, but very smooth when using WinMerge (thanks to Mark Drew for that little tip).

One item of interest, though. Ray mentions in the readme that he's stopped logging the changes in the document headers, stating that it's redundant to place them there and within the readme doc. He also states "I decided I'd skip that since BlogCFC 6 will have new files." I also noticed, while reviewing the changes, that he has code in place for checking the server version. CF 8 specific changes on the way? We'll have to wait and see.

Ext 2.0 Alpha Public Release

OK, many of you know that I've become a fan of the Ext JavaScript library. I really started looking at it heavily just before Ben Forta told us about the new Ajax widgets in ColdFusion 8. Good thing for me, since most of those widgets are built using the Ext library.

Well, the Ext team has done it again. Or, rather, they've outdone themselves. Rey Bango popped me an IM this afternoon, with the link to the Ext blog posting about the public release of Ext 2.0 Alpha. I've already had the opportunity to preview some of this outstanding new stuff, and it is way above the bar. The Ext crew has been hard at work, mulling over thousands of forum entries, emails, etc, enhancing and refining this excellent toolset. The samples page showcases some of the new features and enhancements, and is now separated from the API Browser, which is a thing of beauty in its own right.

There are some heavy changes under the hood, with migration documentation soon to be released to ease the transitions. I wouldn't expect to see the new stuff built into ColdFusion anytime soon, but it would be tremendous if Adobe made that happen in an upcoming update (NOTE: Adobe has said no such thing, I just wish they would;). And anyone interested in client side JS code should read through the 'source' files that are included in the download. Simply amazing.

It's very expandable too. I've already added to the RowExpander Plugin, for the DataGrid, to take a function reference argument in its config, so that I could populate the 'expanded' area with Ajax fed content, basically in the same vein as setting a custom cell renderer but only 'onclick'.

So, looks like I'll have to write some new DataGrid articles, which I've also begun to crosspost onto the Ext site in their Learning Center. I'm already envisioning some of the killer apps that will be fronted by this (can you say 'AIR'?) Big kudos to Jack and the guys for a stellar v2.0.
