Archive for the ‘patterns’ Category
About this time last year I sequestered myself in a rental house on Cape Cod while my wife and parents took my daughters on an endless loop of beach/mini-golf outings. My time was spent finishing up the primary text for Mashup Patterns. For those of you thinking, “But the book only came out a few months ago, how is that possible?” let me just say, “Yes, the publishing process is that complicated”. You can imagine how disruptive eBooks are – but that’s a story for a different time and place.
This year I was back on Cape Cod again, but able to relax a bit myself. It seemed like a good time to reflect a bit on what did (and didn’t) make the final cut. One of the biggest things I left out of Mashup Patterns was a discussion of Screen Scraping. Actually, that’s not entirely accurate. I talk about it a lot in Chapter 1 (“Acquiring Data from the Web”), then again in Chapter 4 (“Data Extraction”), and a little more still in Chapter 4 (“Harvest Patterns”). So how can I say I left it out?
What I talk about specifically is “scraping” data from Web pages, which I prefer to call “DOM Parsing” since what you’re really doing is traversing the page’s underlying Document Object Model and looking at things like CSS attributes, ids, names, etc. rather than an item’s absolute screen position. “DOM Parsing” is the underlying technique used by products like Kapow, Dapper, Convertigo, and Mozenda.
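To make the distinction concrete, here is a minimal sketch of DOM-style extraction using only Python's standard library. It is not how any of the products above work internally; the element id "order-total" and the two sample pages are made up for illustration. The point is that the lookup keys on an attribute in the document structure, so the value is still found after the page layout changes.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._depth = 0      # > 0 while we are inside the target element
        self._done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self._done or tag in VOID_TAGS:
            return
        if self._depth > 0:
            self._depth += 1                      # nested child element
        elif dict(attrs).get("id") == self.target_id:
            self._depth = 1                       # entered the target

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self._done = True

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def extract_by_id(html, target_id):
    """Return the text of the element with the given id ('' if absent)."""
    parser = IdTextExtractor(target_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


# Hypothetical pages: same data, very different layout and screen position.
page_v1 = '<html><body><div><span id="order-total">$42.00</span></div></body></html>'
page_v2 = ('<html><body><h1>New header</h1><table><tr><td>Total due:</td>'
           '<td id="order-total"><b>$42.00</b></td></tr></table></body></html>')

for page in (page_v1, page_v2):
    print(extract_by_id(page, "order-total"))
```

A position-based scraper would need to be re-mapped for the second page; the id-based lookup does not care where the value is drawn on screen.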
Why do I bother making this distinction? The reality is that Screen Scraping has a negative connotation in many circles. The old techniques for acquiring data based on X/Y screen position are not well regarded. And justly so; many of us built solutions on top of screen scraping products only to see them fail miserably when a single label was renamed. I wanted readers to know that Web Harvesting was a much more robust and fault-tolerant approach for the twenty-first century.
Have I had any impact? I’m not sure. I’m happy to see sites like Mozenda advertise that they perform Web Harvesting, but of course right above that they claim they do Screen Scraping as well. Argh! Plus, I continue to be on panels and calls and hear the two terms used interchangeably. When a technology is reasonably similar to something a developer has seen before, we have the tendency to use old labels, judge it by previous experiences, etc. This inclination can keep us from recognizing the truly innovative stuff, so it’s a trap we have to watch out for. But I digress; the point I actually want to make is that there is a time and place where Screen Scraping can add value.
I don’t know how many mainframe-based systems with no web front-end are out there, but it’s a stockpile that’s accumulated over decades. A large number, being perfectly suited for their purpose, will continue to linger along unchanged. Perhaps it’s just too expensive to re-write or re-platform them. How do we then leverage these resources in our new creations? For example, how do we take an ancient order-fulfillment system and link it with a snazzy new Salesforce application?
You see where I’m headed. This is a mashup, too. Only instead of the normal cadre of web applications, RSS feeds, SOAP APIs, etc, we want to include mainframe content. And if there are no other avenues in via the database or other feed, then Screen Scraping is a perfectly viable option. In 2004, before the term Mashup was in wide use, David Linthicum’s book, Next Generation Application Integration: From Simple Information to… talked about this exact approach.
“Leveraging the user interface as a point of information integration is a process known as “screen scraping,” or accessing screen information through a programmatic mechanism. Middleware drives a user interface (e.g., 3270 user interface) in order to access information. Simply put, many application integration projects will have no other choice but to leverage user interfaces to access application data and processes. Sometimes access to underlying databases and application interfaces does not exist.”
David also summarized the all-too-common downsides:
“A user interface was never designed to serve up data, but it is now being used for precisely that purpose. It should go without saying that the data-gathering performance of user screens leaves a lot to be desired. In addition, this type of solution can’t scale, so it is unable to handle more than a few screen interfaces at any given time. Finally, if the application integration architect and developer do not set up these mechanisms carefully, they may prove unstable.”
Nevertheless, sometimes working “at the glass” is our only option. Dozens of companies offer solutions in this space, but only two that I’m aware of (Convertigo and Lansa) have actually connected the dots between interacting with a mainframe and building enterprise mashups. When David wrote his book, it was for an IT audience focused on integrating disparate applications. Today, we realize that besides this lofty goal, sometimes it’s just as useful to mine small nuggets of useful functionality from a system to build something unique and new.
I started this post out with a little reflection on where I was about a year ago. Personally, I think leaving the mainframe discussion out of Mashup Patterns was the right call. But professionally, none of us can afford to ignore the past. Legacy resources are everywhere, and they can easily be incorporated into today’s new mashups. Regardless of the sources underpinning your mashup, you need to be aware of their fragility, provide notification and controls for dealing with any unexpected downtime, and incorporate multiple, redundant sources for data when possible. Don’t let a bias against “scraping” keep you from using what may be some of the most valuable functionality in your firm.
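The advice about fragility, notification, and redundant sources can be sketched in a few lines of Python. This is only an illustration under assumed names: fetch_with_fallback, the source labels, and the two stand-in fetch functions are hypothetical, and a real mashup would call a scraping tool or feed instead of these stubs.

```python
from typing import Callable, Iterable, Optional, Tuple

# A source is a label plus a zero-argument function that returns its data.
Source = Tuple[str, Callable[[], str]]


def fetch_with_fallback(sources: Iterable[Source],
                        notify: Callable[[str], None]) -> Optional[str]:
    """Try each source in order; report every failure; return the first success."""
    for name, fetch in sources:
        try:
            return fetch()
        except Exception as exc:  # any failure means "try the next source"
            notify(f"source '{name}' failed: {exc}")
    notify("all sources failed")
    return None


# Hypothetical stand-ins for a scraped mainframe screen and a backup feed.
def scrape_mainframe_screen() -> str:
    raise RuntimeError("expected label 'ORDER NO' not found on screen")


def read_nightly_export() -> str:
    return "order 1001: shipped"


alerts = []
result = fetch_with_fallback(
    [("mainframe screen", scrape_mainframe_screen),
     ("nightly export", read_nightly_export)],
    notify=alerts.append,
)
print(result)
print(alerts)
```

Passing notify as a function keeps the sketch neutral about how alerts are delivered; it could just as easily send an email or write to a log.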
What are enterprise mashups? JackBe is running a contest to come up with a definition everyone can agree on. I think the biggest problem is that people try to define the tools, the goals, and the constituent parts in the same sentence. It leads to cumbersome, wordy, and sometimes narrow definitions like this one:
“Enterprise mashups are integrated business applications that combine data from two or more data sources, including enterprise data sources such as web services, and possibly external web services. The nature of mashups is to provide a quick return on investment using high-productivity tools and techniques. These include AJAX, browser-based tools, and reusing existing web services and web components.” (source http://applibase.com/DataCaster_FAQ.html)
Would a layperson (and potential mashup creator) understand that? It also perpetuates the common misconception that mashups must have more than one data source. Wrong! People assume that prerequisite because “mashups” seems to demand this plurality, but in fact there are advantages to using mashup tools with only a single web site (such as creating an RSS feed or API where none previously existed).
How about this one:
“Mashups are a brute force joining of disparate Web Data, oblivious to the underlying Data Model(s), and often based on RSS 2.0 (a Tree Structure that contains untyped or meaning-challenged links.)” (source: http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid) )
A “Brute force” approach sounds neither flexible nor easy, which is ideally the exact opposite of what enterprise mashups aim to be. Let’s look at one more:
“A mashup is a dynamic web application that brings together data stored in many different applications for better decision making” (source: Luis Derechin, JackBe CEO on FOX Business)
We’re getting closer because this is the first definition that answers the “why” about mashups. Why should I care about these things? Luis undoubtedly knows mashups can accomplish a host of things including streamlining a cumbersome process, pushing content to alternative devices, etc. My guess is he homed in on “better decision making” because he was thinking of what would be of interest to the typical FOX Business viewer. But now we’re on the right track.
First, we should recognize the distinction between a conversational definition, which can be used to spark further discussion, and a definition for reference sources like Wikipedia. I’m going to focus on the former case, which is still tricky territory to navigate. It’s like trying to explain what an Operating System does outside of technical circles. Is it important to explain issues like threading, memory management or job scheduling? No; I believe users want to understand what an O/S lets you accomplish and not how it does it. We should hold a definition of mashups to this same principle.
To paraphrase John Crupi’s (JackBe’s CTO) remarks at CeBIT, the definition of enterprise mashups is of little use if it doesn’t communicate their value to the business. To that end, here’s how I explain enterprise mashups when asked:
Enterprise mashups unleash the information locked in a company’s systems and the creativity trapped within its employees to allow anyone to quickly meet specific business challenges.
As I said – this is just a jumping-off point. I normally follow up this statement with a series of questions: “Have you ever used an application that would be perfect if it had just one small change?”, “Have you ever had to wait forever to get the tools you needed to do your job?”, and “Do you constantly waste time cutting and pasting from different applications to get your work done?”
I don’t want a short definition (no matter how “technically correct”) that is abstruse and intimidates users. The key is to pique the listener’s curiosity and draw them in. Once they share their problems I can help them understand how enterprise mashups can help.
I have an introductory article on how to mine the Deep Web (and Deep Intranet!) with mashups over at InformIT. The article is a good introduction for the layperson who may be unaware of how mashups can be used in this manner.
I have to point out one small mistake: the article mentions that the Presto platform (from JackBe) is able to implement the API Enabler pattern through their partnership with Dapper. In fact, their in-house developed EMML (Enterprise Mashup Markup Language) also makes this possible (w/o needing any Daps).
Twinsoft’s Convertigo platform (not mentioned in the article) is also capable of implementing API Enabler. I’m sure there are other tools I am missing that can create a SOAP or REST API against a presentation layer (either by screen-scraping or the more elegant technique of parsing a page’s DOM). New tools seem to be popping up almost daily! The take-away is that mashups can do more than combine disparate sites together. They can also extract data “at the glass” when a public interface isn’t available (or doesn’t expose the specific information you’re after).
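For readers who want a feel for the idea behind API Enabler, here is a rough sketch in plain Python. It is not EMML and not how Presto, Dapper, or Convertigo implement the pattern. It simply parses a hypothetical HTML order table (the "order" class name and the id/status fields are assumptions) and returns the rows as JSON, which is the kind of payload a REST endpoint built over a presentation layer might serve.

```python
import json
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collect the cell text of every <tr> whose class list contains 'order'."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            if "order" in classes:
                self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def orders_as_json(html):
    """Turn the order rows of a page into a JSON array of records."""
    parser = RowExtractor()
    parser.feed(html)
    records = [{"id": row[0].strip(), "status": row[1].strip()}
               for row in parser.rows if len(row) >= 2]
    return json.dumps(records)


# Hypothetical page with no public interface, only a rendered table.
sample_page = (
    '<table><tr><th>Order</th><th>Status</th></tr>'
    '<tr class="order"><td>A-100</td><td> shipped </td></tr>'
    '<tr class="order odd"><td>A-101</td><td>pending</td></tr></table>'
)
print(orders_as_json(sample_page))
```

Wrapping orders_as_json in a web handler would complete the picture, but the extraction step is where the pattern does its real work.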
This week I had the pleasure of visiting JackBe headquarters in Chevy Chase, MD and conducting a joint webex with CTO John Crupi. For those who might not know, John is a well-established figure in the Patterns community, having co-authored Sun’s Core J2EE Patterns book. John had the foresight to register MashupPatterns.com more than a year ago and graciously donated the domain to me last year.
JackBe has two case studies in Mashup Patterns, but the purpose of this webex wasn’t to showcase these or their Presto Platform – it was an educational event designed to explain the evolution of Enterprise Mashups, present some of the patterns, and demo a few implementations JackBe had created.
What was more impressive than the record-breaking attendance we received was the quantity and quality of the questions raised during the talk. In fact, so many good issues were brought up that it’s taken us a few days to put all the answers together! The questions demonstrate that people are really starting to understand the promise of Mashups within the enterprise and are thinking about the right things.
As an additional bonus, JackBe secured permission to distribute the first chapter of Mashup Patterns for free. If you want to come up to speed on this new paradigm quickly (or know someone who does) this is a unique collection of great resources.
The video, question/answer archive, and Chapter 1 are available at: