Stock Picker Shows the Potential of Leading Indicator Pattern

A recent blog post on MIT’s Technology Review site caught my attention. Who wouldn’t read a post when the title promises “AI That Picks Stocks Better than the Pros”? I was expecting to learn about cutting-edge research into neural networks or some such, but instead I found a description of an approach I have been pitching for years now.

The “Arizona Financial Text System” (AZFinText) works by “ingesting large quantities of financial news stories (in initial tests, from Yahoo Finance) along with minute-by-minute stock price data, and then using the former to figure out how to predict the latter”. The stories get a unique type of sentiment-analysis treatment, which was written by AZFinText’s authors, Robert P. Schumaker of Iona College in New Rochelle and Hsinchun Chen of the University of Arizona. (More information can be found in their IEEE paper.)

As you might expect, “bad” stories tend to push a stock down, while “good” stories can push it up. What sets Schumaker and Chen’s approach apart is that they did not use traditional human gauges of sentiment. Rather than look for emotionally weighted words like hate, love, good, or bad, as a typical sentiment analysis technique would (yes, I am oversimplifying the process greatly), they back-tested the moves in a stock’s price against historical stories and used that data to derive the words that seemed to influence movement. This uncovered terms like hereto, comparable, charge, summit and green, which were associated with the stock moving down, while words like planted, announcing, front, smaller and crude were associated with an increase.
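To make the idea of deriving words from price moves concrete, here is a minimal sketch in Python. It is not AZFinText's actual algorithm, which the article does not spell out. It simply assumes each story has already been paired with the stock's fractional price change over some window after publication, then averages that change over every story containing a given word. The function name `word_move_scores`, the `min_count` cutoff, and the sample headlines and moves are all made up for illustration.

```python
from collections import defaultdict


def word_move_scores(stories, min_count=2):
    """Average the subsequent price move over every story containing a word.

    stories: iterable of (text, price_change) pairs, where price_change is
    the fractional move in the stock over some window after publication.
    min_count: ignore words seen in fewer than this many stories.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, change in stories:
        # Count each word once per story so long articles do not dominate.
        for word in set(text.lower().split()):
            totals[word] += change
            counts[word] += 1
    return {
        word: totals[word] / counts[word]
        for word in totals
        if counts[word] >= min_count
    }


# Hypothetical headlines and price moves, for illustration only.
sample = [
    ("company announcing smaller loss", 0.012),
    ("charge taken on comparable sales", -0.008),
    ("announcing crude supply deal", 0.009),
    ("comparable charge hits quarter", -0.011),
]

scores = word_move_scores(sample)
# List words from most negative to most positive average move.
for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{word:12s} {score:+.4f}")
```

A real system would need far more data, a proper tokenizer, and some check that a word's score is not just noise. The point is only that the "sentiment" of a word comes from what the price did, not from what the word means to a human reader.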

Before I explain my similar experiments in this area using mashups, I should point out that the basic idea here is nothing new. The article cites research in this area going back at least to 1990.

In my book, I describe applying mashups to this problem in my explanation of the Leading Indicator pattern:

Changes in a leading indicator may foretell downstream impact in other areas. Some, like the measurement of new home sales or bankruptcy filings, have an obvious effect on many businesses. Well-known leading indicators do not offer any particular competitive advantage. Finding previously unknown relationships that predict business cycles and trends can be extremely useful.

Schumaker and Chen used Yahoo Finance stories in their study. But of course, the more data you use, the more onerous the collection problem becomes. As I noted in Mashup Patterns:

The key to uncovering these hidden links begins with collecting Time Series data from multiple locations. But with each additional source that is surveyed, the complexity of the data gathering process increases.

I also describe how collecting this data is applicable for discovering both Leading and Coincident Indicators. For example:

…knowledge of the week’s upcoming television schedule might not seem useful for a chain of pet stores. But when matched against a set of key words, a time series of “pet-themed” broadcasts can be assembled. Mashed up against a database of customer purchases, a Leading Indicator might emerge between Dog Shows and increased sales. The retailer now has a mechanism for advance inventory and advertising planning.
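One simple way to test whether such a relationship exists is to slide one time series against the other and see which offset lines them up best. The sketch below does that with a plain Pearson correlation. The weekly figures are invented, and the helper names `pearson` and `best_lead` are my own for this illustration, not taken from the book or from any mashup product.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead(indicator, target, max_lag):
    """Return (lag, correlation) for the lag where indicator[t] best matches target[t + lag]."""
    results = []
    for lag in range(max_lag + 1):
        # Pair each indicator value with the target value `lag` periods later.
        xs = indicator[: len(indicator) - lag]
        ys = target[lag:]
        results.append((lag, pearson(xs, ys)))
    return max(results, key=lambda r: r[1])


# Hypothetical weekly data: pet-themed broadcasts and dog-product sales.
broadcasts = [0, 3, 0, 1, 4, 0, 2, 0, 5, 1]
sales = [100, 100, 100, 130, 100, 110, 140, 100, 120, 100]

lag, corr = best_lead(broadcasts, sales, max_lag=4)
print(f"best lag: {lag} weeks, correlation: {corr:.2f}")
```

The sample data was constructed so that sales follow broadcasts by exactly two weeks, which is why that lag wins. Real data will be much noisier, and a high correlation at some lag is a lead worth investigating, not proof of cause and effect.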

But before this post “goes to the dogs”, let’s get back to the Technology Review article. It really stuck with me because I had already demonstrated working implementations. About a year ago, I was invited to speak with several hedge funds, and for two of them I demonstrated a mashup that used the New York Times news API along with Thomson Reuters’ OpenCalais to forecast stock fluctuations.

Although the article does not say how Schumaker and Chen got their data from Yahoo, the data collection capabilities of many mashup products (for example, Kapow, Convertigo, or Connotate) are an excellent foundation for your own Leading Indicator implementation.

The key challenge is to stop thinking about external business sites from a consumer’s perspective and instead view them as databases to be mined via periodic data extraction. Here’s a final example from the Leading Indicator chapter that I know at least two firms have actually implemented:

Consider the value of building a mashup against popular online travel sites. Each day, an automated agent could book multiple flights between New York and London and from Boston to San Francisco. The mashup emulates the customer booking experience all the way up through seat selection, at which point it records the number of seats available and the ticket price. Naturally, the mashup wouldn’t complete the process of paying for the trip.

This information can be used to extrapolate the performance of the sector in advance of quarterly reports. Lots of seats available even as ticket prices decline? Probably not a sign of good earnings. Regularly filled seats at soaring prices might seem like good news until a time series of fuel prices hitting historical highs is added.
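Once the daily observations are being recorded, the reasoning above can be reduced to a few lines of code. The following is a rough sketch under my own assumptions: the `Observation` record, the `demand_signal` function, the 5% threshold, and the sample numbers are all hypothetical, and the data collection itself (which would be done by whatever mashup or extraction tool you use) is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    day: str
    seats_available: int
    ticket_price: float


def pct_change(first, last):
    """Fractional change from first to last (0.0 if first is zero)."""
    return (last - first) / first if first else 0.0


def demand_signal(observations, threshold=0.05):
    """Classify the trend between the first and last observation.

    The threshold is an arbitrary cutoff for ignoring small moves.
    """
    first, last = observations[0], observations[-1]
    seats = pct_change(first.seats_available, last.seats_available)
    price = pct_change(first.ticket_price, last.ticket_price)
    if seats > threshold and price < -threshold:
        return "weak demand"  # more empty seats even as prices fall
    if seats < -threshold and price > threshold:
        return "strong demand"  # fuller flights at higher prices
    return "unclear"


# Hypothetical observations for one route, for illustration only.
ny_london = [
    Observation("2010-06-01", 40, 820.0),
    Observation("2010-06-08", 55, 760.0),
    Observation("2010-06-15", 70, 690.0),
]

print(demand_signal(ny_london))
```

As the fuel-price caveat suggests, a "strong demand" reading on its own says nothing about earnings. You would want to combine it with other time series before drawing any conclusion.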

AZFinText operates on an extremely tight window of time, according to the article. It attempts to find stories that will move a stock within the next 20 minutes or so. This means that to be of practical use, it probably needs to be part of an automated trading system. But I hope the other examples I’ve shown demonstrate that larger timeframes can be involved and there can be plenty of time for actual people to evaluate the results and decide on a course of action. The Leading Indicator pattern might help you get information that your competitors don’t have. Or if they will have it eventually, you can at least get this knowledge before they do.
