WebSphere MQ Basics


  1. Here is a list of key terms about message queuing.

    Queue managers: The queue manager is responsible for maintaining the queues it owns, and for storing all the messages it receives onto the appropriate queues.
    Messages: A message is a string of bytes that is meaningful to the applications that use it. Messages are used to transfer information from one application program to another. The applications can be running on the same or on different computers.
    Local queues: A local queue is a data structure used to store messages. The queue can be a normal queue or a transmission queue. A normal queue holds messages that are to be read by an application that reads them directly from the queue manager. A transmission queue holds messages that are in transit to another queue manager.
    Remote queues: A remote queue is used to address a message to a queue on another queue manager.
    Channels: Channels are used to send and receive messages between queue managers.
    Listeners: Listeners are processes that accept network requests from other queue managers, or from client applications, and start the associated channels.
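
    The remote queue, channel, and listener terms only come into play when messages flow between two queue managers. As a hedged sketch (run inside runmqsc on QM1), the object names QM2, LQ2, RQ1, QM1.TO.QM2 and the host remotehost are illustrative and not part of the walkthrough below:

    runmqsc QM1
    * listener that accepts inbound connections on port 1414
    DEFINE LISTENER(QM1.LISTENER) TRPTYPE(TCP) PORT(1414) CONTROL(QMGR)
    START LISTENER(QM1.LISTENER)
    * transmission queue that holds messages in transit to QM2
    DEFINE QLOCAL(QM2) USAGE(XMITQ)
    * sender channel that moves messages from the transmission queue to QM2
    DEFINE CHANNEL(QM1.TO.QM2) CHLTYPE(SDR) TRPTYPE(TCP) CONNAME('remotehost(1414)') XMITQ(QM2)
    * remote queue definition: putting to RQ1 on QM1 addresses LQ2 on QM2
    DEFINE QREMOTE(RQ1) RNAME(LQ2) RQMNAME(QM2) XMITQ(QM2)
    end

    The rest of this walkthrough sticks to a single queue manager and a local queue.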

    Creating a queue manager called QM1

    1.      Create a queue manager with the name QM1 by typing the following command:
    crtmqm QM1
    When the system creates the queue manager, you see the following output:
    C:\IBM\WebSphereMQ\bin>crtmqm QM1
    WebSphere MQ queue manager created.
    Directory 'C:\IBM\WebSphereMQ\qmgrs\QM1' created.
    The queue manager is associated with installation 'Installation1'.
    Creating or replacing default objects for queue manager 'QM1'.
    Default objects statistics : 77 created. 0 replaced. 0 failed.
    Completing setup.
    Setup completed.

    The queue manager is stopped. You must start the queue manager to administer it, and read and write messages from its queues.
    2.      Start the queue manager by entering the following command:
    strmqm QM1
    When the queue manager successfully starts, you see the following output:

    C:\IBM\WebSphereMQ\bin>strmqm QM1
    WebSphere MQ queue manager 'QM1' starting.
    The queue manager is associated with installation 'Installation1'.
    5 log records accessed on queue manager 'QM1' during the log replay phase.
    Log replay for queue manager 'QM1' complete.
    Transaction manager state recovered for queue manager 'QM1'.
    WebSphere MQ queue manager 'QM1' started using V7.5.0.0.

    With the queue manager started, you can now create the queues.

    Creating a queue called LQ1

    A queue is a WebSphere MQ queue manager object. There are three ways to create WebSphere MQ objects:

    ·        Command-line.
    ·        WebSphere MQ Explorer.
    ·        Using a programmable interface.

    1.      Start the scripting tool by typing the following command.
    runmqsc QM1

    When the scripting tool starts, you see:
    C:\IBM\WebSphereMQ\bin>runmqsc QM1
    5724-H72 (C) Copyright IBM Corp. 1994, 2011.  ALL RIGHTS RESERVED.
    Starting MQSC for queue manager QM1.
    The tool is ready to accept MQSC commands.
    2.      Create a local queue called LQ1 by typing the following MQSC command:
    define qlocal(LQ1)
    When the queue is created, you see the following output:
    define qlocal(LQ1)
         2 : define qlocal(LQ1)
    AMQ8006: WebSphere MQ queue created.

    3.      Stop the scripting tool by typing the following MQSC command:
    end
    When the scripting tool ends, you see the following output:
    One MQSC command read.
    No commands have a syntax error.
    All valid MQSC commands were processed.

    C:\>

    Displaying Queue Manager Status

    To check whether the queue manager is running, type the following command: dspmq

     C:\IBM\WebSphereMQ\bin>dspmq
    QMNAME(QM1)   STATUS(Running)

    Putting a message to the Queue LQ1

    1.      Use the amqsput sample application to put a message to queue LQ1 by typing: amqsput LQ1 QM1

    When the sample application starts, you see:
    C:\>amqsput LQ1 QM1
    Sample AMQSPUT0 start
    target queue is LQ1
    Type Hello World and press Enter. This puts a message containing the text “Hello World” on the queue LQ1, which is managed by the queue manager QM1. To end amqsput, press Enter on a blank line.

    You see the following output:
    C:\>amqsput LQ1 QM1
    Sample AMQSPUT0 start
    target queue is LQ1
    Hello World

    Sample AMQSPUT0 end
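
    Before reading the message back, you can optionally confirm that it is sitting on the queue; a minimal check with runmqsc (CURDEPTH is the current number of messages on the queue, so it should report 1 at this point):

    runmqsc QM1
    DISPLAY QLOCAL(LQ1) CURDEPTH
    end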

    Getting messages from the Queue LQ1
    Use the amqsget sample application to read a message on the queue LQ1 by typing: amqsget LQ1 QM1

    When the sample application starts, you see:

    C:\>amqsget LQ1 QM1
    Sample AMQSGET0 start
    message <Hello World>
    no more messages
    Sample AMQSGET0  end

    The amqsget application ends 30 seconds after reading the message.
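
    When you are done experimenting, you can optionally clean up the objects created above; a short sketch (note that dltmqm permanently deletes the queue manager and all of its queues):

    runmqsc QM1
    DELETE QLOCAL(LQ1)
    end
    endmqm QM1
    dltmqm QM1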



MQ Server 7.5 Installation (On Windows 7 64-bit Professional)


I used the WebSphere MQ Launchpad and made sure the software requirements for MQ were satisfied. All prerequisites (for example, the network configuration) must be met.





NOTE
1.     When installing WebSphere MQ, I was asked:
"Do you need to configure a domain user ID for WebSphere MQ to run under?" and I selected NO (because the server is not in a domain).

2. These configurations work when the client and server are on the same machine.




Select language and accept the terms





 I chose the custom installation option and specified the MQ installation path.








After a successful installation, you can configure WebSphere MQ (even at a later point, if anything changes) using the Prepare WebSphere MQ Wizard.









Problems
1.      When I tried to connect to the queue manager (QM1) from MQ Explorer, I got the following error:
  Could not establish a connection to the queue manager - reason 2538. (AMQ4059)
  Severity: 10 (Warning)
  
       Explanation: The attempt to connect to the queue manager failed. This could be because the queue manager is incorrectly configured to allow a connection from this system, or the connection has been broken.
  Response: Try the operation again. If the error persists, examine the problem determination information to see if any information has been recorded.

  To avoid this problem on Windows 7, make sure to open MQ Explorer as an administrator, or log in as an administrator (then by default you will see QM1 connected).




Installing Licenses

Run as administrator:
C:\IBM\WebSphereMQ\bin>setmqprd c:/temp/amqpcert.lic

NOTE:  
As part of the installation, a user called "MUSR_MQADMIN" is created as a local Windows 7 user (this is a special user account and doesn’t appear on the logon screen, like a hidden user in Windows).
When you install WebSphere MQ and run the Prepare WebSphere MQ Wizard for the first time, it creates a local user account for AMQMSRVN called MUSR_MQADMIN with the required settings and permissions. The password for MUSR_MQADMIN is randomly generated when the account is created, and used to configure the logon environment for AMQMSRVN. The generated password does not expire.






GSA (Google Search Appliance) integration with IBM WebSphere Portal / WCM


GSA (Google Search Appliance) is Google's enterprise search product. Recently I had to work with GSA as the search solution on the IBM WebSphere Portal platform. The main use cases are crawling the WCM seedlists (including binary documents and WCM content) and the portal content.

Check out the sections below for more details on the GSA basics:

  1. GSA Feeds
  2. Crawling Content 
  3. Collections in GSA
  4. Metadata search


There are two simple ways we can feed the portal/WCM content to GSA:

  1. Writing a proxy component
    1. Get the portal/WCM seedlists from the IBM system (using the IBM out-of-the-box seedlist framework)
    2. Parse the IBM out-of-the-box seedlist content
    3. Generate a GSA-compatible seedlist
    4. Post the GSA-compatible seedlist to the GSA server
  2. Generating the GSA-compatible feed directly
    1. Write a custom component using the IBM Portal/WCM API
    2. Generate the feed in a GSA-supported format
    3. Post the feed to GSA

GSA (Google Search Appliance) - Collections



  1. There is only one index in GSA (the default collection).
  2. A collection in GSA is nothing but a "view" on the default collection (the whole index) based on URL patterns. To limit searches based on URL patterns, you can set up a collection.
  3. When the user runs a search, the "site" query parameter decides which collection the search runs against (it is like post-filtering on top of the index).
  4. If the user wants to search against a specific portion of the index, set up a collection and use the site parameter to search against that specific collection.
  5. The "site" parameter can be set to a single collection or to multiple collections. The logical operations AND, OR, NOT, and grouping can be used as below, e.g.:
    1. To search against the single collection products ::: &site=products
    2. To search against the products AND (intersection) services collections ::: &site=products.services
    3. To search against products OR (union) services ::: &site=products|services
    4. To not search in products ::: &site=-products
    5. To search in prices AND (products OR services) ::: &site=prices.(products|services)

  6. Creating a collection
    1. Go to Indexes --> Create Collection
    2. Click Edit on the collection to include or exclude URLs (the URLs you add here must also be in the main index)

  7. A few things about collections
    1. You can define as many collections as you want.
    2. If you define too many complex queries with collections, there might be a performance impact, which needs to be tested.
    3. You can't define collections based on metadata. But if the documents are organized into separate folders, you can try to get the results based on the URL pattern (collection) plus metadata, using the search logical operations (see the sample query after this list).

  8. The following is possible: include URL patterns like http://sivavaka.com/parent and http://sivavaka.com/parent/child/subchild together with an exclude pattern like http://sivavaka.com/parent/child/.

  9. Default collection: by default, when you are working with collections in the admin console, everything is included (the / pattern in the default collection includes all content in the index).
  10. Simple search scenario
    1. When the user selects a specific category and searches for a keyword, a hidden site parameter specifies which collection to use.
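
To make the last two points concrete, here is a hedged sketch of a search request that combines a collection restriction with a metadata filter; the host gsa.example.com, the collection name products, and the meta tag category are illustrative rather than taken from a real setup:

http://gsa.example.com/search?q=printer+inmeta:category%3Dprivate&site=products&client=default_frontend&output=xml_no_dtd

Here site=products limits the search to the products collection (URL patterns), while inmeta:category=private (with = URL-encoded as %3D) narrows it further by metadata.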



GSA (Google Search Appliance) Metadata Search


  1. HTML <meta> tags are included as metadata
<html>
<head>
<meta name="generator" content="my testing">
<meta name="category" content="private">
<meta name="zipcode" content="10001">
<meta name="publishdate" content="03.10.2013">
<title> News on 2013-03-10</title>
</head>
<body> …. 
         Date :: : 10 Mar 2013
</body>
</html>
  2. You can search just inside the metadata, as below:
    1. Meta tag existence ::: inmeta:generator (this returns all documents that have a generator meta tag)
    2. Partial value search ::: inmeta:generator~my (this returns all documents whose generator meta tag value matches "my")
    3. Exact value search ::: inmeta:category=private (this returns only documents with an exact match, i.e. category=private)
    4. Range-based search ::: inmeta:zipcode:10000..10005 (this returns documents with zipcodes between 10000 and 10005)
    5. Date range search ::: daterange: (this returns documents whose date falls within the given range)
    6. Logical AND operator ::: there is no "AND" query term; by default only documents that contain ALL query terms are returned
    7. Logical OR operator ::: inmeta:category=private OR inmeta:zipcode:10001..10005


inmeta Syntax | Search Parameter Syntax | Description
inmeta:[meta tag name] | &requiredfields=[meta tag name] | Returns results that contain the specified meta tag.
inmeta:[meta tag name]~[meta tag content] | &partialfields=[meta tag name]:[meta tag content] | Returns results that have the specified meta tag with a value that matches some or all of the specified meta tag content.
inmeta:[meta tag name]=[meta tag content] | &requiredfields=[meta tag name]:[meta tag content] | Returns only results that match the exact meta tag content value specified.



  3. By default, GSA also does a partial lookup in the metadata associated with documents.
  4. Limits on metadata
    1. Number of meta tags :: no limit
    2. Total bytes for all tags :: 300 KB (there is no direct limit on the maximum number of bytes of metadata returned with each search result; however, meta tags and snippets beyond the first 300 KB of the document are not displayed or returned)
    3. Total bytes per tag :: 1500

NOTE: GSA has had metadata search capability for a long time through the requiredfields and partialfields parameters of the search API protocol. The inmeta operator allows search users to issue partialfields- and requiredfields-type searches directly from the search box (in the q= parameter).
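
For example, the following two requests should be equivalent; the host gsa.example.com and the front end name default_frontend are assumptions used for illustration:

http://gsa.example.com/search?q=news&requiredfields=category:private&client=default_frontend&output=xml_no_dtd
http://gsa.example.com/search?q=news+inmeta:category%3Dprivate&client=default_frontend&output=xml_no_dtd

The first form passes the metadata constraint as a separate requiredfields parameter; the second folds it into the q= parameter with the inmeta operator.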


GSA (Google Search Appliance) Crawling Content


  1. If there are no crawlers defined, there won't be any data in the index.
  2. The term "crawling" is used when getting web content; "traversal" is the term used when reading information from the filesystem.
  3. The name of the crawler is under "console --> Crawl and Index --> HTTP Headers".
  4. Configuring crawlers
    1. From the admin console:
      1. Log in to the GSA (http://gsa:8000/EnterpriseController)
      2. Go to Crawl and Index --> Crawl URLs
      3. You need to fill in the start patterns, follow patterns, and do-not-follow patterns text areas
    2. Can point to a root or index page
      1. GSA sends a GET request to the start page (or root page), which contains links to all the pages on the web server, and then crawls recursively.
    3. Can specify follow patterns, or a white list (file extensions can be specified)
      1. Like http://sivavaka.com/, /sales/, *.pdf, *.html
    4. Can specify do-not-follow patterns, or a block list
      1. Like http://sivavaka.com/sensitive, *.exe, *.bin
      2. Sample do-not-crawl patterns are things like the links on a calendar (previous and next years create an infinite loop)
    5. Rules for configuring the patterns (a sketch follows this list)
      1. Add a trailing "/" for directories: this causes GSA to follow all the links and also to look for the index.html file by default.
      2. "#" is used for comments in patterns
      3. Regular expressions ::: regexp:\\.mdb$ (here \\ is the escape character)
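
As a hedged sketch, the follow-patterns box might combine these rules like this (the paths are illustrative):

# crawl everything under the sales area (the trailing "/" also picks up index.html)
http://sivavaka.com/sales/
# any .pdf URL, written as a regular expression
regexp:\\.pdf$
# plain extension pattern
*.html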
NOTE: GSA doesn't store the binary version of binary documents such as Word documents or XLS files; it converts the document to HTML and stores the text (the text version link gives the HTML version).


  5. Document dates:
    1. There are several locations where a document's date can be located:
      1. By default, the Last-Modified date returned by the web server
      2. In the URL
      3. In a meta tag
      4. In the body
      5. In the title
      6. Configured from the admin console --> Document Dates
        1. */ --> last modified  (*/ means all documents)
Note: If no date is found, the document is indexed without a date and appears last.
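
A hedged sketch of what the Document Dates rules might look like; the news URL pattern and the publishdate meta tag name (from the sample HTML earlier) are illustrative:

# URL pattern               Locate date in     Meta tag name
http://sivavaka.com/news/   metatag            publishdate
*/                          last-modified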


  6. Recrawling and load
    1. Recrawl duration:
      1. By default GSA sets the recrawl interval to 3 days. If a document has changed, it shortens the recrawl period by 50%, i.e. to a day and a half; if GSA sees another change it shortens it by 50% again, and so on until it stops seeing changes.
      2. If the document doesn't change, the recrawl interval goes up by 50%.
    2. Removing documents
      1. If a document returns 404, you can set GSA to remove the document from the index.
      2. Also, from the "Crawl and Index --> Freshness Tuning" options, you can specify the URL patterns.
    3. Host load schedule
      1. Settings such as parallel connections, maximum document size, number of documents to crawl, etc.
    4. You can see each document's crawl history from the GSA console --> Crawl Diagnostics.

  7. Reasons for URLs not being in the index
    1. Files blocked by robots.txt
Sample ::
User-agent: *
Disallow: /Something/sensitive
User-agent: googlebot
Disallow: /
    2. Unsupported file types
    3. Files beyond the license/coverage limit
    4. Do-not-crawl patterns
    5. Orphaned files
    6. Password-protected files
    7. AJAX content
    8. <frame> and <a> outside of <frameset>
  8. Troubleshooting a URL that is not in the results
    1. Browser access: if you don't set up crawler access, you may not be able to crawl secure documents
    2. Crawl Diagnostics
    3. Real-time Diagnostics
    4. Cached version
    5. Search results
You can search directly using the following patterns:
info:<url> verifies that the page is indexed
link:<url> lists the pages that link to this URL


Google Search Appliance (GSA) Feeds


There are two types of feeds we can input to GSA:

  1. Metadata-and-URL feeds
    1. The client sends the metadata plus a URL pointing to the actual document (a binary document, or web content such as HTML) to GSA
    2. GSA indexes the metadata and makes an HTTP GET call to crawl the content of the URL (it also crawls the links found in the HTML).
    3. Adds to the existing feed.
  2. Content feeds
    1. The feed includes both the metadata and the content (the whole HTML content or text…)
    2. If the content is binary, it is base64 encoded and included in the XML
    3. This feed may be big, up to 1 GB per XML feed
    4. You need to specify the feed type as Full/Incremental (this indicates whether to replace or add to the existing feed content).


XML Feed Elements
  1. datasource : indicates the name of the feed
  2. feedtype : incremental, full, or metadata-and-url
    1. For a content feed it is incremental or full
    2. For a metadata-and-url feed, it is just metadata and URLs
  3. record : each record is a specific document and can have attributes such as url, action, and mimetype
    1. action : add or delete
    2. url : for a metadata-and-url feed, GSA makes a GET call with this URL; for a content feed, GSA uses it as a unique identifier for the record (you may need to configure a user ID/password so the content can be fetched using the URL)
    3. mimetype : tells GSA how to treat the document, e.g. application/pdf or text/html
    4. last-modified : the document's last-modified date
    5. authmethod : none, httpbasic, ntlm, or httpsso
    6. displayurl : an alternate URL (used for search results)
  4. metadata : you can send ACLs using the metadata, as below
<meta name="google:aclusers" content="siva,vaka"/>
<meta name="google:aclgroups" content="admins,hr"/>

  5. content : the actual content (HTML content, binary content, etc.)
    1. encoding : specifies the encoding, e.g. base64binary or base64compressed.


<gsafeed>
<header>
<datasource>myfeed</datasource>
<feedtype>incremental</feedtype>
</header>
<group>
<record url="" action="add" mimetype="text/html">
<metadata>
<meta name="state" content="new york">
<meta name="city" content="buffalo">
<meta name="google:aclusers" content="siva,vaka"/>
<meta name="google:aclgroups" content="admins,hr"/
</metadata>
<content>
<![CDATA[
<html>
<head><title>my doc</title></head>
<body>this is html document body</body>
</html>
]]>
</content>
</record>

<record url="" action="add" mimetype="application/pdf">
<content encoding="base64binary">
SDfadfxdafeadfdafJLKadfad
Adfadsfadf KDLDJSPofaifadf…
</content>
</record>

<record url="http://test.com/page1.html" displayURL="http://test.com/pagesIndex.html" action="add" mimetype="text/html">
<content>

</content>

</record>

</group>
</gsafeed>


NOTE: Metadata-and-URL feeds give better relevance because GSA maintains the link structure (it follows the links); content feeds may not, because GSA doesn't follow links for them.
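
For comparison, a minimal metadata-and-url feed might look like the sketch below (the record URL and datasource name are illustrative); GSA indexes the metadata right away and then crawls the URL itself for the content:

<gsafeed>
<header>
<datasource>myurlfeed</datasource>
<feedtype>metadata-and-url</feedtype>
</header>
<group>
<record url="http://test.com/page2.html" action="add" mimetype="text/html">
<metadata>
<meta name="category" content="private"/>
</metadata>
</record>
</group>
</gsafeed>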

A sample HTML page to post a feed to GSA is as follows:
<html>
         <body>
<form enctype="multipart/form-data" action="http://<gsa>:19900/xmlfeed" method="post">
<input type="text" name="datasource">
<input type="radio" name="feedtype" value="full">full
<input type="radio" name="feedtype" value="incremental">incremental
<input type="radio" name="feedtype" value="metadata-and-url">metadata-and-url
<input type="file" name="data">
<input type="submit" value="submit">
</form>
</body>
</html>
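
The same multipart/form-data POST can also be scripted; a minimal sketch using curl (the host name gsa and the file name feed.xml are placeholders, and the form field names match the form above):

curl -F "datasource=myfeed" -F "feedtype=incremental" -F "data=@feed.xml" "http://gsa:19900/xmlfeed"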

NOTE:
  1. The default port for HTTPS feeds is 19902
  2. The maximum feed file size is 1 GB; if a feed is larger than 1 GB, you can break it into several feeds with the same feed name.
  3. Timeouts may happen if too many feeds are pushed
  4. To delete a feed:
    1. Add a do-not-crawl pattern
    2. For content feeds, you can delete the feed from the Feeds page
    3. Delete individual records using action="delete" (see the sketch below)
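
A hedged sketch of deleting a single record from a content feed (the URL is illustrative); per the feed DTD below, url and mimetype are required and action="delete" marks the record for removal:

<gsafeed>
<header>
<datasource>myfeed</datasource>
<feedtype>incremental</feedtype>
</header>
<group>
<record url="http://test.com/page1.html" action="delete" mimetype="text/html"/>
</group>
</gsafeed>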


GSA Feed DTD

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT gsafeed (header, group+)>
<!ELEMENT header (datasource, feedtype)>
<!-- datasource name should match the regex [a-zA-Z_][a-zA-Z0-9_-]*,
     the first character must be a letter or underscore,
     the rest of the characters can be alphanumeric, dash, or underscore. -->
<!ELEMENT datasource (#PCDATA)>
<!-- feedtype must be either 'full', 'incremental', or 'metadata-and-url' -->
<!ELEMENT feedtype (#PCDATA)>
<!-- group element lets you group records together and
     specify a common action for them -->
<!ELEMENT group (record*)>
<!-- record element can have attribute that overrides group's element-->
<!ELEMENT record (metadata*,content*)>
<!ELEMENT metadata (meta*)>
<!ELEMENT meta EMPTY>
<!ELEMENT content (#PCDATA)>
<!-- default is 'add' -->
<!-- last-modified date as per RFC822 -->
<!ATTLIST group
   action (add|delete) "add"
   pagerank CDATA #IMPLIED>
<!ATTLIST record
   url CDATA #REQUIRED
   displayurl CDATA #IMPLIED
   action (add|delete) #IMPLIED
   mimetype CDATA #REQUIRED
   last-modified CDATA #IMPLIED
   lock (true|false) "false"
   authmethod (none|httpbasic|ntlm|httpsso) "none"
   pagerank CDATA #IMPLIED>
<!ATTLIST meta
   encoding (base64binary) #IMPLIED
   name CDATA #REQUIRED
   content CDATA #REQUIRED>
<!-- for content, if encoding is specified, it should be either base64binary
     (base64 encoded) or base64compressed (zlib compressed and then base64
     encoded). -->
<!ATTLIST content encoding (base64binary|base64compressed) #IMPLIED>

Resources
  1. https://developers.google.com/search-appliance/documentation/612/feedsguide#system