WebSphere MQ Basics


  1. Here is a list of key terms about message queuing.

    Queue managers: The queue manager is responsible for maintaining the queues it owns, and for storing all the messages it receives onto the appropriate queues.
    Messages: A message is a string of bytes that is meaningful to the applications that use it. Messages are used to transfer information from one application program to another. The applications can be running on the same or on different computers.
    Local queues: A local queue is a data structure used to store messages. The queue can be a normal queue or a transmission queue. A normal queue holds messages that are to be read by an application that reads them directly from the queue manager. A transmission queue holds messages that are in transit to another queue manager.
    Remote queues: A remote queue is used to address a message to a queue on another queue manager.
    Channels: Channels are used to send and receive messages between queue managers.
    Listeners: Listeners are processes that accept network requests from other queue managers, or from client applications, and start the associated channels.
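
    The remote queue, channel, and listener terms only come into play when messages flow between two queue managers. As a hedged sketch (run inside runmqsc on QM1), the object names QM2, LQ2, RQ1, QM1.TO.QM2 and the host remotehost are illustrative and not part of the walkthrough below:

    runmqsc QM1
    * listener that accepts inbound connections on port 1414
    DEFINE LISTENER(QM1.LISTENER) TRPTYPE(TCP) PORT(1414) CONTROL(QMGR)
    START LISTENER(QM1.LISTENER)
    * transmission queue that holds messages in transit to QM2
    DEFINE QLOCAL(QM2) USAGE(XMITQ)
    * sender channel that moves messages from the transmission queue to QM2
    DEFINE CHANNEL(QM1.TO.QM2) CHLTYPE(SDR) TRPTYPE(TCP) CONNAME('remotehost(1414)') XMITQ(QM2)
    * remote queue definition: putting to RQ1 on QM1 addresses LQ2 on QM2
    DEFINE QREMOTE(RQ1) RNAME(LQ2) RQMNAME(QM2) XMITQ(QM2)
    end

    The rest of this walkthrough sticks to a single queue manager and a local queue.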

    Creating a queue manager called QM1

    1.      Create a queue manager with the name QM1 by typing the following command:
    crtmqm QM1
    When the system creates the queue manager, you see the following output:
    C:\IBM\WebSphereMQ\bin>crtmqm QM1
    WebSphere MQ queue manager created.
    Directory 'C:\IBM\WebSphereMQ\qmgrs\QM1' created.
    The queue manager is associated with installation 'Installation1'.
    Creating or replacing default objects for queue manager 'QM1'.
    Default objects statistics : 77 created. 0 replaced. 0 failed.
    Completing setup.
    Setup completed.

    The queue manager is stopped. You must start the queue manager to administer it, and read and write messages from its queues.
    2.      Start the queue manager by entering the following command:
    strmqm QM1
    When the queue manager successfully starts, you see the following output:

    C:\IBM\WebSphereMQ\bin>strmqm QM1
    WebSphere MQ queue manager 'QM1' starting.
    The queue manager is associated with installation 'Installation1'.
    5 log records accessed on queue manager 'QM1' during the log replay phase.
    Log replay for queue manager 'QM1' complete.
    Transaction manager state recovered for queue manager 'QM1'.
    WebSphere MQ queue manager 'QM1' started using V7.5.0.0.

    With the queue manager started, you can now create the queues.

    Creating a queue called LQ1

    A queue is a WebSphere MQ queue manager object. There are three ways to create WebSphere MQ objects:

    ·        Command-line.
    ·        WebSphere MQ Explorer.
    ·        Using a programmable interface.

    1.      Start the scripting tool by typing the following command.
    runmqsc QM1

    When the scripting tool starts, you see:
    C:\IBM\WebSphereMQ\bin>runmqsc QM1
    5724-H72 (C) Copyright IBM Corp. 1994, 2011.  ALL RIGHTS RESERVED.
    Starting MQSC for queue manager QM1.
    The tool is ready to accept MQSC commands.
    2.      Create a local queue called LQ1 by typing the following MQSC command:
    define qlocal(LQ1)
    When the queue is created, you see the following output:
    define qlocal(LQ1)
         2 : define qlocal(LQ1)
    AMQ8006: WebSphere MQ queue created.

    3.      Stop the scripting tool by typing the following MQSC command:
    end
    When the scripting tool ends, you see the following output:
    One MQSC command read.
    No commands have a syntax error.
    All valid MQSC commands were processed.

    C:\>

    Displaying Queue Manager Status

    To check whether the queue manager is running, type the following command: dspmq

     C:\IBM\WebSphereMQ\bin>dspmq
    QMNAME(QM1)   STATUS(Running)

    Putting a message to the Queue LQ1

    1.      Use the amqsput sample application to put a message to queue LQ1 by typing: amqsput LQ1 QM1

    When the sample application starts, you see:
    C:\>amqsput LQ1 QM1
    Sample AMQSPUT0 start
    target queue is LQ1
    Type Hello World and press Enter. This puts a message containing the text “Hello World” on the queue LQ1, which is managed by the queue manager QM1. To end amqsput, press Enter on a blank line.

    You see the following output:
    C:\>amqsput LQ1 QM1
    Sample AMQSPUT0 start
    target queue is LQ1
    Hello World

    Sample AMQSPUT0 end
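
    Before reading the message back, you can optionally confirm that it is sitting on the queue; a minimal check with runmqsc (CURDEPTH is the current number of messages on the queue, so it should report 1 at this point):

    runmqsc QM1
    DISPLAY QLOCAL(LQ1) CURDEPTH
    end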

    Getting messages from the Queue LQ1
    Use the amqsget sample application to read a message on the queue LQ1 by typing: amqsget LQ1 QM1

    When the sample application starts, you see:

    C:\>amqsget LQ1 QM1
    Sample AMQSGET0 start
    message <Hello World>
    no more messages
    Sample AMQSGET0  end

    The amqsget application ends 30 seconds after reading the message.
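
    When you are done experimenting, you can optionally clean up the objects created above; a short sketch (note that dltmqm permanently deletes the queue manager and all of its queues):

    runmqsc QM1
    DELETE QLOCAL(LQ1)
    end
    endmqm QM1
    dltmqm QM1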



MQ Server 7.5 Installation (On Windows 7 64-bit Professional)


I used the WebSphere MQ Launchpad and made sure the software requirements for MQ were satisfied. All prerequisites (for example, the network configuration) must be met.





NOTE
1.     When installing WebSphere MQ, I was asked:
"Do you need to configure a domain user ID for WebSphere MQ to run under?" and I selected NO (because the server is not in a domain).

2. These configurations work when the client and server are on the same machine.




Select language and accept the terms





 I chose the custom installation option and specified the MQ installation path.








After a successful installation, you can configure WebSphere MQ (even at a later point, if anything changes) using the Prepare WebSphere MQ Wizard.









Problems
1.      When I tried to connect to the queue manager (QM1) from MQ Explorer, I got the following error:
  Could not establish a connection to the queue manager - reason 2538. (AMQ4059)
  Severity: 10 (Warning)
  
       Explanation: The attempt to connect to the queue manager failed. This could be because the queue manager is incorrectly configured to allow a connection from this system, or the connection has been broken.
  Response: Try the operation again. If the error persists, examine the problem determination information to see if any information has been recorded.

  To avoid this problem on Windows 7, make sure to open MQ Explorer as an administrator, or log in as an administrator (then by default you will see QM1 connected).




Installing Licenses

Run as administrator:
C:\IBM\WebSphereMQ\bin>setmqprd c:/temp/amqpcert.lic

NOTE:  
As part of the installation, a user called "MUSR_MQADMIN" is created as a local Windows 7 user (this is a special user account and doesn’t appear on the logon screen, like a hidden user in Windows).
When you install WebSphere MQ and run the Prepare WebSphere MQ Wizard for the first time, it creates a local user account for AMQMSRVN called MUSR_MQADMIN with the required settings and permissions. The password for MUSR_MQADMIN is randomly generated when the account is created, and used to configure the logon environment for AMQMSRVN. The generated password does not expire.






GSA (Google Search Appliance) integration with IBM WebSphere Portal / WCM


GSA (Google Search Appliance) is Google's enterprise search product. Recently I had to work with GSA as the search solution on the IBM WebSphere Portal platform. The main use cases are crawling the WCM seedlists (including binary documents and WCM content) and the portal content.

Check out the sections below for more details on the GSA basics:

  1. GSA Feeds
  2. Crawling Content 
  3. Collections in GSA
  4. Metadata search


There are two simple ways we can feed the portal/WCM content to GSA:

  1. Writing a proxy component
    1. Get the portal/WCM seedlists from the IBM system (using the IBM out-of-the-box seedlist framework)
    2. Parse the IBM out-of-the-box seedlist content
    3. Generate a GSA-compatible seedlist
    4. Post the GSA-compatible seedlist to the GSA server
  2. Generating the GSA-compatible feed directly
    1. Write a custom component using the IBM Portal/WCM API
    2. Generate the feed in a GSA-supported format
    3. Post the feed to GSA

GSA (Google Search Appliance) - Collections



  1. There is only one index in GSA (the default collection).
  2. A collection in GSA is nothing but a "view" on the default collection (the whole index) based on URL patterns. To limit searches based on URL patterns, you can set up a collection.
  3. When the user runs a search, the "site" query parameter decides which collection the search runs against (it is like post-filtering on top of the index).
  4. If the user wants to search against a specific portion of the index, set up a collection and use the site parameter to search against that specific collection.
  5. The "site" parameter can be set to a single collection or to multiple collections. The logical operations AND, OR, NOT, and grouping can be used as below, e.g.:
    1. To search against the single collection products ::: &site=products
    2. To search against the products AND (intersection) services collections ::: &site=products.services
    3. To search against products OR (union) services ::: &site=products|services
    4. To not search in products ::: &site=-products
    5. To search in prices AND (products OR services) ::: &site=prices.(products|services)

  6. Creating a collection
    1. Go to Indexes --> Create Collection
    2. Click Edit on the collection to include or exclude URLs (the URLs you add here must also be in the main index)

  7. A few things about collections
    1. You can define as many collections as you want.
    2. If you define too many complex queries with collections, there might be a performance impact, which needs to be tested.
    3. You can't define collections based on metadata. But if the documents are organized into separate folders, you can try to get the results based on the URL pattern (collection) plus metadata, using the search logical operations (see the sample query after this list).

  8. The following is possible: include URL patterns like http://sivavaka.com/parent and http://sivavaka.com/parent/child/subchild together with an exclude pattern like http://sivavaka.com/parent/child/.

  9. Default collection: by default, when you are working with collections in the admin console, everything is included (the / pattern in the default collection includes all content in the index).
  10. Simple search scenario
    1. When the user selects a specific category and searches for a keyword, a hidden site parameter specifies which collection to use.
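
To make the last two points concrete, here is a hedged sketch of a search request that combines a collection restriction with a metadata filter; the host gsa.example.com, the collection name products, and the meta tag category are illustrative rather than taken from a real setup:

http://gsa.example.com/search?q=printer+inmeta:category%3Dprivate&site=products&client=default_frontend&output=xml_no_dtd

Here site=products limits the search to the products collection (URL patterns), while inmeta:category=private (with = URL-encoded as %3D) narrows it further by metadata.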



GSA (Google Search Appliance) Metadata Search


  1. HTML <meta> tags are included as metadata
<html>
<head>
<meta name="generator" content="my testing">
<meta name="category" content="private">
<meta name="zipcode" content="10001">
<meta name="publishdate" content="03.10.2013">
<title> News on 2013-03-10</title>
</head>
<body> …. 
         Date :: : 10 Mar 2013
</body>
</html>
  2. You can search just inside the metadata, as below:
    1. Meta tag existence ::: inmeta:generator (this returns all documents that have a generator meta tag)
    2. Partial value search ::: inmeta:generator~my (this returns all documents whose generator meta tag value matches "my")
    3. Exact value search ::: inmeta:category=private (this returns only documents with an exact match, i.e. category=private)
    4. Range-based search ::: inmeta:zipcode:10000..10005 (this returns documents with zipcodes between 10000 and 10005)
    5. Date range search ::: daterange: (this returns documents whose date falls within the given range)
    6. Logical AND operator ::: there is no "AND" query term; by default only documents that contain ALL query terms are returned
    7. Logical OR operator ::: inmeta:category=private OR inmeta:zipcode:10001..10005


inmeta Syntax | Search Parameter Syntax | Description
inmeta:[meta tag name] | &requiredfields=[meta tag name] | Returns results that contain the specified meta tag.
inmeta:[meta tag name]~[meta tag content] | &partialfields=[meta tag name]:[meta tag content] | Returns results that have the specified meta tag with a value that matches some or all of the specified meta tag content.
inmeta:[meta tag name]=[meta tag content] | &requiredfields=[meta tag name]:[meta tag content] | Returns only results that match the exact meta tag content value specified.



  3. By default, GSA also does a partial lookup in the metadata associated with documents.
  4. Limits on metadata
    1. Number of meta tags :: no limit
    2. Total bytes for all tags :: 300 KB (there is no direct limit on the maximum number of bytes of metadata returned with each search result; however, meta tags and snippets beyond the first 300 KB of the document are not displayed or returned)
    3. Total bytes per tag :: 1500

NOTE: GSA has had metadata search capability for a long time through the requiredfields and partialfields parameters of the search API protocol. The inmeta operator allows search users to issue partialfields- and requiredfields-type searches directly from the search box (in the q= parameter).
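
For example, the following two requests should be equivalent; the host gsa.example.com and the front end name default_frontend are assumptions used for illustration:

http://gsa.example.com/search?q=news&requiredfields=category:private&client=default_frontend&output=xml_no_dtd
http://gsa.example.com/search?q=news+inmeta:category%3Dprivate&client=default_frontend&output=xml_no_dtd

The first form passes the metadata constraint as a separate requiredfields parameter; the second folds it into the q= parameter with the inmeta operator.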


GSA (Google Search Appliance) Crawling Content


  1. If there are no crawlers defined, there won't be any data in the index.
  2. The term "crawling" is used when getting web content; "traversal" is the term used when reading information from the filesystem.
  3. The name of the crawler is under "console --> Crawl and Index --> HTTP Headers".
  4. Configuring crawlers
    1. From the admin console:
      1. Log in to the GSA (http://gsa:8000/EnterpriseController)
      2. Go to Crawl and Index --> Crawl URLs
      3. You need to fill in the start patterns, follow patterns, and do-not-follow patterns text areas
    2. Can point to a root or index page
      1. GSA sends a GET request to the start page (or root page), which contains links to all the pages on the web server, and then crawls recursively.
    3. Can specify follow patterns, or a white list (file extensions can be specified)
      1. Like http://sivavaka.com/, /sales/, *.pdf, *.html
    4. Can specify do-not-follow patterns, or a block list
      1. Like http://sivavaka.com/sensitive, *.exe, *.bin
      2. Sample do-not-crawl patterns are things like the links on a calendar (previous and next years create an infinite loop)
    5. Rules for configuring the patterns (a sketch follows this list)
      1. Add a trailing "/" for directories: this causes GSA to follow all the links and also to look for the index.html file by default.
      2. "#" is used for comments in patterns
      3. Regular expressions ::: regexp:\\.mdb$ (here \\ is the escape character)
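
As a hedged sketch, the follow-patterns box might combine these rules like this (the paths are illustrative):

# crawl everything under the sales area (the trailing "/" also picks up index.html)
http://sivavaka.com/sales/
# any .pdf URL, written as a regular expression
regexp:\\.pdf$
# plain extension pattern
*.html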
NOTE: GSA doesn't store the binary version of binary documents such as Word documents or XLS files; it converts the document to HTML and stores the text (the text version link gives the HTML version).


  5. Document dates:
    1. There are several locations where a document's date can be located:
      1. By default, the Last-Modified date returned by the web server
      2. In the URL
      3. In a meta tag
      4. In the body
      5. In the title
      6. Configured from the admin console --> Document Dates
        1. */ --> last modified  (*/ means all documents)
Note: If no date is found, the document is indexed without a date and appears last.
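
A hedged sketch of what the Document Dates rules might look like; the news URL pattern and the publishdate meta tag name (from the sample HTML earlier) are illustrative:

# URL pattern               Locate date in     Meta tag name
http://sivavaka.com/news/   metatag            publishdate
*/                          last-modified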


  6. Recrawling and load
    1. Recrawl duration:
      1. By default GSA sets the recrawl interval to 3 days. If a document has changed, it shortens the recrawl period by 50%, i.e. to a day and a half; if GSA sees another change it shortens it by 50% again, and so on until it stops seeing changes.
      2. If the document doesn't change, the recrawl interval goes up by 50%.
    2. Removing documents
      1. If a document returns 404, you can set GSA to remove the document from the index.
      2. Also, from the "Crawl and Index --> Freshness Tuning" options, you can specify the URL patterns.
    3. Host load schedule
      1. Settings such as parallel connections, maximum document size, number of documents to crawl, etc.
    4. You can see each document's crawl history from the GSA console --> Crawl Diagnostics.

  7. Reasons for URLs not being in the index
    1. Files blocked by robots.txt
Sample ::
User-agent: *
Disallow: /Something/sensitive
User-agent: googlebot
Disallow: /
    2. Unsupported file types
    3. Files beyond the license/coverage limit
    4. Do-not-crawl patterns
    5. Orphaned files
    6. Password-protected files
    7. AJAX content
    8. <frame> and <a> outside of <frameset>
  8. Troubleshooting a URL that is not in the results
    1. Browser access: if you don't set up crawler access, you may not be able to crawl secure documents
    2. Crawl Diagnostics
    3. Real-time Diagnostics
    4. Cached version
    5. Search results
You can search directly using the following patterns:
info:<url> verifies that the page is indexed
link:<url> lists the pages that link to this URL


Google Search Appliance (GSA) Feeds


There are two types of feeds we can input to GSA:

  1. Metadata-and-URL feeds
    1. The client sends the metadata plus a URL pointing to the actual document (a binary document, or web content such as HTML) to GSA
    2. GSA indexes the metadata and makes an HTTP GET call to crawl the content of the URL (it also crawls the links found in the HTML).
    3. Adds to the existing feed.
  2. Content feeds
    1. The feed includes both the metadata and the content (the whole HTML content or text…)
    2. If the content is binary, it is base64 encoded and included in the XML
    3. This feed may be big, up to 1 GB per XML feed
    4. You need to specify the feed type as Full/Incremental (this indicates whether to replace or add to the existing feed content).


XML Feed Elements
  1. datasource : indicates the name of the feed
  2. feedtype : incremental, full, or metadata-and-url
    1. For a content feed it is incremental or full
    2. For a metadata-and-url feed, it is just metadata and URLs
  3. record : each record is a specific document and can have attributes such as url, action, and mimetype
    1. action : add or delete
    2. url : for a metadata-and-url feed, GSA makes a GET call with this URL; for a content feed, GSA uses it as a unique identifier for the record (you may need to configure a user ID/password so the content can be fetched using the URL)
    3. mimetype : tells GSA how to treat the document, e.g. application/pdf or text/html
    4. last-modified : the document's last-modified date
    5. authmethod : none, httpbasic, ntlm, or httpsso
    6. displayurl : an alternate URL (used for search results)
  4. metadata : you can send ACLs using the metadata, as below
<meta name="google:aclusers" content="siva,vaka"/>
<meta name="google:aclgroups" content="admins,hr"/>

  5. content : the actual content (HTML content, binary content, etc.)
    1. encoding : specifies the encoding, e.g. base64binary or base64compressed.


<gsafeed>
<header>
<datasource>myfeed</datasource>
<feedtype>incremental</feedtype>
</header>
<group>
<record url="" action="add" mimetype="text/html">
<metadata>
<meta name="state" content="new york">
<meta name="city" content="buffalo">
<meta name="google:aclusers" content="siva,vaka"/>
<meta name="google:aclgroups" content="admins,hr"/
</metadata>
<content>
<![CDATA[
<html>
<head><title>my doc</title></head>
<body>this is html document body</body>
</html>
]]>
</content>
</record>

<record url="" action="add" mimetype="application/pdf">
<content encoding="base64binary">
SDfadfxdafeadfdafJLKadfad
Adfadsfadf KDLDJSPofaifadf…
</content>
</record>

<record url="http://test.com/page1.html" displayURL="http://test.com/pagesIndex.html" action="add" mimetype="text/html">
<content>

</content>

</record>

</group>
</gsafeed>


NOTE: Metadata-and-URL feeds give better relevance because GSA maintains the link structure (it follows the links); content feeds may not, because GSA doesn't follow links for them.
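
For comparison, a minimal metadata-and-url feed might look like the sketch below (the record URL and datasource name are illustrative); GSA indexes the metadata right away and then crawls the URL itself for the content:

<gsafeed>
<header>
<datasource>myurlfeed</datasource>
<feedtype>metadata-and-url</feedtype>
</header>
<group>
<record url="http://test.com/page2.html" action="add" mimetype="text/html">
<metadata>
<meta name="category" content="private"/>
</metadata>
</record>
</group>
</gsafeed>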

A sample HTML page to post a feed to GSA is as follows:
<html>
         <body>
<form enctype="multipart/form-data" action="http://<gsa>:19900/xmlfeed" method="post">
<input type="text" name="datasource">
<input type="radio" name="feedtype" value="full">full
<input type="radio" name="feedtype" value="incremental">incremental
<input type="radio" name="feedtype" value="metadata-and-url">metadata-and-url
<input type="file" name="data">
<input type="submit" value="submit">
</form>
</body>
</html>
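
The same multipart/form-data POST can also be scripted; a minimal sketch using curl (the host name gsa and the file name feed.xml are placeholders, and the form field names match the form above):

curl -F "datasource=myfeed" -F "feedtype=incremental" -F "data=@feed.xml" "http://gsa:19900/xmlfeed"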

NOTE:
  1. The default port for HTTPS feeds is 19902
  2. The maximum feed file size is 1 GB; if a feed is larger than 1 GB, you can break it into several feeds with the same feed name.
  3. Timeouts may happen if too many feeds are pushed
  4. To delete a feed:
    1. Add a do-not-crawl pattern
    2. For content feeds, you can delete the feed from the Feeds page
    3. Delete individual records using action="delete" (see the sketch below)
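
A hedged sketch of deleting a single record from a content feed (the URL is illustrative); per the feed DTD below, url and mimetype are required and action="delete" marks the record for removal:

<gsafeed>
<header>
<datasource>myfeed</datasource>
<feedtype>incremental</feedtype>
</header>
<group>
<record url="http://test.com/page1.html" action="delete" mimetype="text/html"/>
</group>
</gsafeed>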


GSA Feed DTD

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT gsafeed (header, group+)>
<!ELEMENT header (datasource, feedtype)>
<!-- datasource name should match the regex [a-zA-Z_][a-zA-Z0-9_-]*,
     the first character must be a letter or underscore,
     the rest of the characters can be alphanumeric, dash, or underscore. -->
<!ELEMENT datasource (#PCDATA)>
<!-- feedtype must be either 'full', 'incremental', or 'metadata-and-url' -->
<!ELEMENT feedtype (#PCDATA)>
<!-- group element lets you group records together and
     specify a common action for them -->
<!ELEMENT group (record*)>
<!-- record element can have attribute that overrides group's element-->
<!ELEMENT record (metadata*,content*)>
<!ELEMENT metadata (meta*)>
<!ELEMENT meta EMPTY>
<!ELEMENT content (#PCDATA)>
<!-- default is 'add' -->
<!-- last-modified date as per RFC822 -->
<!ATTLIST group
   action (add|delete) "add"
   pagerank CDATA #IMPLIED>
<!ATTLIST record
   url CDATA #REQUIRED
   displayurl CDATA #IMPLIED
   action (add|delete) #IMPLIED
   mimetype CDATA #REQUIRED
   last-modified CDATA #IMPLIED
   lock (true|false) "false"
   authmethod (none|httpbasic|ntlm|httpsso) "none"
   pagerank CDATA #IMPLIED>
<!ATTLIST meta
   encoding (base64binary) #IMPLIED
   name CDATA #REQUIRED
   content CDATA #REQUIRED>
<!-- for content, if encoding is specified, it should be either base64binary
     (base64 encoded) or base64compressed (zlib compressed and then base64
     encoded). -->
<!ATTLIST content encoding (base64binary|base64compressed) #IMPLIED>

Resources
  1. https://developers.google.com/search-appliance/documentation/612/feedsguide#system