Search SeedLists in Lotus Web Content Management

SeedList

1. Seedlist used by the portal via the WCMSearchSeedList application
2. Seedlist REST API (available from WCM version 6.1.5)

Seedlist used by the portal via the WCMSearchSeedList application

A wcmsearchseed list page is used by the WebSphere Portal search engine to crawl a searchable Web Content Management (WCM) site.

http://cnptcam3015416.ptc.ca.kp.org:10038/wps/wcmsearchseed/searchseed?userid=wpsadmin&password=wpsadmin&site=RiverBend&lib=RiverBend&debug=true

The supported URL parameters are:

  • siteid= (the UUID of the site; an alternative to site)
  • site=RiverBend (the site name)
  • lib=RiverBend (the library name)
  • debug=true or debug=1 (returns the decoded values)
  • pageNum= (0 is the first page)

A minimal fetch using these parameters is sketched below.
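For illustration, the following Java sketch fetches the first seedlist page with the standard java.net.http client. The host name and credentials are placeholders standing in for the example URL above, not values from a real system.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SearchSeedFetch {
    public static void main(String[] args) throws Exception {
        // Placeholder host and credentials; site and library names
        // follow the RiverBend example above.
        String url = "http://portal.example.com:10038/wps/wcmsearchseed/searchseed"
                + "?userid=wpsadmin&password=wpsadmin"
                + "&site=RiverBend&lib=RiverBend&debug=true&pageNum=0";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}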

The number of items per page in the wcmsearchseed list is configured by the SearchService.DefaultSeedPageSize property in /wcm/shared/app/config/wcmservices/SearchService.properties. The default is 200. To see more than the first page, also append the pageNum URL parameter:

http://www.ibm.com:10038/wps/wcmsearchseed/searchseed?siteid=98eae7804755055eb71db746880f549b&userid=wpsadmin&password=wpsadmin_password&debug=1&pageNum=1
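To walk every page programmatically, a loop like the following Java sketch can increment pageNum until the server stops returning content. The host is a placeholder, the siteid comes from the example URL above, and the empty-body stopping condition is an assumption to verify against your server's actual end-of-pages behavior.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SearchSeedPager {
    public static void main(String[] args) throws Exception {
        // Placeholder host and credentials; siteid taken from the example above.
        String base = "http://portal.example.com:10038/wps/wcmsearchseed/searchseed"
                + "?siteid=98eae7804755055eb71db746880f549b"
                + "&userid=wpsadmin&password=wpsadmin_password&debug=1";

        HttpClient client = HttpClient.newHttpClient();
        // Assumption: a non-200 status or empty body marks the last page.
        for (int pageNum = 0; ; pageNum++) {
            HttpResponse<String> page = client.send(
                    HttpRequest.newBuilder(URI.create(base + "&pageNum=" + pageNum))
                            .GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            if (page.statusCode() != 200 || page.body().isBlank()) {
                break;
            }
            System.out.println("Page " + pageNum + ": " + page.body().length() + " chars");
        }
    }
}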


If the site name or library name contains spaces, you must replace the space with a plus sign (+) character. For example, replace Web Content with Web+Content.
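Conveniently, java.net.URLEncoder produces exactly this form (application/x-www-form-urlencoded output, where a space becomes a plus sign), so site and library names can be encoded programmatically rather than edited by hand:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeSiteName {
    public static void main(String[] args) {
        // URLEncoder emits application/x-www-form-urlencoded output,
        // in which a space is encoded as a plus sign.
        String site = URLEncoder.encode("Web Content", StandardCharsets.UTF_8);
        System.out.println(site); // prints Web+Content
    }
}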


SeedList REST API

The IBM® Lotus Web Content Management API for retrieving application content through a seedlist is based on the REST architectural style. To obtain seedlist content, third-party crawlers or administrative applications only need to construct and send HTTP requests to the application servlet. With seedlist format 1.0, the syntax of the seedlist URL has changed.

All REST API requests are synchronous calls. The order of the parameters in the requests does not matter. The parameter names are case-sensitive and must be entered in the format described here. An HTTP error response (status code 404) is generated in the following situations:

  • An unknown or unsupported parameter is submitted as part of the request.
  • Web Content Management cannot resolve the site path or ID.
  • Web Content Management cannot find any items.
The request is a standard HTTP GET command. The URL is formed by combining the seedlist servlet host name, port number, and path, followed by a collection of name-value pairs (input parameters) separated by ampersand (&) characters; a request sketch follows the access notes below.

https://hostname:port_number/seedlist/myserver?SeedlistId=seedlist_ID_or_path_to_site&Source=com.ibm.workplace.wcm.plugins.seedlist.retriever.WCMRetrieverFactory&Action=GetDocuments
  • seedlist/server is used for anonymous access.
  • seedlist/myserver is used for authenticated access.
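As an illustration, the following Java sketch requests a seedlist over the authenticated path and checks for the 404 conditions listed above. The host, library/site path, and credentials are placeholders, and the use of HTTP basic authentication for seedlist/myserver is an assumption, not something this article documents.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SeedlistRestRequest {
    public static void main(String[] args) throws Exception {
        // Placeholder host and seedlist path; replace with your own values.
        String url = "https://portal.example.com:10039/seedlist/myserver"
                + "?SeedlistId=MyLibrary/MySite"
                + "&Source=com.ibm.workplace.wcm.plugins.seedlist.retriever.WCMRetrieverFactory"
                + "&Action=GetDocuments";

        // Assumption: the authenticated path accepts HTTP basic authentication.
        String auth = Base64.getEncoder().encodeToString(
                "wpsadmin:wpsadmin_password".getBytes(StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 404 means an unknown parameter, an unresolvable site path/ID,
        // or no matching items (see the list above).
        if (response.statusCode() == 404) {
            System.err.println("Seedlist not found or no items matched.");
        } else {
            System.out.println(response.body());
        }
    }
}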

Parameter: SeedlistId
Default value: No default; must be specified.
Description: Identifies the seedlist. This parameter can be specified in either of the following ways:
  • The UUID of the site.
  • The path to the site, for example library_name/site.

Parameter: Start
Default value: 0
Description: Defines the start offset of the currently returned section.

Parameter: Range
Default value: 100
Description: Defines the number of entries returned for the current section.

Parameter: Date
Default value: No default. If not specified, all applicable results are returned.
Description: Retrieves only entries (documents) that were updated after this date. The date format (compliant with ISO 8601) is yyyy-MM-ddTHH:mm:ss±hhmm, that is, the date, the literal character T, the time, and the time-zone offset. This format includes time zone information, which is critical if the client and server are in different time zones.
Important: Proper URL encoding must be performed (for example, the plus sign + must be represented as %2B).

Parameter: Action
Default value: GetDocuments
Description: Defines the requested action to execute:
  • GetDocuments retrieves all underlying documents.
  • GetNumberOfDocuments returns the number of all underlying documents, typically for debugging. This value must match the number of documents returned by the corresponding GetDocuments request.

Parameter: Format
Default value: ATOM
Description: Defines the output format: ATOM, HTML, or XML.

Parameter: Timestamp
Default value: No default.
Description: Indicates the content provider timestamp from a previous crawling session. The timestamp represents a snapshot of the provider's content and allows the crawler to retrieve only the content changes on the next crawl. This is used for incremental crawling.
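Putting the parameters together, the following sketch performs an incremental, paged crawl over the anonymous seedlist/server path: it filters with a URL-encoded Date value (note the plus sign becoming %2B) and pages with Start and Range. The host, seedlist path, and the empty-body stopping condition are assumptions for illustration, not documented behavior.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class IncrementalSeedlistCrawl {
    public static void main(String[] args) throws Exception {
        // ISO 8601 date with a time-zone offset; URL encoding turns the
        // plus sign into %2B, as the Date parameter notes above require.
        String since = URLEncoder.encode("2011-01-01T00:00:00+0000",
                StandardCharsets.UTF_8);

        HttpClient client = HttpClient.newHttpClient();
        int start = 0;
        final int range = 100; // default section size from the table above

        while (true) {
            // Placeholder host and seedlist path on the anonymous server path.
            String url = "https://portal.example.com:10039/seedlist/server"
                    + "?SeedlistId=MyLibrary/MySite"
                    + "&Source=com.ibm.workplace.wcm.plugins.seedlist.retriever.WCMRetrieverFactory"
                    + "&Action=GetDocuments&Format=ATOM"
                    + "&Date=" + since
                    + "&Start=" + start + "&Range=" + range;

            HttpResponse<String> response = client.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());

            // A 404 signals an unknown parameter, an unresolvable site, or no items.
            if (response.statusCode() != 200 || response.body().isBlank()) {
                break;
            }
            System.out.println(response.body()); // process one ATOM section
            start += range; // assumption: advance by Range until no items remain
        }
    }
}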
