Solr Tutorial
Published: 2019-06-28



Overview

This document covers the basics of running Solr using an example schema, and some sample data.

Requirements

To follow along with this tutorial, you will need...

  1. Java 1.6 or greater, available from several vendors.
    • Running java -version at the command line should report a version number of 1.6 or higher.
    • GNU's GCJ is not supported and does not work with Solr.
  2. A Solr release.

Getting Started

Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.

Begin by unzipping the Solr release and changing your working directory to be the "example" directory. (Note that the base directory name may vary with the version of Solr downloaded.) For example, with a shell in UNIX, Cygwin, or MacOS:

user:~solr$ ls
solr-nightly.zip
user:~solr$ unzip -q solr-nightly.zip
user:~solr$ cd solr-nightly/example/

Solr can run in any Java Servlet Container of your choice, but to simplify this tutorial, the example directory includes a small installation of Jetty.

To launch Jetty with the Solr WAR and the example configs, just run start.jar:

user:~/solr/example$ java -jar start.jar
2012-06-06 15:25:59.815:INFO:oejs.Server:jetty-8.1.2.v20120308
2012-06-06 15:25:59.834:INFO:oejdp.ScanningAppProvider:Deployment monitor .../solr/example/webapps at interval 0
2012-06-06 15:25:59.839:INFO:oejd.DeploymentManager:Deployable added: .../solr/example/webapps/solr.war
...
Jun 6, 2012 3:26:03 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [collection1] Registered new searcher Searcher@7527e2ee main{StandardDirectoryReader(segments_1:1)}

This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.

You can see that Solr is running by loading http://localhost:8983/solr/ in your web browser. This is the main starting point for administering Solr.

Indexing Data

Your Solr server is up and running, but it doesn't contain any data. You can modify a Solr index by POSTing commands to Solr to add (or update) documents, delete documents, and commit pending adds and deletes. These commands can be sent in a variety of formats.
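To make the shape of an "add" command concrete, here is a minimal Python sketch that builds a Solr XML `<add>` message with the standard library and wraps it in an HTTP POST request. The update URL is the one printed by post.jar later in this tutorial; the helper names `build_add_xml` and `build_add_request` and the sample document are hypothetical. The request is only constructed, not sent, and no commit is issued.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Update URL as printed by post.jar in this tutorial's example setup.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr XML <add> command."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes repeated <field> elements (multi-valued field).
            values = value if isinstance(value, list) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def build_add_request(docs, url=UPDATE_URL):
    """Hypothetical helper: wrap the <add> command in a POST request (not sent)."""
    body = build_add_xml(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_request([{"id": "TEST-1", "name": "Example doc"}])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a running server you could send it with urllib.request.urlopen(req).
```

Building the XML with ElementTree rather than string concatenation means characters such as `<` and `&` in field values are escaped automatically.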

The exampledocs directory contains sample files showing the types of commands Solr accepts, as well as a Java utility for posting them from the command line (a post.sh shell script is also available, but for this tutorial we'll use the cross-platform Java client; run java -jar post.jar -h to see its various options).

To try this, open a new terminal window, enter the exampledocs directory, and run "java -jar post.jar" on some of the XML files in that directory.

user:~/solr/example/exampledocs$ java -jar post.jar solr.xml monitor.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: COMMITting Solr index changes..

You have now indexed two documents in Solr and committed these changes. You can now search for "solr" by opening the query page of the Admin interface and entering "solr" in the "q" text box. Clicking the "Execute Query" button should display the request URL along with one result...

You can index all of the sample data, using the following command (assuming your command line shell supports the *.xml notation):

user:~/solr/example/exampledocs$ java -jar post.jar *.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file gb18030-example.xml
SimplePostTool: POSTing file hd.xml
SimplePostTool: POSTing file ipod_other.xml
SimplePostTool: POSTing file ipod_video.xml
...
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file utf8-example.xml
SimplePostTool: POSTing file vidcard.xml
SimplePostTool: COMMITting Solr index changes..

...and now you can search for all sorts of things using the default Solr query syntax (a superset of the Lucene query syntax)...

There are many other ways to import your data into Solr. One can:

  • Import records from a database using the Data Import Handler (DIH).
  • Load CSV files (comma separated values), including those exported by Excel or MySQL.
  • Index binary documents such as Word and PDF with Solr Cell (ExtractingRequestHandler).
  • Use SolrJ for Java or other Solr clients to programmatically create documents to send to Solr.

Updating Data

You may have noticed that even though the file solr.xml has now been POSTed to the server twice, you still only get 1 result when searching for "solr". This is because the example schema.xml specifies a "uniqueKey" field called "id". Whenever you POST commands to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you. You can see that this has happened by looking at the values for numDocs and maxDoc in the "CORE"/searcher section of the statistics page...

numDocs represents the number of searchable documents in the index (and will be larger than the number of XML files since some files contained more than one <doc>). maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index. You can re-post the sample XML files over and over again as much as you want and numDocs will never increase, because the new documents will constantly be replacing the old.
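The replace-on-same-key behavior and the numDocs/maxDoc distinction can be mimicked with a toy in-memory model. This is not how Lucene actually stores documents; the class `ToyIndex` and its `optimize` method are hypothetical and only illustrate why re-posting the same files leaves numDocs unchanged while maxDoc can grow until deleted documents are merged away.

```python
class ToyIndex:
    """Toy model of uniqueKey overwrites; not Solr's real implementation."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.slots = []  # every document ever written, in order (None = deleted)
        self.live = {}   # unique key -> position of the live copy in self.slots

    def add(self, doc):
        key = doc[self.unique_key]
        if key in self.live:
            # The older document with the same key is only logically deleted.
            self.slots[self.live[key]] = None
        self.slots.append(doc)
        self.live[key] = len(self.slots) - 1

    @property
    def num_docs(self):
        """Searchable documents."""
        return len(self.live)

    @property
    def max_doc(self):
        """All slots, including logically deleted documents."""
        return len(self.slots)

    def optimize(self):
        """Drop deleted slots, roughly what merging segments achieves."""
        kept = [d for d in self.slots if d is not None]
        self.slots = kept
        self.live = {d[self.unique_key]: i for i, d in enumerate(kept)}


if __name__ == "__main__":
    idx = ToyIndex()
    for _ in range(3):
        idx.add({"id": "SOLR1000", "name": "Solr"})  # same key posted three times
    print(idx.num_docs, idx.max_doc)  # 1 searchable doc, 3 slots
    idx.optimize()
    print(idx.num_docs, idx.max_doc)  # 1 1
```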

Go ahead and edit the existing XML files to change some of the data, then re-run the java -jar post.jar command; you'll see your changes reflected in subsequent searches.

Deleting Data

You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key field, or a query that matches multiple documents (be careful with that one!). Since these commands are smaller, we will specify them right on the command line rather than reference an XML file.

Execute the following command to delete a specific document:

java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>SP2514N</id></delete>"

Because we have specified "commit=false", a search for id:SP2514N will still find the document we have deleted. Since the example configuration uses Solr's "autoCommit" feature, Solr will still automatically persist this change to the index, but it will not affect search results until an "openSearcher" commit is explicitly executed.

Using the statistics page for the updateHandler, you can observe this delete propagate to disk by watching the deletesById value drop to 0 as the cumulative_deletesById and autocommit values increase.

Here is an example of using delete-by-query to delete anything with DDR in the name:

java -Dcommit=false -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"

You can force a new searcher to be opened to reflect these changes by sending an explicit commit command to Solr:

java -jar post.jar -
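The delete and commit commands used above can also be generated programmatically. This sketch builds the XML bodies for delete-by-id, delete-by-query, and commit with Python's standard library; the function names are hypothetical, and the resulting bodies would be POSTed to the update URL the same way post.jar does.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(doc_id):
    """<delete><id>...</id></delete>: remove one document by its uniqueKey."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query_xml(query):
    """<delete><query>...</query></delete>: remove every matching document."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query  # special characters are escaped
    return ET.tostring(root, encoding="unicode")


def commit_xml():
    """<commit/>: make pending adds and deletes visible to searches."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


if __name__ == "__main__":
    print(delete_by_id_xml("SP2514N"))      # <delete><id>SP2514N</id></delete>
    print(delete_by_query_xml("name:DDR"))  # <delete><query>name:DDR</query></delete>
    print(commit_xml())                     # <commit />
```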

Now re-execute the previous search and verify that no matching documents are found. You can also revisit the statistics page and observe the changes to both the number of commits in the updateHandler and the numDocs in the searcher.

Commits that open a new searcher can be expensive operations so it's best to make many changes to an index in a batch and then send the commit command at the end. There is also an optimize command that does the same things as commit, but also forces all index segments to be merged into a single segment -- this can be very resource intensive, but may be worthwhile for improving search speed if your index changes very infrequently.
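One way for a client to follow the batch-then-commit advice is to group documents into fixed-size `<add>` messages and emit a single `<commit/>` at the end. The helper `batched_update_bodies` and the default batch size of 2 are illustrative assumptions, not Solr defaults; a real client would POST each body to the update URL in order.

```python
import xml.etree.ElementTree as ET
from itertools import islice


def _add_xml(docs):
    """Serialize one batch of dicts as a Solr XML <add> command."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(doc_el, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def batched_update_bodies(docs, batch_size=2):
    """Yield one <add> body per batch, then a single <commit/> at the end."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        yield _add_xml(batch)
    # One commit (and so one new searcher) for the whole run.
    yield ET.tostring(ET.Element("commit"), encoding="unicode")


if __name__ == "__main__":
    for body in batched_update_bodies([{"id": str(i)} for i in range(5)]):
        print(body)
```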

All of the update commands can be specified using either XML or JSON.
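For comparison with the XML commands, this sketch builds JSON versions with Python's `json` module. It assumes Solr's JSON update syntax, in which `add`, `delete`, and `commit` appear as keys of a JSON object and the body is sent with a `Content-Type: application/json` header; check the update handler documentation for your Solr version before relying on it. The helper names are hypothetical.

```python
import json


def json_add(doc):
    """Equivalent of <add><doc>...</doc></add>."""
    return json.dumps({"add": {"doc": doc}})


def json_delete_by_id(doc_id):
    """Equivalent of <delete><id>...</id></delete>."""
    return json.dumps({"delete": {"id": doc_id}})


def json_delete_by_query(query):
    """Equivalent of <delete><query>...</query></delete>."""
    return json.dumps({"delete": {"query": query}})


def json_commit():
    """Equivalent of <commit/>."""
    return json.dumps({"commit": {}})


if __name__ == "__main__":
    print(json_delete_by_id("SP2514N"))  # {"delete": {"id": "SP2514N"}}
    print(json_commit())                 # {"commit": {}}
```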

To continue with the tutorial, re-add any documents you may have deleted by going to the exampledocs directory and executing

java -jar post.jar *.xml

Querying Data

Searches are done via HTTP GET on the select URL with the query string in the q parameter. You can pass a number of optional request parameters to the request handler to control what information is returned. For example, you can use the "fl" parameter to control which stored fields are returned, and whether the relevancy score is returned:

  • fl=name,id (return only name and id fields)
  • fl=name,id,score (return relevancy score as well)
  • fl=*,score (return all stored fields, as well as relevancy score)
  • sort=price desc (add sort specification: sort by price descending)
  • wt=json (return response in JSON format)
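The parameter combinations listed above can be assembled into select URLs with `urllib.parse.urlencode`, which escapes spaces, commas, and `*` for you. The base URL assumes the example server on port 8983 used throughout this tutorial, and the sample query term `video` is an illustrative assumption.

```python
from urllib.parse import urlencode

# Assumed select URL for the example server used in this tutorial.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(q, **params):
    """Build a GET URL for the select handler; keyword args become parameters."""
    return SELECT_URL + "?" + urlencode({"q": q, **params})


if __name__ == "__main__":
    print(select_url("video", fl="name,id"))
    print(select_url("video", fl="*,score"))
    print(select_url("video", sort="price desc", fl="name,id,price"))
    print(select_url("video", wt="json"))
```

The encoded forms (for example `%2C` for a comma and `+` for a space) are decoded by the server, so they select the same fields as the literal values.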

The query form provided in the web admin interface allows setting various request parameters and is useful when testing or debugging queries.

Sorting

Solr provides a simple method to sort on one or more indexed fields. Use the "sort" parameter to specify "field direction" pairs, separated by commas if there's more than one sort field:

"score" can also be used as a field name when specifying a sort:

Complex functions may also be used to sort results:

If no sort is specified, the default is score desc to return the matches having the highest relevancy.
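The "field direction" rule can be captured in a small helper that builds the value of the sort parameter. The helper `sort_param` is hypothetical, and the function expression `div(popularity,add(price,1))` with its `popularity` field is an illustrative assumption about the example schema.

```python
def sort_param(*pairs):
    """Join (field, direction) pairs into a sort value like "inStock asc,price desc"."""
    parts = []
    for field, direction in pairs:
        direction = direction.lower()
        if direction not in ("asc", "desc"):
            raise ValueError(f"sort direction must be asc or desc, got {direction!r}")
        parts.append(f"{field} {direction}")
    return ",".join(parts)


if __name__ == "__main__":
    print(sort_param(("inStock", "asc"), ("price", "desc")))
    print(sort_param(("score", "desc")))                         # the default ordering
    print(sort_param(("div(popularity,add(price,1))", "desc")))  # function sort
```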

Highlighting

Hit highlighting returns relevant snippets of each returned document, and highlights terms from the query within those context snippets.

The following example searches for video card and requests highlighting on the fields name,features. This causes a highlighting section to be added to the response with the words to highlight surrounded with <em> (for emphasis) tags.
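A request like the one described can be built by setting `hl=true` and listing the fields in `hl.fl`. This is a sketch assuming the example server's select URL; `highlight_url` is a hypothetical helper, and `wt=json` is added only to make the `highlighting` section of the response easy to read.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"  # assumed example endpoint


def highlight_url(q, fields):
    """Search for q and ask Solr to highlight matches in the given fields."""
    params = {
        "q": q,
        "wt": "json",
        "hl": "true",               # turn highlighting on
        "hl.fl": ",".join(fields),  # fields to build snippets from
    }
    return SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(highlight_url("video card", ["name", "features"]))
```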

More request parameters related to controlling highlighting may be found in the Solr highlighting documentation.

Faceted Search

Faceted search takes the documents matched by a query and generates counts for various properties or categories. Links are usually provided that allow users to "drill down" or refine their search results based on the returned categories.

The following example searches for all documents (*:*) and requests counts by the category field cat.

Notice that although only the first 10 documents are returned in the results list, the facet counts generated are for the complete set of documents that match the query.
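The point that facet counts cover every match, not just the displayed page, can be shown with a toy in-memory version. The documents and the `search_with_facets` helper are made up for illustration; Solr computes facet counts inside the index, not like this.

```python
from collections import Counter


def search_with_facets(docs, matches, facet_field, rows=10):
    """Return the first `rows` matching docs plus facet counts over *all* matches."""
    hits = [d for d in docs if matches(d)]
    counts = Counter()
    for d in hits:
        values = d.get(facet_field, [])
        # A multi-valued field contributes one count per distinct value.
        counts.update(set(values if isinstance(values, list) else [values]))
    return hits[:rows], counts


if __name__ == "__main__":
    docs = [{"id": str(i), "cat": ["electronics"] if i % 3 else ["software"]}
            for i in range(30)]
    page, counts = search_with_facets(docs, lambda d: True, "cat")  # like q=*:*
    print(len(page), dict(counts))  # 10 docs shown, counts cover all 30
```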

We can facet multiple ways at the same time. The following example adds a facet on the boolean inStock field:

Solr can also generate counts for arbitrary queries. The following example queries for ipod and shows prices below and above 100 by using range queries on the price field.

Solr can even facet by numeric ranges (including dates). This example requests counts for the manufacture date (manufacturedate_dt field) for each year between 2004 and 2010.
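The three kinds of faceting described above (field, query, and range facets) map onto request parameters. This sketch combines them in one URL for illustration; it uses a list of pairs so that `facet.field` and `facet.query` can repeat. The `rows=0` setting and the exact date boundaries are assumptions, and the endpoint is the example server's select URL.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"  # assumed example endpoint

# A list of pairs (not a dict) so that parameters such as facet.field can repeat.
params = [
    ("q", "*:*"),
    ("rows", "0"),                                   # only the counts, no documents
    ("facet", "true"),
    ("facet.field", "cat"),                          # counts per category
    ("facet.field", "inStock"),                      # counts per boolean value
    ("facet.query", "price:[0 TO 100]"),             # arbitrary query facets
    ("facet.query", "price:[100 TO *]"),
    ("facet.range", "manufacturedate_dt"),           # one bucket per year
    ("facet.range.start", "2004-01-01T00:00:00Z"),
    ("facet.range.end", "2010-01-01T00:00:00Z"),
    ("facet.range.gap", "+1YEAR"),
]

url = SELECT_URL + "?" + urlencode(params)

if __name__ == "__main__":
    print(url)
```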

More information on faceted search may be found in the Solr faceting documentation.

Search UI

Solr includes an example search interface built with Velocity templates that demonstrates many features, including searching, faceting, highlighting, autocomplete, and geospatial searching.

Try it out at 

Text Analysis

Text fields are typically indexed by breaking the text into words and applying various transformations such as lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally applied to any queries in order to match what is indexed.
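The idea of running the same transformations at index time and query time can be illustrated with a toy analysis chain. This is not Solr's analyzer: the tokenizer regex and the crude plural-stripping rule in `_strip_plural` are simplifications chosen for illustration, whereas real chains are built from the Tokenizers and TokenFilters configured in schema.xml.

```python
import re


def _strip_plural(token):
    """Crude plural removal: "cards" -> "card", but leave "glass" alone."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Toy chain: split into words, lowercase, then strip plurals."""
    tokens = re.findall(r"\w+", text)
    return [_strip_plural(t.lower()) for t in tokens]


def matches(doc_text, query_text):
    """A query matches if every analyzed query term occurs in the analyzed text."""
    indexed = set(analyze(doc_text))
    return all(term in indexed for term in analyze(query_text))


if __name__ == "__main__":
    print(analyze("Flash Memory Cards"))                 # ['flash', 'memory', 'card']
    print(matches("Flash Memory Cards", "memory card"))  # True
```

Because the query goes through the same `analyze` function as the document, "memory card" finds "Memory Cards" even though neither the case nor the plural form agree.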

The schema defines the fields in the index and what type of analysis is applied to them. The current schema your collection is using may be viewed directly via the Schema screen in the Admin UI, or explored dynamically using the Schema Browser.

The best analysis components (tokenization and filtering) for your textual content depend heavily on language. As you can see in the schema, many of the fields in the example schema are using a fieldType named text_general, which has defaults appropriate for most languages.

If you know your textual content is English, as is the case for the example documents in this tutorial, and you'd like to apply English-specific stemming and stop word removal, as well as split compound words, you can use the text_en_splitting fieldType instead. Go ahead and edit the schema.xml in the solr/example/solr/collection1/conf directory to use the text_en_splitting fieldType for the text and features fields like so:

...

Stop and restart Solr after making these changes and then re-post all of the example documents using java -jar post.jar *.xml. Now queries like the ones listed below will demonstrate English-specific transformations:

  • A search for  can match PowerShot, and  can match A-DATA by using the WordDelimiterFilter and LowerCaseFilter.
  • A search for  can match Rechargeable using the stemming features of PorterStemFilter.
  • A search for  can match 1GB, and the commonly misspelled  can match Pixma using the SynonymFilter.

A full description of the analysis components, Analyzers, Tokenizers, and TokenFilters available for use can be found in the Solr documentation.

Analysis Debugging

There is a handy analysis page in the Admin UI where you can see how a text value is broken down into words by both the Index time and Query time analysis chains for a field or field type. This page shows the resulting tokens after they pass through each filter in the chains.

Analyzing "Canon Power-Shot SD500" with the text_en_splitting type shows the tokens created from it. Each section of the table shows the resulting tokens after having passed through the next TokenFilter in the (Index) analyzer. Notice how both powershot and power, shot are indexed, using tokens that have the same "position". (Compare this output with that of another field type, such as text_general.)
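To make "several tokens sharing a position" concrete, here is a toy word-delimiter step. It splits on non-alphanumerics and lower-to-upper case changes, lowercases the parts, and adds the catenated form at the position of the first part. It is a simplified stand-in, not Lucene's WordDelimiterFilter: it does not split letters from digits (so SD500 stays whole), and the real filter's output depends on its configuration.

```python
import re


def toy_word_delimiter(text):
    """Return (position, token) pairs for a whitespace-separated string."""
    out = []
    pos = 0
    for word in text.split():
        # Split on runs of non-alphanumerics and on lower->upper case changes.
        parts = [p for p in re.split(r"[^A-Za-z0-9]+|(?<=[a-z])(?=[A-Z])", word) if p]
        parts = [p.lower() for p in parts]
        start = pos
        for p in parts:
            out.append((pos, p))
            pos += 1
        if len(parts) > 1:
            # Catenated form shares the position of the first part.
            out.append((start, "".join(parts)))
    return out


if __name__ == "__main__":
    for position, token in toy_word_delimiter("Canon Power-Shot SD500"):
        print(position, token)
```

With this toy output, a query for either "power shot" or "powershot" finds a matching token at position 1.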

Mousing over the section label to the left of the section will display the full name of the analyzer component at that stage of the chain. Toggling the "Verbose Output" checkbox will show or hide additional details for each token.

When both Index and Query values are provided, two tables will be displayed side by side showing the results of each chain. Terms in the Index chain results that are equivalent to the final terms produced by the Query chain will be highlighted.

Other interesting examples:

  •  using the text_en field type
  •  using the text_cjk field type
  •  using the text_ja field type
  •  using the text_ar field type

Conclusion

Congratulations! You successfully ran a small Solr instance, added some documents, and made changes to the index and schema. You learned about queries, text analysis, and the Solr admin interface. You're ready to start using Solr on your own project! Continue on with the following steps:

  • Subscribe to the Solr mailing lists!
  • Make a copy of the Solr example directory as a template for your project.
  • Customize the schema and other config in solr/collection1/conf/ to meet your needs.

Solr has a ton of other features that we haven't touched on here, including ways to handle huge document collections. Explore the Solr documentation to find more details about Solr's many features.

Have Fun, and we'll see you on the Solr mailing lists!

 
Copyright © 2012 

Reposted from: https://my.oschina.net/penngo/blog/168640
