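Binary document upload
SolrNet supports Solr's "extract" feature (a.k.a. Solr "Cell"), which indexes data from binary document formats such as Word and PDF.
Here is a simple example showing how to extract the text from a PDF file without indexing it: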
ISolrOperations<Something> solr = ...
using (var file = File.OpenRead(@"test.pdf")) {
    // Send the file to Solr for extraction under the given document id.
    var response = solr.Extract(new ExtractParameters(file, "some_document_id") {
        ExtractOnly = true,                  // extract only, do not index
        ExtractFormat = ExtractFormat.Text,  // return plain text
    });
    Console.WriteLine(response.Content);
}
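ExtractOnly = true tells Solr to perform text extraction only, without indexing the uploaded document. With ExtractOnly = false, you can add more fields through the Fields property. Other options are set through properties of the ExtractParameters class. Providing a StreamType for the content is generally recommended, because auto-detection may fail.
For more details on each option in ExtractParameters, see the Solr wiki and the Solr Reference Guide.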
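For the indexing case (ExtractOnly = false), the following is a minimal sketch rather than a definitive implementation. It assumes an already configured ISolrOperations<Something> instance and a reachable Solr core with the extraction handler enabled. The Something class, the IndexPdf helper, and the "author" and "category" field names and values are hypothetical and only serve as illustration. The explicit Commit call is also an assumption about how you want changes to become visible.

```csharp
using System.IO;
using SolrNet;
using SolrNet.Attributes;

// Hypothetical document type; adjust to your own schema.
public class Something {
    [SolrUniqueKey("id")]
    public string Id { get; set; }
}

public static class PdfIndexer {
    // Hypothetical helper: uploads a PDF, lets Solr extract its text,
    // and indexes it together with two extra literal fields.
    public static ExtractResponse IndexPdf(ISolrOperations<Something> solr, string path, string documentId) {
        using (var file = File.OpenRead(path)) {
            var response = solr.Extract(new ExtractParameters(file, documentId) {
                ExtractOnly = false,             // index the document instead of only extracting
                StreamType = "application/pdf",  // state the content type rather than rely on auto-detection
                Fields = new[] {
                    // Field names and values are assumptions for illustration.
                    new ExtractField("author", "Jane Doe"),
                    new ExtractField("category", "manuals"),
                },
            });
            solr.Commit(); // make the indexed document searchable
            return response;
        }
    }
}
```

A call such as PdfIndexer.IndexPdf(solr, @"test.pdf", "some_document_id") would then index the file under that id. The extra fields must exist in the Solr schema (or be covered by a dynamic field) for the request to succeed.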
Websites, products and companies using SolrNet
- http://www.education.gov.uk
- http://www.fancydressoutfitters.co.uk
- http://jobhits.net
- http://jobhits.co.uk
- http://www.leasetransfer.com
- http://www.leasetrader.com
- http://www.bedriftsoket.no
- http://www.watchfinder.co.uk
- http://www.sub.su.se/
- EPiSolr
- CapitalIQ
- http://www.crocus.co.uk
- http://www.waitrosegarden.com
- nopAccelerate (by Xcellence-IT)
- Sitecore
- http://www.libris.no/