自雷

2019-07-22 23:15 Wuhan Polytechnic Java

Basic Usage of Lucene: A Code Summary

Schematic diagram

[Figure: schematic diagram]

Preparation

Create the files below; they are the content that will be searched.

[Figure: the files to be searched]

JAR files used

lucene-core-7.4.0.jar
lucene-analyzers-common-7.4.0.jar
IK-Analyzer-1.0-SNAPSHOT.jar
commons-io-2.6.jar
lucene-queryparser-7.4.0.jar

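Configuration files

hotword.dic -- hot words (extension dictionary) used by IK-Analyzer during tokenization
stopword.dic -- stop words (including sensitive words); they are excluded from search
IKAnalyzer.cfg.xml -- the IK-Analyzer configuration file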

The contents of IKAnalyzer.cfg.xml are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
<properties>  
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionaries here -->
    <entry key="ext_dict">hotword.dic;</entry>

    <!-- Configure your own extension stop-word dictionaries here -->
    <entry key="ext_stopwords">stopword.dic;</entry> 

</properties>

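The configuration file above is written in the standard Java properties XML format (note the properties.dtd DOCTYPE), so its entries can be read with java.util.Properties.loadFromXML. The sketch below only illustrates the file format; it is not IK Analyzer's own loading code. The class name IkConfigSketch, the inlined XML string, and the assumption that values are semicolon-separated lists (as the trailing ";" suggests) are all made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

class IkConfigSketch {
    // Inlined copy of the configuration shown above (hypothetical, for the demo only).
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "    <comment>IK Analyzer extension configuration</comment>\n"
        + "    <entry key=\"ext_dict\">hotword.dic;</entry>\n"
        + "    <entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
        + "</properties>\n";

    // Parses a properties XML document into a Properties object.
    static Properties load(String xml) {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: a value such as "hotword.dic;" is a ';'-separated list of file names.
    static String[] fileNames(String value) {
        return value.split(";");
    }

    public static void main(String[] args) {
        Properties props = load(XML);
        System.out.println(Arrays.toString(fileNames(props.getProperty("ext_dict"))));
        System.out.println(Arrays.toString(fileNames(props.getProperty("ext_stopwords"))));
    }
}
```

Running main prints the dictionary file names found under ext_dict and ext_stopwords, which is a quick way to check that the keys in the file are spelled as expected.
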
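Getting-started demo

Creating the index

Step 1: Create a Java project and import the JAR files.

Step 2: Create an IndexWriter object.

1) Specify where the index is stored with a Directory object.
2) Supply an IndexWriterConfig object.

Step 3: Create a Document object.

Step 4: Create Field objects and add them to the Document.

Step 5: Use the IndexWriter to write the Document to the index. The index entries are built at this point and stored in the index together with the Document.

Step 6: Close the IndexWriter.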

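import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

@Test
    public void createIndex() throws Exception{
        //1. Specify where the index is stored
        FSDirectory directory = FSDirectory.open(new File("D:\\lucene\\index").toPath());
        //2. Create the IndexWriter (the no-arg IndexWriterConfig uses StandardAnalyzer)
        IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig());
        //3. Read the directory that holds the source documents
        File dir = new File("D:\\lucene\\searchsource");
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IllegalStateException("Source directory not found: " + dir);
        }
        for (File file : files) {
            //File name
            String fileName = file.getName();
            //File content
            String fileContent = FileUtils.readFileToString(file, "utf-8");
            //File path
            String filePath = file.getPath();
            //File size
            long fileSize = FileUtils.sizeOf(file);

            //Create the fields
            //1st argument: field name
            //2nd argument: field value
            //3rd argument: whether the value is stored
            Field fileNameField = new TextField("fileName",fileName, Field.Store.YES);
            Field fileContentField = new TextField("fileContent",fileContent, Field.Store.YES);
            Field filePathField = new TextField("filePath",filePath, Field.Store.YES);
            //The size is indexed as a LongPoint so that LongPoint.newRangeQuery can match it;
            //a LongPoint is not stored, so a StoredField is added to read the value back
            Field fileSizePoint = new LongPoint("fileSize", fileSize);
            Field fileSizeStored = new StoredField("fileSize", fileSize);

            //Create the Document object
            Document document = new Document();

            //Add the fields to the document
            document.add(fileNameField);
            document.add(fileContentField);
            document.add(filePathField);
            document.add(fileSizePoint);
            document.add(fileSizeStored);

            //Write the document to the index
            indexWriter.addDocument(document);
        }
        //Close the IndexWriter
        indexWriter.close();

    }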

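Searching the index

Step 1: Create a Directory object that points to where the index is stored.

Step 2: Create an IndexReader object from the Directory.

Step 3: Create an IndexSearcher object from the IndexReader.

Step 4: Create a TermQuery object, specifying the field to search and the keyword.

Step 5: Run the query.

Step 6: Get the results back, iterate over them and print them.

Step 7: Close the IndexReader.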

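import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

 @Test
    public void searchIndex() throws Exception{
        //Specify where the index is stored
        FSDirectory directory = FSDirectory.open(new File("D:\\lucene\\index").toPath());
        //Create the IndexReader
        DirectoryReader indexReader = DirectoryReader.open(directory);
        //Create the IndexSearcher
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        //Create the query (a TermQuery term is not analyzed, so it must match an indexed term exactly)
        Query query = new TermQuery(new Term("fileContent", "spring"));

        //Run the query
        //1st argument: the query object; 2nd argument: the maximum number of results to return
        TopDocs topDocs = indexSearcher.search(query, 10);

        //Total number of hits
        System.out.println("Total hits: " + topDocs.totalHits);

        //Iterate over the results
        //topDocs.scoreDocs holds the ids of the matching documents
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            //scoreDoc.doc is the document id
            //Look up the Document by its id
            Document doc = indexSearcher.doc(scoreDoc.doc);
            //System.out.println("fileContent = " + doc.get("fileContent"));
            System.out.println("fileName = " + doc.get("fileName"));
            System.out.println("filePath = " + doc.get("filePath"));
            System.out.println("fileSize = " + doc.get("fileSize"));
            System.out.println("-----------------------------");
        }
        //Close the IndexReader
        indexReader.close();
    }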
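To make the indexing and TermQuery steps more concrete, here is a toy inverted index in plain Java. It is a simplified model under stated assumptions, not how Lucene is implemented: the class ToyInvertedIndex and its methods are hypothetical, and the "analysis" step just lowercases the text and splits it on non-alphanumeric characters. It does show one behaviour that carries over to the demo above: terms are lowercased when the text is analyzed at index time, while the term passed to a term query is looked up as-is, so "spring" matches and "Spring" does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // "Analyzes" the text (lowercase, split on non-alphanumerics) and records each term.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Exact lookup: the term is NOT analyzed, mirroring how a term query behaves.
    List<Integer> termQuery(String term) {
        return new ArrayList<>(postings.getOrDefault(term, new TreeSet<>()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("The Spring Framework provides a programming model.");
        index.addDocument("Apache Lucene is a search library.");
        index.addDocument("Spring Boot builds on Spring.");
        System.out.println(index.termQuery("spring")); // prints [0, 2]
        System.out.println(index.termQuery("Spring")); // prints []
    }
}
```
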

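Using analyzers

This is the standard analyzer:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.junit.Test;

    //Standard analyzer (tokenizer)
    @Test
    public void testTokenStream() throws Exception{
        //Inspect the output of the standard analyzer
        Analyzer analyzer = new StandardAnalyzer();

        //Get a TokenStream
        //1st argument: field name, any value will do here
        //2nd argument: the text to analyze

        TokenStream tokenStream = analyzer.tokenStream("", "The Spring Framework provides a comprehensive programming and configuration model.");

        //Add an attribute that exposes the text of each token
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

        //Reset the stream; this is required before the first call to incrementToken()
        tokenStream.reset();
        while (tokenStream.incrementToken()){
            //Print the current token
            System.out.println(charTermAttribute.toString());
        }
        //Signal the end of the stream, then release its resources
        tokenStream.end();
        tokenStream.close();
    }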

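Chinese analyzers that ship with Lucene

StandardAnalyzer:
Single-character tokenization: Chinese text is split one character at a time. For example, "任天堂是世界主宰"
becomes "任", "天", "堂", "是", "世", "界", "主", "宰".

SmartChineseAnalyzer:
Handles Chinese reasonably well but is hard to extend; custom dictionaries, stop-word lists and synonym lists are awkward to manage.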
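The single-character behaviour described for StandardAnalyzer can be imitated in a few lines of plain Java. This is only a sketch of the effect, not StandardAnalyzer's actual implementation; the class SingleCharTokenizerSketch is hypothetical, and it simply emits every ideographic code point as its own token while ignoring everything else.

```java
import java.util.ArrayList;
import java.util.List;

class SingleCharTokenizerSketch {
    // Emits each ideographic character as a separate token; other characters are skipped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isIdeographic)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the sentence from the example above split into eight single-character tokens.
        System.out.println(tokenize("任天堂是世界主宰"));
    }
}
```

Because every character becomes its own term, a search for a multi-character word can only be answered by combining single-character matches, which is why a dictionary-based analyzer is usually preferred for Chinese text.
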

This is where IKAnalyzer comes in.

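Tokenizing with IKAnalyzer

 //IKAnalyzer: check the tokenization output
    @Test
    public void testIKAnalyzerStream() throws Exception{
        //Inspect the output of IKAnalyzer
        Analyzer analyzer = new IKAnalyzer(); //IKAnalyzer replaces StandardAnalyzer here

        //Get a TokenStream
        //1st argument: field name, any value will do here
        //2nd argument: the text to analyze

        TokenStream tokenStream = analyzer.tokenStream("", "小伙子***你结婚了吗");

        //Add an attribute that exposes the text of each token
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

        //Reset the stream; this is required before the first call to incrementToken()
        tokenStream.reset();
        while (tokenStream.incrementToken()){
            //Print the current token
            System.out.println(charTermAttribute.toString());
        }
        //Signal the end of the stream, then release its resources
        tokenStream.end();
        tokenStream.close();
    }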

Maintaining the index

Add

   //Add a document
    @Test
    public void addDocument() throws Exception{
        //Where the index is stored
        FSDirectory directory = FSDirectory.open(new File("D:\\lucene\\index").toPath());
        //Create an IndexWriter that uses IKAnalyzer
        IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(new IKAnalyzer()));
        //Create a Document
        Document document = new Document();
        //fileName is indexed and stored
        document.add(new TextField("fileName","新添加的文档",Field.Store.YES));
        //fileContent is indexed but not stored, so it can be searched but not read back
        document.add(new TextField("fileContent","又新添加的文档的内容",Field.Store.NO));
        //path is stored only; a StoredField is not searchable
        document.add(new StoredField("path","D:\\lucene\\hello"));
        indexWriter.addDocument(document);
        indexWriter.close();
    }

Delete all

    private IndexWriter indexWriter;

    @Before
    public void init() throws Exception{
        //Create an IndexWriter that uses IKAnalyzer as its analyzer
        FSDirectory directory = FSDirectory.open(new File("D:\\lucene\\index").toPath());
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new IKAnalyzer());

        indexWriter = new IndexWriter(directory, indexWriterConfig);
    }

    //Delete every document in the index
    @Test
    public void deleteAllDocument() throws Exception{

        indexWriter.deleteAll();
        indexWriter.close();
    }

Delete documents that match a condition

    private IndexWriter indexWriter;

    @Before
    public void init() throws Exception{
        //Create an IndexWriter that uses IKAnalyzer as its analyzer
        FSDirectory directory = FSDirectory.open(new File("D:\\lucene\\index").toPath());
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new IKAnalyzer());

        indexWriter = new IndexWriter(directory, indexWriterConfig);
    }

  //Delete every document whose fileName field contains the term "apache"
    @Test
    public void deleteIndexByQuery() throws Exception{

        indexWriter.deleteDocuments(new Term("fileName","apache"));
        indexWriter.close();
    }

Update (replace)

    private IndexWriter indexWriter;

    @Before
    public void init() throws Exception{
        //Create an IndexWriter that uses IKAnalyzer as its analyzer
        FSDirectory directory = FSDirectory.open(new File("D:\\lucene\\index").toPath());
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new IKAnalyzer());

        indexWriter = new IndexWriter(directory, indexWriterConfig);
    }

    //Replace the documents that match a term
    @Test
    public void updateDocument() throws Exception{
        //Create a new Document
        Document document = new Document();
        //Add fields to the document
        document.add(new TextField("name1","更新之后的文档1",Field.Store.YES));
        document.add(new TextField("name2","更新之后的文档2",Field.Store.YES));
        document.add(new TextField("name3","更新之后的文档3",Field.Store.YES));
        //Update: deletes every document whose fileName contains "spring", then adds the new document
        indexWriter.updateDocument(new Term("fileName","spring"),document);
        //Close the IndexWriter
        indexWriter.close();
    }

Queries

First, extract the setup and the result-printing loop so they can be reused:

    private IndexReader indexReader;
    private IndexSearcher indexSearcher;

    @Before
    public void init() throws Exception {
        //Specify where the index is stored and create the IndexReader

        indexReader = DirectoryReader.open(FSDirectory.open(new File("D:\\lucene\\index").toPath()));
        //Create the IndexSearcher

        indexSearcher = new IndexSearcher(indexReader);
    }


    //Shared helper that runs a query and prints the results
    private void printResult(Query query) throws IOException {
        //Run the query
        //1st argument: the query object; 2nd argument: the maximum number of results to return
        TopDocs topDocs = indexSearcher.search(query, 10);

        //Total number of hits
        System.out.println("Total hits: " + topDocs.totalHits);

        //Iterate over the results
        //topDocs.scoreDocs holds the ids of the matching documents
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            //scoreDoc.doc is the document id
            //Look up the Document by its id
            Document doc = indexSearcher.doc(scoreDoc.doc);
            //System.out.println("fileContent = " + doc.get("fileContent"));
            System.out.println("fileName = " + doc.get("fileName"));
            System.out.println("filePath = " + doc.get("filePath"));
            System.out.println("fileSize = " + doc.get("fileSize"));
            System.out.println("-----------------------------");
        }
    }

TermQuery: keyword query

    //TermQuery: keyword query
    @Test
    public void searchIndex() throws Exception {

        //Create the query
        Query query = new TermQuery(new Term("fileName", "spring"));

        //Call the shared helper
        printResult(query);

        //Close the IndexReader
        indexReader.close();
    }

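newRangeQuery: range query (by size)

//newRangeQuery: range query (by size)
    @Test
    public void testRangeQuery() throws Exception{
        //Create the query: sizes from 0 to 1000 bytes, both bounds inclusive
        //It only matches documents whose fileSize was indexed as a LongPoint field
        Query query = LongPoint.newRangeQuery("fileSize", 0L, 1000L);

        printResult(query);

        //Close the IndexReader
        indexReader.close();
    }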
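A LongPoint range query compares values numerically, which is why the size has to be indexed as a numeric point rather than only as text. The plain-Java sketch below shows what goes wrong when numbers are compared as strings instead. It is an illustration of the ordering problem only and does not use Lucene; the class RangeOrderingSketch and its two helper methods are hypothetical.

```java
class RangeOrderingSketch {
    // Range check using lexicographic (string) order, as happens when numbers are kept as text.
    static boolean inRangeAsText(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Range check using numeric order.
    static boolean inRangeAsLong(long value, long lower, long upper) {
        return value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        // 200 lies between 0 and 1000 numerically...
        System.out.println(inRangeAsLong(200L, 0L, 1000L));      // prints true
        // ...but "200" sorts after "1000" as a string, because '2' > '1'.
        System.out.println(inRangeAsText("200", "0", "1000"));   // prints false
    }
}
```
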

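QueryParser: analyze the query text, then search (the text is first split into terms and the terms are then searched; this behaves somewhat like a fuzzy search and is common in search features)

@Test
    public void testQueryParser() throws Exception{
        //QueryParser comes from lucene-queryparser: org.apache.lucene.queryparser.classic.QueryParser
        //Create a QueryParser with two arguments
        //1st argument: the default field to search; 2nd argument: the analyzer
        QueryParser queryParser = new QueryParser("fileName",new IKAnalyzer());

        //Use the QueryParser to create a Query object
        Query query = queryParser.parse("lucene是一个Java开发的全文检索工具包");

        //Run the query
        printResult(query);

        //Close the IndexReader
        indexReader.close();
    }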
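Conceptually, the parser runs the query text through the analyzer and combines the resulting terms; with QueryParser's default OR operator a document matches if it contains at least one of them. The sketch below models that idea in plain Java without Lucene. It is a simplification under stated assumptions: the class QueryParserSketch is hypothetical, and the token list is invented for the example, since the real tokens depend on the analyzer and its dictionaries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class QueryParserSketch {
    // Renders the tokens as an OR query on one field, e.g. "fileName:lucene fileName:java".
    static String toOrQuery(String field, List<String> tokens) {
        return tokens.stream()
                     .map(token -> field + ":" + token)
                     .collect(Collectors.joining(" "));
    }

    // OR semantics: the document matches if it contains at least one query token.
    static boolean matchesAny(Set<String> docTerms, List<String> tokens) {
        return tokens.stream().anyMatch(docTerms::contains);
    }

    public static void main(String[] args) {
        // Hypothetical analyzer output for the query text.
        List<String> tokens = Arrays.asList("lucene", "java", "search");
        System.out.println(toOrQuery("fileName", tokens));

        Set<String> docTerms = new HashSet<>(Arrays.asList("lucene", "guide"));
        System.out.println(matchesAny(docTerms, tokens)); // prints true
    }
}
```
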