Building a knowledge graph of research projects
Create node instances: Header
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///node_header.csv" AS line CREATE (p:Header {title: line.fzr_xm, fzr_zc: line.fzr_zc, fzr_lxdh: line.fzr_lxdh, fzr_dzxx: line.fzr_dzxx, set_id: line.id})
Create the project-subject relationship: Subject_of
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///rel_subject.csv" AS line MATCH (entity1:Project{title:line.xmmc}),(entity2:Subject{title:line.xkmc}) CREATE (entity1)-[:Subject_of { type: line.relation }]->(entity2)
Create the project lead-institution relationship: Work_in
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///rel_workin.csv" AS line MATCH (entity1:Header{title:line.fzr_xm}),(entity2:Institute{title:line.szyx}) CREATE (entity1)-[:Work_in { type: line.relation }]->(entity2)
Create the project-keyword relationship: Keyword_of (the relationship points from the keyword to the project)
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///rel_keyword_of.csv" AS line MATCH (entity1:Project{title:line.xmmc}),(entity2:Keywords{title:line.ztc3}) CREATE (entity2)-[:Keyword_of { type: line.relation }]->(entity1)
Delete the relationship with a given id
MATCH ()-[r]->() WHERE ID(r) = 876318 DELETE r
// Add a UNIQUE constraint on the title property (this also creates an index)
CREATE CONSTRAINT ON (c:Header) ASSERT c.title IS UNIQUE
Preparing the CSV files also involved removing duplicate values in Navicat and exporting the result:
SELECT DISTINCT ztc1 from node_header;
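Handling missing and duplicate values in Excel:
Ctrl+G: go to "Blanks", then Delete > "Entire row"
Data > Remove Duplicates > Expand the selection (so that entire rows are deleted) > choose the column used to decide whether a row is a duplicate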
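The same two Excel steps can also be scripted. The following is a minimal sketch using only Python's standard csv module, not the procedure used above. The function name clean_rows and the sample rows are hypothetical; the column names fzr_xm and szyx are taken from the import statements.

```python
import csv
import io


def clean_rows(rows, key_field):
    """Drop rows with any blank cell, then keep the first row per key_field value."""
    seen = set()
    cleaned = []
    for row in rows:
        # Same effect as Go To > Blanks > Delete > Entire row.
        if any(value is None or not str(value).strip() for value in row.values()):
            continue
        # Same effect as Remove Duplicates judged on a single column.
        key = row[key_field]
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical sample data: one duplicate row and one row with a blank cell.
sample = io.StringIO("fzr_xm,szyx\nZhang,Physics\nZhang,Physics\nLi,\nWang,Chemistry\n")
cleaned = clean_rows(csv.DictReader(sample), key_field="fzr_xm")
print([row["fzr_xm"] for row in cleaned])  # ['Zhang', 'Wang']
```

To clean a real export, pass csv.DictReader an open file instead of the in-memory sample and write the result back with csv.DictWriter.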
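Note 1: Navicat's table view shows only 1000 rows. Do not copy the displayed rows into a new table, because only those 1000 rows are copied; use the export function to get the complete data.
Note 2: For tables with many fields, export the whole table first, clean up noisy, duplicate, and missing values, and then export the data for each group of fields as needed.
Open issue: how to join different tables on Id in order to replace data across them.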
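One possible approach to this open issue, sketched in Python and not something tried in these notes: build a dictionary from the lookup table keyed by id, then overwrite the target field in every row with a matching id. The function name replace_by_id, the field name "name", and the sample rows are all hypothetical.

```python
def replace_by_id(rows, lookup_rows, id_field, source_field, target_field):
    """Overwrite target_field in rows with source_field from the lookup row sharing the same id."""
    lookup = {row[id_field]: row[source_field] for row in lookup_rows}
    result = []
    for row in rows:
        updated = dict(row)  # copy so the input rows are left unchanged
        if row[id_field] in lookup:
            updated[target_field] = lookup[row[id_field]]
        result.append(updated)
    return result


# Hypothetical rows: the second table maps an id to the value that should be filled in.
projects = [
    {"id": "1", "szyx": "?"},
    {"id": "2", "szyx": "?"},
    {"id": "3", "szyx": "unchanged"},
]
institutes = [{"id": "1", "name": "Physics"}, {"id": "2", "name": "Chemistry"}]
merged = replace_by_id(projects, institutes, "id", "name", "szyx")
print([row["szyx"] for row in merged])  # ['Physics', 'Chemistry', 'unchanged']
```

Rows whose id has no match in the lookup table keep their original value, which makes unmatched records easy to spot afterwards.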