We now have all parameters in place to calculate the Tanimoto coefficient. For this we use the $project operator, which copies the compound id and the smiles property and also adds a new, computed property named tanimoto.

{ "$project" :
  { "_id" : 1 ,
    "tanimoto" : { "$divide" : [ "$fingerprintmatches" , { "$subtract" : [ { "$add" : [ 40 , "$totalcount" ] } , "$fingerprintmatches" ] } ] } ,
    "smiles" : 1
  }
}

As we are only interested in compounds that reach a Tanimoto coefficient of at least 0.8 with respect to the target, we apply an additional $match operator to filter out all compounds that fall below this threshold.

{ "$match" :
  { "tanimoto" : { "$gte" : 0.8 } }
}
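The arithmetic performed by the $project and $match stages can be checked outside the database. The sketch below is a minimal illustration, not part of the original pipeline: it assumes that the constant 40 is the number of fingerprints of the target compound, and the function name tanimoto and the sample counts are made up for the example.

```python
def tanimoto(matches: int, target_count: int, candidate_count: int) -> float:
    """Shared fingerprints divided by the size of the union of both sets."""
    union = target_count + candidate_count - matches
    if union == 0:
        return 0.0  # two empty fingerprint sets: treat as no similarity
    return matches / union


TARGET_COUNT = 40  # assumption: the literal 40 used in the $project stage
THRESHOLD = 0.8    # the cut-off used in the final $match stage

# Hypothetical candidate: 42 fingerprints in total, 38 of them shared with the target.
score = tanimoto(matches=38, target_count=TARGET_COUNT, candidate_count=42)
print(score, score >= THRESHOLD)
```

A candidate with the same 42 fingerprints but only 36 shared ones would score 36 / 46 and be filtered out by the $match stage.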
The full pipeline command can be found below.

{ "aggregate" : "compounds" ,
  "pipeline" : [
    { "$match" :
      { "fingerprint_count" : { "$gte" : 32 , "$lte" : 50 } }
    },
    { "$unwind" : "$fingerprints" },
    { "$match" :
      { "fingerprints" :
        { "$in" : [ 1960 , 15111 , 5186 , 5371 , 756 , 1015 , 1018 , 338 , 325 , 776 , 3900, ... , 2473 ] }
      }
    },
    { "$group" :
      { "_id" : "$compound_cid" ,
        "fingerprintmatches" : { "$sum" : 1 } ,
        "totalcount" : { "$first" : "$fingerprint_count" } ,
        "smiles" : { "$first" : "$smiles" }
      }
    },
    { "$project" :
      { "_id" : 1 ,
        "tanimoto" : { "$divide" : [ "$fingerprintmatches" , { "$subtract" : [ { "$add" : [ 40 , "$totalcount" ] } , "$fingerprintmatches" ] } ] } ,
        "smiles" : 1
      }
    },
    { "$match" :
      { "tanimoto" : { "$gte" : 0.8 } }
    }
  ]
}
The output of this pipeline contains a list of compounds which have a Tanimoto of 0.8 or higher with respect to a particular target compound. A visual representation of this pipeline can be found below:
[Figure: pipeline.jpg]
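To make the individual stages of the full pipeline easier to follow, here is a rough in-memory emulation in plain Python. It is only a sketch for reasoning about the logic and does not talk to MongoDB. The field names (compound_cid, smiles, fingerprint_count, fingerprints) are taken from the pipeline; the function name similar_compounds, the sample documents, their SMILES strings, and the small five-fingerprint target are hypothetical. It also assumes that the target's fingerprint list contains no duplicates, so that its length plays the role of the literal 40 in the $project stage.

```python
def similar_compounds(compounds, target_fingerprints, min_count, max_count, threshold):
    """Emulate the aggregation stages over a list of compound documents."""
    target = set(target_fingerprints)
    results = []
    for doc in compounds:
        # First $match: discard compounds whose fingerprint count is out of range.
        if not (min_count <= doc["fingerprint_count"] <= max_count):
            continue
        # $unwind + $match ($in) + $group ($sum: 1):
        # count the fingerprints this compound shares with the target.
        matches = sum(1 for fp in doc["fingerprints"] if fp in target)
        if matches == 0:
            continue  # nothing survives the $in match, so no group is produced
        # $project: matches / (target count + candidate count - matches).
        score = matches / (len(target) + doc["fingerprint_count"] - matches)
        # Final $match: keep only compounds at or above the threshold.
        if score >= threshold:
            results.append({"_id": doc["compound_cid"],
                            "tanimoto": score,
                            "smiles": doc["smiles"]})
    return results


# Hypothetical sample data; the SMILES strings are placeholders.
compounds = [
    {"compound_cid": 1, "smiles": "CCO", "fingerprint_count": 5, "fingerprints": [1, 2, 3, 4, 5]},
    {"compound_cid": 2, "smiles": "CCN", "fingerprint_count": 5, "fingerprints": [1, 2, 3, 4, 9]},
    {"compound_cid": 3, "smiles": "CCC", "fingerprint_count": 6, "fingerprints": [1, 2, 3, 4, 5, 6]},
    {"compound_cid": 4, "smiles": "CO", "fingerprint_count": 2, "fingerprints": [1, 2]},
]

hits = similar_compounds(compounds, target_fingerprints=[1, 2, 3, 4, 5],
                         min_count=4, max_count=6, threshold=0.8)
for hit in hits:
    print(hit["_id"], hit["smiles"], hit["tanimoto"])
```

In this toy data set, compound 4 is dropped by the count range and compound 2 by the final threshold (4 shared fingerprints out of a union of 6), which mirrors how the first and last $match stages narrow down the candidates in the real pipeline.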