Add Chroma to Vector Database examples (#262)

* Vector store notebook

* HyDE with Chroma

* Cleaner text

* Add swyx edits

* Cleaned up text / outputs

* Spelling nits

* Fixed comment format

---------

Co-authored-by: swyx <shawnthe1@gmail.com>
pull/322/head
Anton Troynikov, committed by GitHub
parent 6df6ceff47
commit 1deea48511
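The added file appears to use JSON Lines, one claim per line. `evidence` maps a document id (a string key) to a list of rationales, each with `sentences` indices and a `SUPPORT` or `CONTRADICT` label. `cited_doc_ids` lists the candidate documents as integers. Below is a minimal sketch of reading such records and collapsing each one into a single verdict. It rests on assumptions this commit does not state: an empty `evidence` object is treated as "not enough info", and all rationales for a claim are expected to share one label. The helper names and the `NOT_ENOUGH_INFO` string are hypothetical.

```python
import json
from typing import Any

# Two records copied from the added file, used as sample input.
SAMPLE = """\
{"id": 1, "claim": "0-dimensional biomaterials show inductive properties.", "evidence": {}, "cited_doc_ids": [31715818]}
{"id": 5, "claim": "1/2000 in UK have abnormal PrP positivity.", "evidence": {"13734012": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [13734012]}
"""


def claim_verdict(record: dict[str, Any]) -> str:
    """Hypothetical helper: collapse a claim's rationales into one label.

    Assumption: an empty `evidence` map means NOT_ENOUGH_INFO.
    """
    labels = {
        rationale["label"]
        for rationales in record["evidence"].values()
        for rationale in rationales
    }
    if not labels:
        return "NOT_ENOUGH_INFO"
    if len(labels) > 1:
        # Not expected under the stated assumption; fail loudly instead of guessing.
        raise ValueError(f"claim {record['id']} has mixed labels: {labels}")
    return labels.pop()


def evidence_sentences(record: dict[str, Any]) -> dict[int, list[int]]:
    """Hypothetical helper: map each evidence doc id (a JSON string key)
    to the sorted, de-duplicated sentence indices cited across its rationales."""
    out: dict[int, list[int]] = {}
    for doc_id, rationales in record["evidence"].items():
        out[int(doc_id)] = sorted({i for r in rationales for i in r["sentences"]})
    return out


# Parse one JSON object per non-empty line.
records = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]
for rec in records:
    print(rec["id"], claim_verdict(rec), evidence_sentences(rec))
```

Converting the string keys of `evidence` to `int` lets them be compared directly with the integer entries in `cited_doc_ids`.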

@@ -0,0 +1,300 @@
{"id": 1, "claim": "0-dimensional biomaterials show inductive properties.", "evidence": {}, "cited_doc_ids": [31715818]}
{"id": 3, "claim": "1,000 genomes project enables mapping of genetic sequence variation consisting of rare variants with larger penetrance effects than common variants.", "evidence": {"14717500": [{"sentences": [2, 5], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [14717500]}
{"id": 5, "claim": "1/2000 in UK have abnormal PrP positivity.", "evidence": {"13734012": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [13734012]}
{"id": 13, "claim": "5% of perinatal mortality is due to low birth weight.", "evidence": {}, "cited_doc_ids": [1606628]}
{"id": 36, "claim": "A deficiency of vitamin B12 increases blood levels of homocysteine.", "evidence": {}, "cited_doc_ids": [5152028, 11705328]}
{"id": 42, "claim": "A high microerythrocyte count raises vulnerability to severe anemia in homozygous alpha (+)- thalassemia trait subjects.", "evidence": {"18174210": [{"sentences": [1, 9], "label": "CONTRADICT"}, {"sentences": [10], "label": "CONTRADICT"}]}, "cited_doc_ids": [18174210]}
{"id": 48, "claim": "A total of 1,000 people in the UK are asymptomatic carriers of vCJD infection.", "evidence": {"13734012": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [13734012]}
{"id": 49, "claim": "ADAR1 binds to Dicer to cleave pre-miRNA.", "evidence": {"5953485": [{"sentences": [1], "label": "SUPPORT"}, {"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [5953485]}
{"id": 50, "claim": "AIRE is expressed in some skin tumors.", "evidence": {"12580014": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [12580014]}
{"id": 51, "claim": "ALDH1 expression is associated with better breast cancer outcomes.", "evidence": {"45638119": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [45638119]}
{"id": 53, "claim": "ALDH1 expression is associated with poorer prognosis in breast cancer.", "evidence": {"45638119": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [45638119]}
{"id": 54, "claim": "AMP-activated protein kinase (AMPK) activation increases inflammation-related fibrosis in the lungs.", "evidence": {"49556906": [{"sentences": [5], "label": "CONTRADICT"}, {"sentences": [6], "label": "CONTRADICT"}, {"sentences": [7], "label": "CONTRADICT"}]}, "cited_doc_ids": [49556906]}
{"id": 56, "claim": "APOE4 expression in iPSC-derived neurons increases AlphaBeta production and tau phosphorylation causing GABA neuron degeneration.", "evidence": {"4709641": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [4709641]}
{"id": 57, "claim": "APOE4 expression in iPSC-derived neurons increases AlphaBeta production and tau phosphorylation, delaying GABA neuron degeneration.", "evidence": {"4709641": [{"sentences": [1], "label": "CONTRADICT"}]}, "cited_doc_ids": [4709641]}
{"id": 70, "claim": "Activation of PPM1D suppresses p53 function.", "evidence": {"5956380": [{"sentences": [5, 6], "label": "SUPPORT"}], "4414547": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [5956380, 4414547]}
{"id": 72, "claim": "Activator-inhibitor pairs are provided dorsally by Admpchordin.", "evidence": {}, "cited_doc_ids": [6076903]}
{"id": 75, "claim": "Active H. pylori urease has a polymeric structure that compromises two subunits, UreA and UreB.", "evidence": {}, "cited_doc_ids": [4387784]}
{"id": 94, "claim": "Albendazole is used to treat lymphatic filariasis.", "evidence": {}, "cited_doc_ids": [1215116]}
{"id": 99, "claim": "Alizarin forms hydrogen bonds with residues involved in PGAM1 substrate binding.", "evidence": {}, "cited_doc_ids": [18810195]}
{"id": 100, "claim": "All hematopoietic stem cells segregate their chromosomes randomly.", "evidence": {"4381486": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [4381486]}
{"id": 113, "claim": "Angiotensin converting enzyme inhibitors are associated with increased risk for functional renal insufficiency.", "evidence": {"6157837": [{"sentences": [2], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [6157837]}
{"id": 115, "claim": "Anthrax spores can be disposed of easily after they are dispersed.", "evidence": {"33872649": [{"sentences": [6], "label": "CONTRADICT"}]}, "cited_doc_ids": [33872649]}
{"id": 118, "claim": "Antibiotic induced alterations in the gut microbiome reduce resistance against Clostridium difficile", "evidence": {"6372244": [{"sentences": [0], "label": "SUPPORT"}, {"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [6372244]}
{"id": 124, "claim": "Antiretroviral therapy reduces rates of tuberculosis across a broad range of CD4 strata.", "evidence": {"4883040": [{"sentences": [8], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}]}, "cited_doc_ids": [4883040]}
{"id": 127, "claim": "Arginine 90 in p150n is important for interaction with EB1.", "evidence": {}, "cited_doc_ids": [21598000]}
{"id": 128, "claim": "Arterioles have a larger lumen diameter than venules.", "evidence": {}, "cited_doc_ids": [8290953]}
{"id": 129, "claim": "Articles published in open access format are less likely to be cited than traditional journals.", "evidence": {"27768226": [{"sentences": [2], "label": "CONTRADICT"}, {"sentences": [10], "label": "CONTRADICT"}, {"sentences": [11], "label": "CONTRADICT"}, {"sentences": [40], "label": "CONTRADICT"}]}, "cited_doc_ids": [27768226]}
{"id": 130, "claim": "Articles published in open access format are more likely to be cited than traditional journals.", "evidence": {"27768226": [{"sentences": [2], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}, {"sentences": [40], "label": "SUPPORT"}]}, "cited_doc_ids": [27768226]}
{"id": 132, "claim": "Aspirin inhibits the production of PGE2.", "evidence": {}, "cited_doc_ids": [7975937]}
{"id": 133, "claim": "Assembly of invadopodia is triggered by focal generation of phosphatidylinositol-3,4-biphosphate and the activation of the nonreceptor tyrosine kinase Src.", "evidence": {"16280642": [{"sentences": [3, 4], "label": "SUPPORT"}]}, "cited_doc_ids": [38485364, 6969753, 17934082, 16280642, 12640810]}
{"id": 137, "claim": "Asymptomatic visual impairment screening in elderly populations does not lead to improved vision.", "evidence": {"26016929": [{"sentences": [7, 9], "label": "SUPPORT"}]}, "cited_doc_ids": [26016929]}
{"id": 141, "claim": "Auditory entrainment is strengthened when people see congruent visual and auditory information.", "evidence": {"14437255": [{"sentences": [0], "label": "SUPPORT"}, {"sentences": [1], "label": "SUPPORT"}, {"sentences": [6], "label": "SUPPORT"}, {"sentences": [8], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [6955746, 14437255]}
{"id": 142, "claim": "Autologous transplantation of mesenchymal stem cells causes a higher rate of opportunistic infections than induction therapy with anti-interleukin-2 receptor antibodies.", "evidence": {"10582939": [{"sentences": [12], "label": "CONTRADICT"}]}, "cited_doc_ids": [10582939]}
{"id": 143, "claim": "Autologous transplantation of mesenchymal stem cells causes fewer opportunistic infections than induction therapy with anti-interleukin-2 receptor antibodies.", "evidence": {"10582939": [{"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [10582939]}
{"id": 146, "claim": "Autologous transplantation of mesenchymal stem cells has lower rates of rejection than induction therapy with anti-interleukin-2 receptor antibodies.", "evidence": {"10582939": [{"sentences": [8], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [10582939]}
{"id": 148, "claim": "Autophagy declines in aged organisms.", "evidence": {"1084345": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [1084345]}
{"id": 163, "claim": "Bariatric surgery has a positive impact on mental health.", "evidence": {"18872233": [{"sentences": [9], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [18872233]}
{"id": 171, "claim": "Basophils counteract disease development in patients with systemic lupus erythematosus (SLE).", "evidence": {"12670680": [{"sentences": [1], "label": "CONTRADICT"}, {"sentences": [2], "label": "CONTRADICT"}, {"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [12670680]}
{"id": 179, "claim": "Birth-weight is positively associated with breast cancer.", "evidence": {"16322674": [{"sentences": [5], "label": "SUPPORT"}, {"sentences": [6], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}], "27123743": [{"sentences": [3], "label": "SUPPORT"}, {"sentences": [4], "label": "SUPPORT"}], "23557241": [{"sentences": [6], "label": "SUPPORT"}], "17450673": [{"sentences": [5], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}]}, "cited_doc_ids": [16322674, 27123743, 23557241, 17450673]}
{"id": 180, "claim": "Blocking the interaction between TDP-43 and respiratory complex I proteins ND3 and ND6 leads to increased TDP-43-induced neuronal loss.", "evidence": {"16966326": [{"sentences": [5], "label": "CONTRADICT"}]}, "cited_doc_ids": [16966326]}
{"id": 183, "claim": "Bone marrow cells contribute to adult macrophage compartments.", "evidence": {"12827098": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [12827098]}
{"id": 185, "claim": "Breast cancer development is determined exclusively by genetic factors.", "evidence": {"18340282": [{"sentences": [2], "label": "CONTRADICT"}, {"sentences": [6], "label": "CONTRADICT"}]}, "cited_doc_ids": [18340282]}
{"id": 198, "claim": "CCL19 is absent within dLNs.", "evidence": {}, "cited_doc_ids": [2177022]}
{"id": 208, "claim": "CHEK2 is not associated with breast cancer.", "evidence": {"13519661": [{"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [13519661]}
{"id": 212, "claim": "CR is associated with higher methylation age.", "evidence": {"22038539": [{"sentences": [1], "label": "CONTRADICT"}, {"sentences": [3], "label": "CONTRADICT"}, {"sentences": [4], "label": "CONTRADICT"}, {"sentences": [7], "label": "CONTRADICT"}, {"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [22038539]}
{"id": 213, "claim": "CRP is not predictive of postoperative mortality following Coronary Artery Bypass Graft (CABG) surgery.", "evidence": {}, "cited_doc_ids": [13625993]}
{"id": 216, "claim": "CX3CR1 on the Th2 cells impairs T cell survival", "evidence": {"21366394": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [21366394]}
{"id": 217, "claim": "CX3CR1 on the Th2 cells promotes T cell survival", "evidence": {"21366394": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [21366394]}
{"id": 218, "claim": "CX3CR1 on the Th2 cells promotes airway inflammation.", "evidence": {"21366394": [{"sentences": [2], "label": "SUPPORT"}, {"sentences": [3], "label": "SUPPORT"}, {"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [21366394]}
{"id": 219, "claim": "CX3CR1 on the Th2 cells suppresses airway inflammation.", "evidence": {"21366394": [{"sentences": [2], "label": "CONTRADICT"}, {"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [21366394]}
{"id": 230, "claim": "Carriers of the alcohol aldehyde dehydrogenase deficiency mutation drink less that non-carries.", "evidence": {"3067015": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [3067015]}
{"id": 232, "claim": "Cataract and trachoma are the primary cause of blindness in Southern Sudan.", "evidence": {}, "cited_doc_ids": [10536636]}
{"id": 233, "claim": "Cell autonomous sex determination in somatic cells does not occur in Galliformes.", "evidence": {"4388470": [{"sentences": [3], "label": "CONTRADICT"}, {"sentences": [6], "label": "CONTRADICT"}, {"sentences": [7], "label": "CONTRADICT"}]}, "cited_doc_ids": [4388470]}
{"id": 236, "claim": "Cell autonomous sex determination in somatic cells occurs in Passeriformes.", "evidence": {"4388470": [{"sentences": [3], "label": "SUPPORT"}, {"sentences": [6], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [4388470]}
{"id": 237, "claim": "Cells lacking clpC have a defect in sporulation efficiency in Bacillus subtilis.", "evidence": {}, "cited_doc_ids": [4942718]}
{"id": 238, "claim": "Cells undergoing methionine restriction may activate miRNAs.", "evidence": {}, "cited_doc_ids": [2251426]}
{"id": 239, "claim": "Cellular aging closely links to an older appearance.", "evidence": {"14079881": [{"sentences": [10], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}]}, "cited_doc_ids": [14079881]}
{"id": 248, "claim": "Chenodeosycholic acid treatment increases whole-body energy expenditure.", "evidence": {"1568684": [{"sentences": [1, 3], "label": "SUPPORT"}, {"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [1568684]}
{"id": 249, "claim": "Chenodeosycholic acid treatment reduces whole-body energy expenditure.", "evidence": {"1568684": [{"sentences": [1, 3], "label": "CONTRADICT"}, {"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [1568684]}
{"id": 261, "claim": "Chronic aerobic exercise alters endothelial function, improving vasodilating mechanisms mediated by NO.", "evidence": {"1122279": [{"sentences": [5, 10], "label": "SUPPORT"}], "10697096": [{"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [1122279, 10697096]}
{"id": 268, "claim": "Cold exposure increases BAT recruitment.", "evidence": {}, "cited_doc_ids": [970012]}
{"id": 269, "claim": "Cold exposure reduces BAT recruitment.", "evidence": {}, "cited_doc_ids": [970012]}
{"id": 274, "claim": "Combination nicotine replacement therapies with varenicline or bupropion lead to significantly higher long-term abstinence rates at 52 weeks than varenicline monotherapy.", "evidence": {"11614737": [{"sentences": [10], "label": "CONTRADICT"}, {"sentences": [13], "label": "CONTRADICT"}]}, "cited_doc_ids": [11614737]}
{"id": 275, "claim": "Combining phosphatidylinositide 3-kinase and MEK 1/2 inhibitors is effective at treating KRAS mutant tumors.", "evidence": {"4961038": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [8], "label": "SUPPORT"}], "14241418": [{"sentences": [10], "label": "SUPPORT"}]}, "cited_doc_ids": [4961038, 14241418, 14819804]}
{"id": 279, "claim": "Commelina yellow mottle virus' (ComYMV) genome consists of 7489 baise pairs.", "evidence": {"14376683": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [14376683]}
{"id": 294, "claim": "Crossover hot spots are not found within gene promoters in Saccharomyces cerevisiae.", "evidence": {}, "cited_doc_ids": [10874408]}
{"id": 295, "claim": "Crosstalk between dendritic cells (DCs) and innate lymphoid cells (ILCs) is important in the regulation of intestinal homeostasis.", "evidence": {"20310709": [{"sentences": [2, 4], "label": "SUPPORT"}, {"sentences": [5, 6], "label": "SUPPORT"}]}, "cited_doc_ids": [20310709]}
{"id": 298, "claim": "Cytochrome c is released from the mitochondrial intermembrane space to cytosol during apoptosis.", "evidence": {"39381118": [{"sentences": [0], "label": "SUPPORT"}]}, "cited_doc_ids": [39381118]}
{"id": 300, "claim": "Cytosolic proteins bind to iron-responsive elements on mRNAs coding for DMT1. Cytosolic proteins bind to iron-responsive elements on mRNAs coding for proteins involved in iron uptake.", "evidence": {}, "cited_doc_ids": [3553087]}
{"id": 303, "claim": "DMRT1 is a sex-determining gene that is epigenetically regulated by the MHM region.", "evidence": {}, "cited_doc_ids": [4388470]}
{"id": 312, "claim": "De novo assembly of sequence data has more specific contigs than unassembled sequence data.", "evidence": {}, "cited_doc_ids": [6173523]}
{"id": 314, "claim": "Deamination of cytidine to uridine on the minus strand of viral DNA results in catastrophic G-to-A mutations in the viral genome.", "evidence": {"4347374": [{"sentences": [3, 5], "label": "SUPPORT"}]}, "cited_doc_ids": [4347374]}
{"id": 324, "claim": "Deleting Raptor reduces G-CSF levels.", "evidence": {}, "cited_doc_ids": [2014909]}
{"id": 327, "claim": "Deletion of \u03b1v\u03b28 does not result in a spontaneous inflammatory phenotype.", "evidence": {"17997584": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [17997584]}
{"id": 338, "claim": "Dexamethasone decreases risk of postoperative bleeding.", "evidence": {"23349986": [{"sentences": [10, 11], "label": "CONTRADICT"}, {"sentences": [12], "label": "CONTRADICT"}, {"sentences": [15], "label": "CONTRADICT"}]}, "cited_doc_ids": [23349986]}
{"id": 343, "claim": "Diabetic patients with acute coronary syndrome experience increased short-term and long-term risk for bleeding events.", "evidence": {"5884524": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [7873737, 5884524]}
{"id": 350, "claim": "Discrimination between the initiator and elongation tRNAs depends on the translation initiation factor IF3.", "evidence": {}, "cited_doc_ids": [16927286]}
{"id": 354, "claim": "Downregulation and mislocalization of Scribble prevents cell transformation and mammary tumorigenesis.", "evidence": {}, "cited_doc_ids": [8774475]}
{"id": 362, "claim": "During the primary early antibody response activated B cells migrate toward the inner-and outer paracortical areas where oxysterol accumulation is generated by stromal cells.", "evidence": {}, "cited_doc_ids": [38587347]}
{"id": 380, "claim": "Enhanced early production of inflammatory chemokines improves viral control in the lung.", "evidence": {"19005293": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [19005293]}
{"id": 384, "claim": "Epidemiological disease burden from noncommunicable diseases is more prevalent in low economic settings.", "evidence": {}, "cited_doc_ids": [13770184]}
{"id": 385, "claim": "Epigenetic modulating agents (EMAs) modulate antitumor immune response in a cancer model system.", "evidence": {"9955779": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [9955779, 9767444]}
{"id": 386, "claim": "Errors in peripheral IV drug administration are most common during bolus administration and multiple-step medicine preparations.", "evidence": {"16495649": [{"sentences": [8], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}]}, "cited_doc_ids": [16495649]}
{"id": 388, "claim": "Ethanol stress decreases the expression of IBP in bacteria.", "evidence": {}, "cited_doc_ids": [1148122]}
{"id": 399, "claim": "Exposure to fine particulate air pollution is relate to anxiety prevalence.", "evidence": {"791050": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}]}, "cited_doc_ids": [791050]}
{"id": 410, "claim": "Febrile seizures increase the threshold for development of epilepsy.", "evidence": {}, "cited_doc_ids": [14924526]}
{"id": 411, "claim": "Febrile seizures reduce the threshold for development of epilepsy.", "evidence": {}, "cited_doc_ids": [14924526]}
{"id": 415, "claim": "Female carriers of the Apolipoprotein E4 (APOE4) allele have increased risk for dementia.", "evidence": {"6309659": [{"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [6309659]}
{"id": 421, "claim": "Flexible molecules experience greater steric hindrance in the tumor microenviroment than rigid molecules.", "evidence": {}, "cited_doc_ids": [11172205]}
{"id": 431, "claim": "FoxO3a activation in neuronal death is mediated by reactive oxygen species (ROS).", "evidence": {"28937856": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [28937856]}
{"id": 436, "claim": "Free histones are degraded by a Rad53-dependent mechanism once DNA has been replicated.", "evidence": {"14637235": [{"sentences": [1], "label": "SUPPORT"}, {"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [14637235]}
{"id": 437, "claim": "Functional consequences of genomic alterations due to Myelodysplastic syndrome (MDS) are poorly understood due to the lack of an animal model.", "evidence": {}, "cited_doc_ids": [18399038]}
{"id": 439, "claim": "Fz/PCP-dependent Pk localizes to the anterior membrane of neuroectoderm cells during zebrafish neuralation", "evidence": {}, "cited_doc_ids": [4423559]}
{"id": 440, "claim": "Fz/PCP-dependent Pk localizes to the anterior membrane of notochord cells during zebrafish neuralation.", "evidence": {}, "cited_doc_ids": [4423559]}
{"id": 443, "claim": "GATA-3 is important for hematopoietic stem cell (HSC) function.", "evidence": {"10165258": [{"sentences": [4, 5, 6], "label": "SUPPORT"}]}, "cited_doc_ids": [10165258]}
{"id": 452, "claim": "Gene expression does not vary appreciably across genetically identical cells.", "evidence": {"12804937": [{"sentences": [0], "label": "CONTRADICT"}], "464511": [{"sentences": [0], "label": "CONTRADICT"}]}, "cited_doc_ids": [12804937, 464511]}
{"id": 475, "claim": "Glycolysis is one of the primary glycometabolic pathways in cells.", "evidence": {"18678095": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [18678095]}
{"id": 478, "claim": "Golli-deficient T-cells prefer to differentiate into an anergic phenotype in the adaptive immune response when there are increased levels of Ca2+ in the cytosol.", "evidence": {}, "cited_doc_ids": [14767844]}
{"id": 491, "claim": "HNF4A mutations can cause diabetes in mutant carriers by the age of 14 years", "evidence": {}, "cited_doc_ids": [56893404]}
{"id": 501, "claim": "Headaches are not correlated with cognitive impairment.", "evidence": {"17930286": [{"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [17930286]}
{"id": 502, "claim": "Healthcare delivery efficiency in crowded delivery centers is impaired by improving structural, logistical, and interpersonal elements.", "evidence": {}, "cited_doc_ids": [13071728]}
{"id": 507, "claim": "Helminths interfere with immune system control of macrophages activated by IL-4 favor Mycobacterium tuberculosis replication.", "evidence": {}, "cited_doc_ids": [30774694]}
{"id": 508, "claim": "Hematopoietic Stem Cell purification reaches purity rate of up to 50%.", "evidence": {}, "cited_doc_ids": [13980338]}
{"id": 513, "claim": "High cardiopulmonary fitness causes increased mortality rate.", "evidence": {"13230773": [{"sentences": [11], "label": "CONTRADICT"}]}, "cited_doc_ids": [13230773]}
{"id": 514, "claim": "High dietary calcium intakes are unnecessary for prevention of secondary hyperparathyroidism in subjects with 25(OH)D levels above 75 nmol/liter.", "evidence": {}, "cited_doc_ids": [16256507]}
{"id": 516, "claim": "High levels of CRP reduces the risk of exacerbations in chronic obstructive pulmonary disease (COPD).", "evidence": {"29564505": [{"sentences": [8], "label": "CONTRADICT"}, {"sentences": [13], "label": "CONTRADICT"}]}, "cited_doc_ids": [29564505]}
{"id": 517, "claim": "High levels of copeptin decrease risk of diabetes.", "evidence": {}, "cited_doc_ids": [15663829]}
{"id": 521, "claim": "High-sensitivity cardiac troponin T (HSCT-T) dosage may not be diagnostic if the onset of symptoms occurs less than 3 hours before acute myocardial injury (AMI).", "evidence": {"34873974": [{"sentences": [14], "label": "SUPPORT"}]}, "cited_doc_ids": [34873974]}
{"id": 525, "claim": "Histone demethylase recruitment and a transient decrease in histone methylation is necessary for ligand-dependent induction of transcription by nuclear receptors.", "evidence": {"13639330": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [13639330]}
{"id": 527, "claim": "Homozygous deletion of murine Sbds gene from osterix-expressing mesenchymal stem and progenitor cells (MPCs) prevents oxidative stress.", "evidence": {}, "cited_doc_ids": [3863543]}
{"id": 528, "claim": "Human T-lymphotropic virus type-I-associated myelopathy / tropical spastic paraparesis (HAM/TSP) patients produce Immunoglobulin G (IgG) antibodies which cross-react with an immunodominant epitope in Tax.", "evidence": {"5476778": [{"sentences": [10], "label": "SUPPORT"}]}, "cited_doc_ids": [5476778]}
{"id": 532, "claim": "Hyperfibrinogenemia decreases rates of femoropopliteal bypass thrombosis.", "evidence": {"12991445": [{"sentences": [5], "label": "CONTRADICT"}, {"sentences": [9], "label": "CONTRADICT"}, {"sentences": [11], "label": "CONTRADICT"}, {"sentences": [12], "label": "CONTRADICT"}]}, "cited_doc_ids": [12991445]}
{"id": 533, "claim": "Hyperfibrinogenemia increases rates of femoropopliteal bypass thrombosis.", "evidence": {"12991445": [{"sentences": [5], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [12991445]}
{"id": 535, "claim": "Hypertension is frequently observed in type 1 diabetes patients.", "evidence": {}, "cited_doc_ids": [39368721]}
{"id": 536, "claim": "Hypocretin neurones induce panicprone state in rats.", "evidence": {"16056514": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [16056514]}
{"id": 539, "claim": "Hypoglycemia increases the risk of dementia.", "evidence": {"13282296": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}]}, "cited_doc_ids": [13282296]}
{"id": 540, "claim": "Hypothalamic glutamate neurotransmission is crucial to energy balance.", "evidence": {"11886686": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [11886686, 25007443]}
{"id": 544, "claim": "IFIT1 restricts viral replication by sequestrating mis-capped viral RNAs.", "evidence": {}, "cited_doc_ids": [24221369]}
{"id": 549, "claim": "IRG1 has antiviral effects against neurotropic viruses.", "evidence": {"9433958": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [9433958]}
{"id": 551, "claim": "ITAM phosphorylation prevents the transfer of the T cell receptor (TCR) signal from the echo-domain to the cytoplasmic tail of the T cell receptor (TCR).", "evidence": {}, "cited_doc_ids": [33499189]}
{"id": 552, "claim": "IgA plasma cells that are specific for transglutaminase 2 accumulate in the duodenal mucosa on commencement of a gluten-free diet.", "evidence": {}, "cited_doc_ids": [1471041]}
{"id": 554, "claim": "Immune complex triggered cell death leads to extracellular release of neutrophil protein HMGB1.", "evidence": {}, "cited_doc_ids": [1049501]}
{"id": 560, "claim": "Immune responses result in the development of inflammatory Th17 cells and anti-inflammatory iTregs.", "evidence": {}, "cited_doc_ids": [40096222]}
{"id": 569, "claim": "In adult tissue, most T cells are memory T cells.", "evidence": {"23460562": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [23460562]}
{"id": 575, "claim": "In domesticated populations of Saccharomyces cerevisiae, whole chromosome aneuploidy is very uncommon.", "evidence": {}, "cited_doc_ids": [10300888]}
{"id": 577, "claim": "In mice, P. chabaudi parasites are able to proliferate faster early in infection when inoculated at lower numbers than when inoculated at high numbers.", "evidence": {}, "cited_doc_ids": [5289038]}
{"id": 578, "claim": "In mouse models, the loss of CSF1R facilitates MOZ-TIF2-induced leuekmogenesis.", "evidence": {"8764879": [{"sentences": [5], "label": "CONTRADICT"}, {"sentences": [7], "label": "CONTRADICT"}, {"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [8764879]}
{"id": 587, "claim": "In transgenic mice harboring green florescent protein under the control of the Sox2 promoter, less than ten percent of the cells with green florescent colocalize with cell proliferation markers.", "evidence": {}, "cited_doc_ids": [16999023]}
{"id": 589, "claim": "In young and middle-aged adults, current or remote uses of ADHD medications do not increase the risk of serious cardiovascular events.", "evidence": {"10984005": [{"sentences": [9, 10], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [10984005]}
{"id": 593, "claim": "Incidence of heart failure decreased by 10% in women since 1979.", "evidence": {"19675911": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [19675911]}
{"id": 597, "claim": "Incidence rates of cervical cancer have decreased.", "evidence": {"12779444": [{"sentences": [0], "label": "SUPPORT"}], "36355784": [{"sentences": [6, 7], "label": "SUPPORT"}], "25742130": [{"sentences": [6], "label": "SUPPORT"}]}, "cited_doc_ids": [12779444, 36355784, 25742130]}
{"id": 598, "claim": "Incidence rates of cervical cancer have increased due to nationwide screening programs based primarily on cytology to detect uterine cervical cancer.", "evidence": {"25742130": [{"sentences": [6], "label": "CONTRADICT"}, {"sentences": [9], "label": "CONTRADICT"}]}, "cited_doc_ids": [25742130]}
{"id": 613, "claim": "Increased microtubule acetylation repairs LRRK2 Roc-COR domain mutation induced locomotor deficits.", "evidence": {"9638032": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [9638032]}
{"id": 619, "claim": "Increased vessel density along with a reduction in fibrosis decreases the efficacy of chemotherapy treatments.", "evidence": {"20888849": [{"sentences": [2, 3], "label": "CONTRADICT"}]}, "cited_doc_ids": [20888849, 2565138]}
{"id": 623, "claim": "Individuals with low serum vitamin D concentrations have increased risk of multiple sclerosis.", "evidence": {}, "cited_doc_ids": [17000834]}
{"id": 628, "claim": "Infection of human T-cell lymphotropic virus type 1 is most frequent in individuals of African origin.", "evidence": {}, "cited_doc_ids": [24512064]}
{"id": 636, "claim": "Inositol lipid 3-phosphatase PTEN converts Ptdlns(3,4)P 2 into phosphatidylinositol 4-phosphate.", "evidence": {"24294572": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [24294572]}
{"id": 637, "claim": "Input from mental and physical health care professionals is effective at decreasing homelessness.", "evidence": {}, "cited_doc_ids": [25649714]}
{"id": 641, "claim": "Insomnia can be effectively treated with cognitive behavioral therapy.", "evidence": {"5912283": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [5912283, 31554917]}
{"id": 644, "claim": "Insulin increases risk of severe kidney failure.", "evidence": {}, "cited_doc_ids": [13619127]}
{"id": 649, "claim": "Integrating classroom-based collaborative learning with Web-based collaborative learning leads to subpar class performance", "evidence": {}, "cited_doc_ids": [12789595]}
{"id": 659, "claim": "Ivermectin is used to treat lymphatic filariasis.", "evidence": {}, "cited_doc_ids": [1215116]}
{"id": 660, "claim": "Ivermectin is used to treat onchocerciasis.", "evidence": {}, "cited_doc_ids": [1215116]}
{"id": 674, "claim": "LDL cholesterol has no involvement in the development of cardiovascular disease.", "evidence": {"2095573": [{"sentences": [0], "label": "CONTRADICT"}]}, "cited_doc_ids": [2095573]}
{"id": 684, "claim": "Lack of clpC does not affect sporulation efficiency in Bacillus subtilis cells.", "evidence": {}, "cited_doc_ids": [4942718]}
{"id": 690, "claim": "Less than 10% of the gabonese children with Schimmelpenning-Feuerstein-Mims syndrome (SFM) had a plasma lactate of more than 5mmol/L.", "evidence": {}, "cited_doc_ids": [18750453]}
{"id": 691, "claim": "Leukemia associated Rho guanine nucleotide-exchange factor represses RhoA in response to SRC activation.", "evidence": {}, "cited_doc_ids": [10991183]}
{"id": 692, "claim": "Leuko-increased blood increases infectious complications in red blood cell transfusion.", "evidence": {"24088502": [{"sentences": [8], "label": "CONTRADICT"}, {"sentences": [9], "label": "CONTRADICT"}, {"sentences": [10], "label": "CONTRADICT"}]}, "cited_doc_ids": [24088502]}
{"id": 693, "claim": "Leuko-reduced blood reduces infectious complications in red blood cell transfusion.", "evidence": {"24088502": [{"sentences": [8], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}]}, "cited_doc_ids": [24088502]}
{"id": 700, "claim": "Localization of PIN1 in the Arabidopsis embryo does not require VPS9a", "evidence": {}, "cited_doc_ids": [4350400]}
{"id": 702, "claim": "Localization of PIN1 in the roots of Arabidopsis does not require VPS9a", "evidence": {}, "cited_doc_ids": [4350400]}
{"id": 715, "claim": "Low expression of miR7a does represses target genes and exerts a biological function in ovaries.", "evidence": {}, "cited_doc_ids": [18421962]}
{"id": 716, "claim": "Low expression of miR7a exerts a biological function in testis.", "evidence": {}, "cited_doc_ids": [18421962]}
{"id": 718, "claim": "Low nucleosome occupancy correlates with low methylation levels across species.", "evidence": {"17587795": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [17587795]}
{"id": 721, "claim": "Lupus-prone mice infected with curliproducing bacteria have higher autoantibody titers compared to controls.", "evidence": {"1834762": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [1834762]}
{"id": 723, "claim": "Ly49Q directs the organization of neutrophil migration to inflammation sites by regulating membrane raft functions.", "evidence": {"5531479": [{"sentences": [1], "label": "SUPPORT"}, {"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [5531479]}
{"id": 727, "claim": "Ly6C hi monocytes have a lower inflammatory capacity compared to their Ly6C lo counterparts.", "evidence": {}, "cited_doc_ids": [7521113]}
{"id": 728, "claim": "Ly6C hi monocytes have a lower inflammatory capacity than Ly6C lo monocytes.", "evidence": {"36444198": [{"sentences": [9], "label": "CONTRADICT"}, {"sentences": [10], "label": "CONTRADICT"}]}, "cited_doc_ids": [7521113, 36444198]}
{"id": 729, "claim": "Lymphadenopathy is observed in knockin mouse lacking the SHP-2 MAPK pathway.", "evidence": {"26851674": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [26851674]}
{"id": 742, "claim": "Macrolides have no protective effect against myocardial infarction.", "evidence": {"32159283": [{"sentences": [8], "label": "SUPPORT"}]}, "cited_doc_ids": [32159283]}
{"id": 743, "claim": "Macrolides protect against myocardial infarction.", "evidence": {"32159283": [{"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [32159283]}
{"id": 744, "claim": "Macropinocytosis contributes to a cell's supply of amino acids via the intracellular uptake of protein.", "evidence": {"8460275": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [8460275]}
{"id": 756, "claim": "Many proteins in human cells can be post-translationally modified at lysine residues via acetylation.", "evidence": {"2831620": [{"sentences": [0], "label": "SUPPORT"}]}, "cited_doc_ids": [2831620]}
{"id": 759, "claim": "Mathematical models predict that using Artemisinin-based combination therapy over nongametocytocidal drugs have a dramatic impact in reducing malaria transmission.", "evidence": {"1805641": [{"sentences": [13], "label": "CONTRADICT"}]}, "cited_doc_ids": [1805641]}
{"id": 768, "claim": "Mercaptopurine is anabolized into the inactive methylmercaptopurine by thiopurine methyltrasnferase (TPMT).", "evidence": {}, "cited_doc_ids": [6421792]}
{"id": 770, "claim": "Metastatic colorectal cancer treated with a single agent fluoropyrimidines resulted in reduced efficacy and lower quality of life when compared with oxaliplatin-based chemotherapy in elderly patients.", "evidence": {"15476777": [{"sentences": [16], "label": "SUPPORT"}]}, "cited_doc_ids": [15476777]}
{"id": 775, "claim": "Mice defective for deoxyribonucleic acid (DNA) polymerase I (polI) reveal increased sensitivity to ionizing radiation (IR).", "evidence": {}, "cited_doc_ids": [32275758]}
{"id": 781, "claim": "Mice that lack Interferon-\u03b3 or its receptor exhibit high resistance to experimental autoimmune myocarditis.", "evidence": {"24338780": [{"sentences": [2], "label": "CONTRADICT"}, {"sentences": [3], "label": "CONTRADICT"}, {"sentences": [6], "label": "CONTRADICT"}]}, "cited_doc_ids": [24338780]}
{"id": 783, "claim": "Mice without IFN-\u03b3 or its receptor are resistant to EAM induced with \u03b1-MyHC/CFA.", "evidence": {}, "cited_doc_ids": [40632104]}
{"id": 784, "claim": "MicroRNA is involved in the regulation of Neural Stem Cell (NSC) differentiation and proliferation dynamic homeostasis", "evidence": {"2356950": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [2356950]}
{"id": 785, "claim": "Microarray results from culture-amplified mixtures of serotypes correlate poorly with microarray results from uncultured mixtures.", "evidence": {}, "cited_doc_ids": [12471115]}
{"id": 793, "claim": "Mitochondria are uninvolved in apoptosis.", "evidence": {"8551160": [{"sentences": [1], "label": "CONTRADICT"}]}, "cited_doc_ids": [8551160]}
{"id": 800, "claim": "Modifying the epigenome in the brain affects the normal human aging process by affecting certain genes related to neurogenesis.", "evidence": {}, "cited_doc_ids": [22543403]}
{"id": 805, "claim": "Monoclonal antibody targeting of N-cadherin inhibits metastasis.", "evidence": {"22180793": [{"sentences": [4], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [22180793]}
{"id": 808, "claim": "Most termination events in Okazaki fragments are sequence specific.", "evidence": {"36606083": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [36606083]}
{"id": 811, "claim": "Mutant mice lacking SVCT2 have greatly increased ascorbic acid levels in both brain and adrenals.", "evidence": {"19799455": [{"sentences": [4], "label": "CONTRADICT"}, {"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [19799455]}
{"id": 814, "claim": "Mutations in G-Beta protein GNB2 are present in many cancers, resulting in loss of interaction with G-alpha subunits and concomitant activation of AKT pathway.", "evidence": {}, "cited_doc_ids": [33387953]}
{"id": 820, "claim": "N-terminal cleavage increases success identifying transcription start sites.", "evidence": {}, "cited_doc_ids": [8646760]}
{"id": 821, "claim": "N-terminal cleavage reduces success identifying transcription start sites.", "evidence": {}, "cited_doc_ids": [8646760]}
{"id": 823, "claim": "N348I mutations cause resistance to zidovudine (AZT).", "evidence": {"15319019": [{"sentences": [14], "label": "SUPPORT"}]}, "cited_doc_ids": [15319019]}
{"id": 830, "claim": "NF2 (Merlin) causes phosphorylation and subsequent cytoplasmic sequestration of YAP in Drosophila by activating LATS1/2 kinases.", "evidence": {}, "cited_doc_ids": [1897324]}
{"id": 831, "claim": "NF2 (Merlin) prevents phosphorylation and subsequent cytoplasmic sequestration of YAP in Drosophila.", "evidence": {}, "cited_doc_ids": [1897324]}
{"id": 832, "claim": "NFAT4 activation requires IP3R-mediated Ca2+ mobilization.", "evidence": {"30303335": [{"sentences": [2], "label": "SUPPORT"}, {"sentences": [3], "label": "SUPPORT"}, {"sentences": [6, 7], "label": "SUPPORT"}]}, "cited_doc_ids": [30303335]}
{"id": 834, "claim": "NOX2-independent pathways can generate peroxynitrite by reacting with nitrogen intermediates.", "evidence": {}, "cited_doc_ids": [5483793]}
{"id": 837, "claim": "NR5A2 is important in development of endometrial tissues.", "evidence": {"15928989": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [8], "label": "SUPPORT"}]}, "cited_doc_ids": [15928989]}
{"id": 839, "claim": "Nanoparticles can be targeted against specific cell types by incorporating aptamers into lipid nanoparticles.", "evidence": {"1469751": [{"sentences": [2], "label": "SUPPORT"}, {"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [1469751]}
{"id": 845, "claim": "Neutrophil extracellular traps (NETs) are released by ANCA-stimulated neutrophils.", "evidence": {"17741440": [{"sentences": [1], "label": "SUPPORT"}]}, "cited_doc_ids": [17741440]}
{"id": 847, "claim": "New drugs for tuberculosis often do not penetrate the necrotic portion of a tuberculosis lesion in high concentrations.", "evidence": {"16787954": [{"sentences": [2], "label": "CONTRADICT"}]}, "cited_doc_ids": [16787954]}
{"id": 852, "claim": "Non-invasive ventilation use should be decreased if there is inadequate response to conventional treatment.", "evidence": {"13843341": [{"sentences": [9], "label": "CONTRADICT"}, {"sentences": [10], "label": "CONTRADICT"}]}, "cited_doc_ids": [13843341]}
{"id": 859, "claim": "Normal expression of RUNX1 has tumor-promoting effects.", "evidence": {"1982286": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [1982286]}
{"id": 870, "claim": "Obesity decreases life quality.", "evidence": {}, "cited_doc_ids": [195689316]}
{"id": 873, "claim": "Obesity is determined solely by environmental factors.", "evidence": {"1180972": [{"sentences": [3, 4], "label": "CONTRADICT"}, {"sentences": [6], "label": "CONTRADICT"}, {"sentences": [7], "label": "CONTRADICT"}], "19307912": [{"sentences": [3], "label": "CONTRADICT"}, {"sentences": [7], "label": "CONTRADICT"}], "27393799": [{"sentences": [8], "label": "CONTRADICT"}], "29025270": [{"sentences": [1], "label": "CONTRADICT"}, {"sentences": [3], "label": "CONTRADICT"}, {"sentences": [5], "label": "CONTRADICT"}], "3315558": [{"sentences": [4], "label": "CONTRADICT"}, {"sentences": [5], "label": "CONTRADICT"}]}, "cited_doc_ids": [1180972, 19307912, 27393799, 29025270, 3315558]}
{"id": 879, "claim": "Occupancy of ribosomes by IncRNAs do not make functional peptides.", "evidence": {"8426046": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [8426046]}
{"id": 880, "claim": "Occupancy of ribosomes by IncRNAs mirror 5 0-UTRs", "evidence": {"8426046": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [8426046]}
{"id": 882, "claim": "Omnivores produce less trimethylamine N-oxide from dietary I-carnitine than vegetarians.", "evidence": {"14803797": [{"sentences": [2], "label": "CONTRADICT"}]}, "cited_doc_ids": [14803797]}
{"id": 887, "claim": "Only a minority of cells survive development after differentiation into stress-resistant spores.", "evidence": {}, "cited_doc_ids": [18855191]}
{"id": 903, "claim": "PD-1 triggering on monocytes reduces IL-10 production by monocytes.", "evidence": {"10648422": [{"sentences": [4], "label": "CONTRADICT"}, {"sentences": [5], "label": "CONTRADICT"}]}, "cited_doc_ids": [10648422]}
{"id": 904, "claim": "PDPN promotes efficient motility along stromal surfaces by activating the C-type lectin receptor to rearrange the actin cytoskeleton in dendritic cells.", "evidence": {"7370282": [{"sentences": [6], "label": "SUPPORT"}]}, "cited_doc_ids": [7370282]}
{"id": 907, "claim": "PGE 2 promotes intestinal tumor growth by altering the expression of tumor suppressing and DNA repair genes.", "evidence": {}, "cited_doc_ids": [6923961]}
{"id": 911, "claim": "PKG-la plays an essential role in expression of pain hypersensitivity in PGK-la knockout mice.", "evidence": {"11254556": [{"sentences": [3], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}]}, "cited_doc_ids": [11254556]}
{"id": 913, "claim": "PPAR-RXRs are inhibited by PPAR ligands.", "evidence": {}, "cited_doc_ids": [3203590]}
{"id": 914, "claim": "PPAR-RXRs can be activated by PPAR ligands.", "evidence": {}, "cited_doc_ids": [3203590]}
{"id": 921, "claim": "Participating in six months of physical activity improves cognitive functioning.", "evidence": {"1642727": [{"sentences": [9], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [1642727]}
{"id": 922, "claim": "Patients in stable partnerships have a faster progression from HIV to AIDS.", "evidence": {"17077004": [{"sentences": [7], "label": "CONTRADICT"}, {"sentences": [9], "label": "CONTRADICT"}]}, "cited_doc_ids": [17077004]}
{"id": 936, "claim": "Peroxynitrite is required for nitration of TCR/CD8.", "evidence": {"5483793": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [5483793]}
{"id": 956, "claim": "Pleiotropic coupling of GLP-1R to intracellular effectors promotes distinct profiles of cellular signaling.", "evidence": {}, "cited_doc_ids": [12956194]}
{"id": 957, "claim": "Podocytes are motile and migrate in the presence of injury.", "evidence": {"123859": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [123859]}
{"id": 960, "claim": "Polymeal nutrition reduces cardiovascular mortality.", "evidence": {"8780599": [{"sentences": [6], "label": "SUPPORT"}, {"sentences": [8], "label": "SUPPORT"}]}, "cited_doc_ids": [8780599]}
{"id": 967, "claim": "Pretreatment with the Arp2/3 inhibitor CK-666 affects lamelliopodia formation.", "evidence": {"8997410": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [2119889, 8997410]}
{"id": 971, "claim": "Primary cervical cancer screening with HPV detection has higher longitudinal sensitivity than conventional cytology to detect cervical intraepithelial neoplasia grade 2.", "evidence": {"46695481": [{"sentences": [0], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}], "27873158": [{"sentences": [0], "label": "SUPPORT"}, {"sentences": [15], "label": "SUPPORT"}, {"sentences": [21], "label": "SUPPORT"}], "28617573": [{"sentences": [8], "label": "SUPPORT"}], "9764256": [{"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [46695481, 27873158, 28617573, 9764256]}
{"id": 975, "claim": "Primary pro-inflammatory cytokines induce secondary pro- and anti-inflammatory mediators.", "evidence": {}, "cited_doc_ids": [5304891]}
{"id": 982, "claim": "Proteins synthesized at the growth cone are ubiquitinated at a higher rate than proteins from the cell body.", "evidence": {"2988714": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [2988714]}
{"id": 985, "claim": "Pseudogene PTENP1 regulates the expression of PTEN by functioning as an miRNA decoy.", "evidence": {"6828370": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [6828370]}
{"id": 993, "claim": "Pyridostatin destabilizes the G - quadruplex in the telomeric region.", "evidence": {"16472469": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [16472469]}
{"id": 1012, "claim": "Radioiodine treatment of non-toxic multinodular goitre reduces thyroid volume.", "evidence": {"9745001": [{"sentences": [6], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [9745001]}
{"id": 1014, "claim": "Rapamycin decreases the concentration of triacylglycerols in fruit flies.", "evidence": {}, "cited_doc_ids": [6277638]}
{"id": 1019, "claim": "Rapid phosphotransfer rates govern fidelity in two component systems", "evidence": {"11603066": [{"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [11603066]}
{"id": 1020, "claim": "Rapid up-regulation and higher basal expression of interferon-induced genes increase survival of granule cell neurons that are infected by West Nile virus.", "evidence": {"9433958": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [9433958]}
{"id": 1021, "claim": "Rapid up-regulation and higher basal expression of interferon-induced genes reduce survival of granule cell neurons that are infected by West Nile virus.", "evidence": {"9433958": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [9433958]}
{"id": 1024, "claim": "Recurrent mutations occur frequently within CTCF anchor sites adjacent to oncogenes.", "evidence": {"5373138": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [5373138]}
{"id": 1029, "claim": "Reduced responsiveness to interleukin-2 in regulatory T cells is associated with greater resistance to autoimmune diseases such as Type 1 Diabetes.", "evidence": {"13923140": [{"sentences": [2, 3], "label": "CONTRADICT"}], "13940200": [{"sentences": [3], "label": "CONTRADICT"}], "11899391": [{"sentences": [6], "label": "CONTRADICT"}]}, "cited_doc_ids": [13923140, 13940200, 11899391]}
{"id": 1041, "claim": "Replacement of histone H2A with H2A.Z slows gene activation in yeasts by stabilizing +1 nucleosomes.", "evidence": {"25254425": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [25254425, 16626264]}
{"id": 1049, "claim": "Ribosomopathies have a low degree of cell and tissue specific pathology.", "evidence": {"12486491": [{"sentences": [5, 6], "label": "CONTRADICT"}]}, "cited_doc_ids": [12486491]}
{"id": 1062, "claim": "S-nitrosylated GAPDH physiologically transnitrosylates histone deacetylases.", "evidence": {"20381484": [{"sentences": [10], "label": "SUPPORT"}]}, "cited_doc_ids": [20381484]}
{"id": 1086, "claim": "Sildenafil improves erectile function in men who experience sexual dysfunction as a result of the use of SSRI antidepressants.", "evidence": {"39281140": [{"sentences": [6], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}]}, "cited_doc_ids": [39281140]}
{"id": 1088, "claim": "Silencing of Bcl2 is important for the maintenance and progression of tumors.", "evidence": {"37549932": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [37549932]}
{"id": 1089, "claim": "Smc5/6 engagment drives the activation of SUMO E3 ligase Mms21 by ATP-dependent remolding.", "evidence": {"17628888": [{"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [17628888]}
{"id": 1099, "claim": "Statins decrease blood cholesterol.", "evidence": {}, "cited_doc_ids": [7662206]}
{"id": 1100, "claim": "Statins increase blood cholesterol.", "evidence": {}, "cited_doc_ids": [7662206]}
{"id": 1104, "claim": "Stroke patients with prior use of direct oral anticoagulants have a lower risk of in-hospital mortality than stroke patients with prior use of warfarin.", "evidence": {"3898784": [{"sentences": [8], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [3898784]}
{"id": 1107, "claim": "Subcutaneous fat depots undergo extensive browning processes after cold exposure.", "evidence": {"20532591": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [20532591]}
{"id": 1110, "claim": "Suboptimal nutrition is not predictive of chronic disease", "evidence": {"13770184": [{"sentences": [13], "label": "CONTRADICT"}]}, "cited_doc_ids": [13770184]}
{"id": 1121, "claim": "Synaptic activity enhances local release of brain derived neurotrophic factor from postsynaptic dendrites.", "evidence": {"4456756": [{"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [4456756]}
{"id": 1130, "claim": "T regulatory cells (tTregs) lacking \u03b1v\u03b28 are more adept at suppressing pathogenic T-cell responses during active inflammation.", "evidence": {"17997584": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [17997584]}
{"id": 1132, "claim": "TCR/CD3 microdomains are a required to induce the immunologic synapse to activate T cells.", "evidence": {"33499189": [{"sentences": [3, 4], "label": "SUPPORT"}]}, "cited_doc_ids": [33499189, 9283422]}
{"id": 1137, "claim": "TNFAIP3 is a tumor suppressor in glioblastoma.", "evidence": {"33370": [{"sentences": [6], "label": "CONTRADICT"}, {"sentences": [9], "label": "CONTRADICT"}]}, "cited_doc_ids": [33370]}
{"id": 1140, "claim": "Taking 400mg of \u03b1-tocopheryl acetate helps to prevent prostate cancer.", "evidence": {"12009265": [{"sentences": [9], "label": "CONTRADICT"}, {"sentences": [14], "label": "CONTRADICT"}]}, "cited_doc_ids": [12009265]}
{"id": 1144, "claim": "Taxation of sugar-sweetened beverages had no effect on the incidence rate of type II diabetes in India.", "evidence": {"10071552": [{"sentences": [4], "label": "CONTRADICT"}, {"sentences": [5], "label": "CONTRADICT"}, {"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [10071552]}
{"id": 1146, "claim": "Teaching hospitals do not provide better care than non-teaching hospitals.", "evidence": {"13906581": [{"sentences": [6], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [13906581]}
{"id": 1150, "claim": "Tetraspanin-3 is a causative factor in the development of acute myelogenous leukemia", "evidence": {"11369420": [{"sentences": [4], "label": "SUPPORT"}, {"sentences": [5], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [11369420]}
{"id": 1163, "claim": "The DdrB protein from Deinococcus radiodurans is an alternative SSB.", "evidence": {"15305881": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [15305881]}
{"id": 1175, "claim": "The PPR MDA5 has two N-terminal CARD domains.", "evidence": {}, "cited_doc_ids": [31272411]}
{"id": 1179, "claim": "The PRR MDA5 has a central DExD/H RNA helices domain.", "evidence": {}, "cited_doc_ids": [31272411]}
{"id": 1180, "claim": "The PRR MDA5 is a sensor of RNA virus infection.", "evidence": {"31272411": [{"sentences": [0], "label": "SUPPORT"}]}, "cited_doc_ids": [31272411]}
{"id": 1185, "claim": "The US health care system can save up to $750 million if 7% of patients waiting for kidney transplants participate in the optimized national kidney paired donation program.", "evidence": {"16737210": [{"sentences": [10], "label": "SUPPORT"}]}, "cited_doc_ids": [16737210]}
{"id": 1187, "claim": "The YAP1 and TEAD complex tanslocates into the nucleus where it interacts with transcription factors and DNA-binding proteins that modulate target gene transcription.", "evidence": {"52873726": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [52873726]}
{"id": 1191, "claim": "The amount of publicly available DNA data doubles every 10 years.", "evidence": {}, "cited_doc_ids": [30655442]}
{"id": 1194, "claim": "The arm density of TatAd complexes is due to structural rearrangements within Class1 TatAd complexes such as the 'charge zipper mechanism'.", "evidence": {}, "cited_doc_ids": [11419230]}
{"id": 1196, "claim": "The availability of safe places to study is effective at decreasing homelessness.", "evidence": {}, "cited_doc_ids": [25649714]}
{"id": 1197, "claim": "The availability of safe places to study is not effective at decreasing homelessness.", "evidence": {}, "cited_doc_ids": [25649714]}
{"id": 1199, "claim": "The benefits of colchicine were achieved with effective widespread use of secondary prevention strategies such as high-dose statins.", "evidence": {}, "cited_doc_ids": [16760369]}
{"id": 1200, "claim": "The binding orientation of the ML-SA1 activator at hTRPML2 is different from the binding orientation of the ML-SA1 activator at hTRPML1.", "evidence": {}, "cited_doc_ids": [3441524]}
{"id": 1202, "claim": "The center of the granuloma in an immune cell induces a pro-inflammatory immune response.", "evidence": {"3475317": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [3475317]}
{"id": 1204, "claim": "The combination of H3K4me3 and H3K79me2 is found in quiescent hair follicle stem cells.", "evidence": {}, "cited_doc_ids": [31141365]}
{"id": 1207, "claim": "The composition of myosin-II isoform switches from the polarizable B isoform to the more homogenous A isoform during hematopoietic differentiation.", "evidence": {}, "cited_doc_ids": [18909530]}
{"id": 1213, "claim": "The deregulated and prolonged activation of monocytes has deleterious effects in inflammatory diseases.", "evidence": {}, "cited_doc_ids": [14407673]}
{"id": 1216, "claim": "The extracellular domain of TMEM27 is cleaved in human beta cells.", "evidence": {"24142891": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [24142891]}
{"id": 1221, "claim": "The genomic aberrations found in matasteses are very similar to those found in the primary tumor.", "evidence": {"19736671": [{"sentences": [2], "label": "CONTRADICT"}]}, "cited_doc_ids": [19736671]}
{"id": 1225, "claim": "The locus rs647161 is associated with colorectal carcinoma.", "evidence": {"9650982": [{"sentences": [3], "label": "SUPPORT"}]}, "cited_doc_ids": [9650982]}
{"id": 1226, "claim": "The loss of the TET protein functions may have dire biological consequences, such as myeloid cancers.", "evidence": {}, "cited_doc_ids": [13777138]}
{"id": 1232, "claim": "The minor G allele of FOXO3 is related to more severe symptoms of Crohn's Disease.", "evidence": {"13905670": [{"sentences": [3], "label": "CONTRADICT"}]}, "cited_doc_ids": [13905670]}
{"id": 1241, "claim": "The myocardial lineage develops from cardiac progenitors of mesodermal origin.", "evidence": {"4427392": [{"sentences": [0, 3, 4], "label": "SUPPORT"}]}, "cited_doc_ids": [4427392]}
{"id": 1245, "claim": "The one-child policy has been successful in lowering population growth.", "evidence": {}, "cited_doc_ids": [7662395, 7662395]}
{"id": 1259, "claim": "The relationship between a breast cancer patient's capacity to metabolize tamoxifen and treatment outcome is dependent on the patient's genetic make-up.", "evidence": {"24341590": [{"sentences": [11], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [24341590]}
{"id": 1262, "claim": "The repair of Cas9-induced double strand breaks in human DNA is error-prone.", "evidence": {"44172171": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [44172171]}
{"id": 1266, "claim": "The risk of breast cancer among parous women increases with placental weight of pregnancies, and this association is strongest for premenopausal breast cancer.", "evidence": {"37480103": [{"sentences": [10], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [37480103]}
{"id": 1270, "claim": "The risk of male prisoners harming themselves is ten times that of female prisoners.", "evidence": {"13900610": [{"sentences": [7], "label": "CONTRADICT"}, {"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [13900610]}
{"id": 1271, "claim": "The severity of cardiac involvement in amyloidosis can be described by the degree of transmurality of late gadolinium enhancement in MRI.", "evidence": {"13768432": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}]}, "cited_doc_ids": [13768432]}
{"id": 1272, "claim": "The single flash-evoked ERG b-wave is generated by activity of ON-bipolar cells.", "evidence": {}, "cited_doc_ids": [17081238]}
{"id": 1273, "claim": "The sliding activity of kinesin-8 protein Kip3 promotes bipolar spindle assembly.", "evidence": {"11041152": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [11041152]}
{"id": 1274, "claim": "The tip of the inner tube of the toxic type VI secretion system (T6SS) antibacterial effector in Escherichia coli (E. coli) carries toxic effector proteins.", "evidence": {"4406819": [{"sentences": [4, 5], "label": "SUPPORT"}, {"sentences": [7], "label": "SUPPORT"}]}, "cited_doc_ids": [12428814, 27731651, 4406819]}
{"id": 1278, "claim": "The treatment of cancer patients with co-IR blockade does not cause any adverse autoimmune events.", "evidence": {}, "cited_doc_ids": [11335781]}
{"id": 1279, "claim": "The treatment of cancer patients with co-IR blockade precipitates adverse autoimmune events.", "evidence": {}, "cited_doc_ids": [11335781]}
{"id": 1280, "claim": "The ureABIEFGH gene cluster encodes urease maturation proteins : UreD/UreH, UreE, UreF, and UreG.", "evidence": {}, "cited_doc_ids": [4387784]}
{"id": 1281, "claim": "The ureABIEFGH gene cluster is induced by nickel (II) ion.", "evidence": {}, "cited_doc_ids": [4387784]}
{"id": 1282, "claim": "Therapeutic use of the drug Dapsone to treat pyoderma gangrenous is based on anecdotal evidence.", "evidence": {}, "cited_doc_ids": [23649163]}
{"id": 1290, "claim": "There is an inverse relationship between hip fractures and statin use.", "evidence": {"4687948": [{"sentences": [7], "label": "SUPPORT"}, {"sentences": [9], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [4687948]}
{"id": 1292, "claim": "There is no association between HNF4A mutations and diabetes risks.", "evidence": {}, "cited_doc_ids": [56893404]}
{"id": 1298, "claim": "Thigh-length graduated compression stockings (GCS) did not reduce deep vein thrombosis in patients admitted to hospital who are immobile because of acute stroke.", "evidence": {"11718220": [{"sentences": [11], "label": "SUPPORT"}, {"sentences": [13], "label": "SUPPORT"}]}, "cited_doc_ids": [11718220]}
{"id": 1303, "claim": "Tirasemtiv has no effect on fast-twitch muscle.", "evidence": {"12631697": [{"sentences": [1], "label": "CONTRADICT"}, {"sentences": [2], "label": "CONTRADICT"}]}, "cited_doc_ids": [12631697]}
{"id": 1316, "claim": "Transferred UCB T cells acquire a memory-like phenotype in recipients.", "evidence": {}, "cited_doc_ids": [27910499]}
{"id": 1319, "claim": "Transplanted human glial cells can differentiate within the host animal.", "evidence": {"16284655": [{"sentences": [2], "label": "SUPPORT"}]}, "cited_doc_ids": [16284655]}
{"id": 1320, "claim": "Transplanted human glial progenitor cells are incapable of forming a neural network with host animals' neurons.", "evidence": {"16284655": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [16284655]}
{"id": 1332, "claim": "Tumor necrosis factor alpha (TNF-\u03b1) and interleukin-1 (IL-1) are pro-inflammatory cytokines that inhibit IL-6 and IL-10.", "evidence": {}, "cited_doc_ids": [5304891]}
{"id": 1335, "claim": "UCB T cells maintain high TCR diversity after transplantation.", "evidence": {"27910499": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [27910499]}
{"id": 1336, "claim": "UCB T cells reduce TCR diversity after transplantation.", "evidence": {"27910499": [{"sentences": [4], "label": "CONTRADICT"}]}, "cited_doc_ids": [27910499]}
{"id": 1337, "claim": "Ubiquitin ligase UBC13 generates a K63-linked polyubiquitin moiety at PCNA K164.", "evidence": {"20231138": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [20231138]}
{"id": 1339, "claim": "Ultrasound guidance significantly raises the number of traumatic procedures when attempting needle insertion.", "evidence": {"15482274": [{"sentences": [7, 8, 9], "label": "CONTRADICT"}, {"sentences": [10], "label": "CONTRADICT"}]}, "cited_doc_ids": [15482274]}
{"id": 1344, "claim": "Up-regulation of the p53 pathway and related molecular events casues cancer resistance and results in a significantly shortened lifespan marked by senescent cells and accelerated organismal aging.", "evidence": {}, "cited_doc_ids": [9559146]}
{"id": 1352, "claim": "Upregulation of mosGCTL-1 is induced upon infection with West Nile virus.", "evidence": {"12885341": [{"sentences": [3], "label": "SUPPORT"}, {"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [12885341]}
{"id": 1359, "claim": "Varenicline monotherapy is more effective after 12 weeks of treatment compared to combination nicotine replacement therapies with varenicline or bupropion.", "evidence": {"11614737": [{"sentences": [8], "label": "CONTRADICT"}]}, "cited_doc_ids": [11614737]}
{"id": 1362, "claim": "Venules have a larger lumen diameter than arterioles.", "evidence": {}, "cited_doc_ids": [8290953]}
{"id": 1363, "claim": "Venules have a thinner or absent smooth layer compared to arterioles.", "evidence": {}, "cited_doc_ids": [8290953]}
{"id": 1368, "claim": "Vitamin D deficiency effects the term of delivery.", "evidence": {"2425364": [{"sentences": [9], "label": "SUPPORT"}, {"sentences": [10], "label": "SUPPORT"}, {"sentences": [11], "label": "SUPPORT"}, {"sentences": [12], "label": "SUPPORT"}]}, "cited_doc_ids": [2425364]}
{"id": 1370, "claim": "Vitamin D deficiency is unrelated to birth weight.", "evidence": {"2425364": [{"sentences": [10], "label": "CONTRADICT"}, {"sentences": [12], "label": "CONTRADICT"}]}, "cited_doc_ids": [2425364]}
{"id": 1379, "claim": "Women with a higher birth weight are more likely to develop breast cancer later in life.", "evidence": {"16322674": [{"sentences": [5], "label": "SUPPORT"}, {"sentences": [6], "label": "SUPPORT"}], "27123743": [{"sentences": [3], "label": "SUPPORT"}, {"sentences": [4], "label": "SUPPORT"}], "23557241": [{"sentences": [6], "label": "SUPPORT"}], "17450673": [{"sentences": [5], "label": "SUPPORT"}]}, "cited_doc_ids": [16322674, 27123743, 23557241, 17450673]}
{"id": 1382, "claim": "aPKCz causes tumour enhancement by affecting glutamine metabolism.", "evidence": {"17755060": [{"sentences": [3], "label": "CONTRADICT"}, {"sentences": [5], "label": "CONTRADICT"}]}, "cited_doc_ids": [17755060]}
{"id": 1385, "claim": "cSMAC formation enhances weak ligand signalling.", "evidence": {"306006": [{"sentences": [4], "label": "SUPPORT"}]}, "cited_doc_ids": [306006]}
{"id": 1389, "claim": "mTORC2 regulates intracellular cysteine levels through xCT inhibition.", "evidence": {"23895668": [{"sentences": [2, 3], "label": "SUPPORT"}]}, "cited_doc_ids": [23895668]}
{"id": 1395, "claim": "p16INK4A accumulation is linked to an abnormal wound response caused by the microinvasive step of advanced Oral Potentially Malignant Lesions (OPMLs).", "evidence": {}, "cited_doc_ids": [17717391]}

@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "cb1537e6",
"metadata": {},
@ -23,6 +22,10 @@
"The demo flow is:\n",
"- **Setup**: Import packages and set any required variables\n",
"- **Load data**: Load a dataset and embed it using OpenAI embeddings\n",
"- **Chroma**\n",
" - *Setup*: Here we'll set up the Python client for Chroma. For more details go [here](https://docs.trychroma.com/usage-guide)\n",
" - *Index Data*: We'll create collections with vectors for __titles__ and __content__\n",
" - *Search Data*: We'll run a few searches to confirm it works\n",
"- **Pinecone**\n",
" - *Setup*: Here we'll set up the Python client for Pinecone. For more details go [here](https://docs.pinecone.io/docs/quickstart)\n",
" - *Index Data*: We'll create an index with namespaces for __titles__ and __content__\n",
@ -46,7 +49,7 @@
"- **Typesense**\n",
" - *Setup*: Set up the Typesense Python client. For more details go [here](https://typesense.org/docs/0.24.0/api/)\n",
" - *Index Data*: We'll create a collection and index it for both __titles__ and __content__.\n",
" - *Search Data*: Run a few example queries with various goals in mind.\n",
" - *Search Data*: Run a few example queries with various goals in mind.\n",
"\n",
"\n",
"Once you've run through this notebook, you should have a basic understanding of how to set up and use vector databases, and can move on to more complex use cases that make use of our embeddings."
@ -70,6 +73,7 @@
"outputs": [],
"source": [
"# We'll need to install the clients for all vector databases\n",
"!pip install chromadb\n",
"!pip install pinecone-client\n",
"!pip install weaviate-client\n",
"!pip install pymilvus\n",
@ -90,7 +94,6 @@
"source": [
"import openai\n",
"\n",
"import tiktoken\n",
"from typing import List, Iterator\n",
"import pandas as pd\n",
"import numpy as np\n",
@ -101,15 +104,15 @@
"# Redis client library for Python\n",
"import redis\n",
"\n",
"# Chroma's client library for Python\n",
"import chromadb\n",
"\n",
"# Pinecone's client library for Python\n",
"import pinecone\n",
"\n",
"# Weaviate's client library for Python\n",
"import weaviate\n",
"\n",
"# Milvus's client library for Python\n",
"import pymilvus\n",
"\n",
"# Qdrant's client library for Python\n",
"import qdrant_client\n",
"\n",
@ -207,6 +210,196 @@
"article_df.info(show_counts=True)"
]
},
{
"cell_type": "markdown",
"id": "81bf5349",
"metadata": {},
"source": [
"## Chroma\n",
"\n",
"We'll index these embedded documents in a vector database and search them. The first option we'll look at is **Chroma**, an easy-to-use, open-source, self-hosted vector database designed for working with embeddings together with LLMs.\n",
"\n",
"In this section, we will:\n",
"- Instantiate the Chroma client\n",
"- Create collections for each class of embedding\n",
"- Query each collection"
]
},
{
"cell_type": "markdown",
"id": "37d1f693",
"metadata": {},
"source": [
"### Instantiate the Chroma client\n",
"\n",
"Create the Chroma client. By default, Chroma is ephemeral and runs in memory.\n",
"However, you can easily set up a persistent configuration, which writes to disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "159d9646",
"metadata": {},
"outputs": [],
"source": [
"chroma_client = chromadb.Client() # Ephemeral. Comment out for the persistent version.\n",
"\n",
"# Uncomment the following for the persistent version.\n",
"# from chromadb.config import Settings\n",
"# persist_directory = 'chroma_persistence' # Directory to store persisted Chroma data.\n",
"# chroma_client = chromadb.Client(\n",
"#     Settings(\n",
"#         persist_directory=persist_directory,\n",
"#         chroma_db_impl=\"duckdb+parquet\",\n",
"#     )\n",
"# )"
]
},
{
"cell_type": "markdown",
"id": "5cd61943",
"metadata": {},
"source": [
"### Create collections\n",
"\n",
"Chroma collections allow you to store and filter with arbitrary metadata, making it easy to query subsets of the embedded data.\n",
"\n",
"Chroma is already integrated with OpenAI's embedding functions. The best way to use them is to pass one in when constructing a collection, as follows.\n",
"Alternatively, you can 'bring your own embeddings'. More information can be found [here](https://docs.trychroma.com/embeddings)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad2d1bce",
"metadata": {},
"outputs": [],
"source": [
"from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction\n",
"\n",
"# Test that your OpenAI API key is correctly set as an environment variable\n",
"# Note: if you run this notebook locally, you will need to reload your terminal and the notebook for the environment variables to take effect.\n",
"\n",
"# Alternatively, you can set a temporary environment variable like this:\n",
"# os.environ[\"OPENAI_API_KEY\"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
"\n",
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
"    openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"    print(\"OPENAI_API_KEY is ready\")\n",
"else:\n",
"    print(\"OPENAI_API_KEY environment variable not found\")\n",
"\n",
"\n",
"embedding_function = OpenAIEmbeddingFunction(api_key=os.environ.get('OPENAI_API_KEY'), model_name=EMBEDDING_MODEL)\n",
"\n",
"wikipedia_content_collection = chroma_client.create_collection(name='wikipedia_content', embedding_function=embedding_function)\n",
"wikipedia_title_collection = chroma_client.create_collection(name='wikipedia_titles', embedding_function=embedding_function)"
]
},
{
"cell_type": "markdown",
"id": "02887b52",
"metadata": {},
"source": [
"### Populate the collections\n",
"\n",
"Chroma collections allow you to populate, and filter on, whatever metadata you like. Chroma can also store the text alongside the vectors, and return everything in a single `query` call, when this is more convenient.\n",
"\n",
"For this use case, we'll just store the embeddings and IDs, and use the IDs to look up the original rows in the dataframe."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84885fec",
"metadata": {},
"outputs": [],
"source": [
"# Add the content vectors\n",
"wikipedia_content_collection.add(\n",
" ids=article_df.vector_id.tolist(),\n",
" embeddings=article_df.content_vector.tolist(),\n",
")\n",
"\n",
"# Add the title vectors\n",
"wikipedia_title_collection.add(\n",
" ids=article_df.vector_id.tolist(),\n",
" embeddings=article_df.title_vector.tolist(),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "79122c6b",
"metadata": {},
"source": [
"### Search the collections\n",
"\n",
"Chroma handles embedding queries for you if an embedding function is set, like in this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "273b8b4c",
"metadata": {},
"outputs": [],
"source": [
"def query_collection(collection, query, max_results, dataframe):\n",
"    # Chroma embeds the query text using the collection's embedding function\n",
"    results = collection.query(query_texts=[query], n_results=max_results, include=['distances'])\n",
"    ids = results['ids'][0]\n",
"    # Look up the matching rows in result order, so titles and content line up with their ids and scores\n",
"    matches = dataframe.set_index('vector_id').loc[ids]\n",
"    df = pd.DataFrame({\n",
"        'id': ids,\n",
"        'score': results['distances'][0],  # Distance: lower means more similar\n",
"        'title': matches['title'].to_list(),\n",
"        'content': matches['text'].to_list(),\n",
"    })\n",
"\n",
"    return df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e84cf47f",
"metadata": {},
"outputs": [],
"source": [
"title_query_result = query_collection(\n",
" collection=wikipedia_title_collection,\n",
" query=\"modern art in Europe\",\n",
" max_results=10,\n",
" dataframe=article_df\n",
")\n",
"title_query_result.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4db910a",
"metadata": {},
"outputs": [],
"source": [
"content_query_result = query_collection(\n",
" collection=wikipedia_content_collection,\n",
" query=\"Famous battles in Scottish history\",\n",
" max_results=10,\n",
" dataframe=article_df\n",
")\n",
"content_query_result.head()"
]
},
{
"cell_type": "markdown",
"id": "a03e7645",
"metadata": {},
"source": [
"Now that you've got a basic embeddings search running, you can [hop over to the Chroma docs](https://docs.trychroma.com/usage-guide#using-where-filters) to learn more about how to add filters to your query, update/delete data in your collections, and deploy Chroma."
]
},
{
"cell_type": "markdown",
"id": "ed32fc87",
@ -214,7 +407,7 @@
"source": [
"## Pinecone\n",
"\n",
"We'll index these embedded documents in a vector database and search them. The first option we'll look at is **Pinecone**, a managed vector database which offers a cloud-native option.\n",
"The next option we'll look at is **Pinecone**, a managed vector database which offers a cloud-native option.\n",
"\n",
"Before you proceed with this step, you'll need to navigate to [Pinecone](https://www.pinecone.io), sign up, and then save your API key as an environment variable titled `PINECONE_API_KEY`.\n",
"\n",
@ -427,7 +620,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d939342f",
"metadata": {},
@ -458,7 +650,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bfdfe260",
"metadata": {},
@ -523,7 +714,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "03a926b9",
"metadata": {},
@ -808,7 +998,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4dc3a0c0",
"metadata": {},
@ -825,7 +1014,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "fe4914e9",
"metadata": {},
@ -851,7 +1039,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "64ffed22",
"metadata": {},
@ -952,7 +1139,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1f68a790",
"metadata": {},
@ -1423,7 +1609,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "43bffd04",
"metadata": {},
@ -1478,7 +1663,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "698e24f6",
"metadata": {},
@ -1532,7 +1716,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3f6f0af9",
"metadata": {},
@ -1611,7 +1794,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f3563eec",
"metadata": {},
@ -1656,7 +1838,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f646bff4",
"metadata": {},
@ -1733,7 +1914,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0ed0b34e",
"metadata": {},
@ -1791,7 +1971,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f94b5be2",
"metadata": {},
@ -1801,6 +1980,9 @@
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Typesense\n",
"\n",
@ -1809,13 +1991,13 @@
"Typesense focuses on performance by storing the entire index in RAM (with a backup on disk) and also focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults. It also lets you combine attribute-based filtering together with vector queries.\n",
"\n",
"For this example, we will set up a local docker-based Typesense server, index our vectors in Typesense and then do some nearest-neighbor search queries. If you use Typesense Cloud, you can skip the docker setup part and just obtain the hostname and API keys from your cluster dashboard."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Setup\n",
"\n",
@ -1824,14 +2006,14 @@
"After starting Docker, you can start Typesense locally by navigating to the `examples/vector_databases/typesense/` directory and running `docker-compose up -d`.\n",
"\n",
"The default API key is set to `xyz` in the Docker compose file, and the default Typesense port to `8108`."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import typesense\n",
@ -1846,25 +2028,25 @@
" \"api_key\": \"xyz\",\n",
" \"connection_timeout_seconds\": 60\n",
" })"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Index data\n",
"\n",
"To index vectors in Typesense, we'll first create a Collection (which is a collection of Documents) and turn on vector indexing for a particular field. You can even store multiple vector fields in a single document."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Delete existing collections if they already exist\n",
@ -1895,14 +2077,14 @@
"print(create_response)\n",
"\n",
"print(\"Created new collection wikipedia-articles\")"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Upsert the vector data into the collection we just created\n",
@ -1939,39 +2121,39 @@
" print(f\"Processed {document_counter} / {len(article_df)} \")\n",
"\n",
"print(f\"Imported ({len(article_df)}) articles.\")"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Check the number of documents imported\n",
"\n",
"collection = typesense_client.collections['wikipedia_articles'].retrieve()\n",
"print(f'Collection has {collection[\"num_documents\"]} documents')"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Search Data\n",
"\n",
"Now that we've imported the vectors into Typesense, we can do a nearest neighbor search on the `title_vector` or `content_vector` field."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def query_typesense(query, field='title', top_k=20):\n",
@ -1992,14 +2174,14 @@
" }, {})\n",
"\n",
" return typesense_results"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@ -2025,14 +2207,14 @@
" document = hit[\"document\"]\n",
" vector_distance = hit[\"vector_distance\"]\n",
" print(f'{i + 1}. {document[\"title\"]} (Distance: {vector_distance})')"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
@ -2058,10 +2240,7 @@
" document = hit[\"document\"]\n",
" vector_distance = hit[\"vector_distance\"]\n",
" print(f'{i + 1}. {document[\"title\"]} (Distance: {vector_distance})')"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
@ -2074,7 +2253,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "redisvl2",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -2088,11 +2267,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.16"
},
"vscode": {
"interpreter": {
"hash": "9b1e6e9c2967143209c2f955cb869d1d3234f92dc4787f49f155f3abbdfb1316"
"hash": "fd16a328ca3d68029457069b79cb0b38eb39a0f5ccc4fe4473d3047707df8207"
}
}
},

@ -0,0 +1,988 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Robust Question Answering with Chroma and OpenAI \n",
"\n",
"This notebook guides you step-by-step through answering questions about a collection of data, using [Chroma](https://trychroma.com), an open-source embeddings database, along with OpenAI's [text embeddings](https://platform.openai.com/docs/guides/embeddings/use-cases) and [chat completion](https://platform.openai.com/docs/guides/chat) APIs.\n",
"\n",
"Additionally, this notebook demonstrates some of the tradeoffs in making a question answering system more robust. As we shall see, *simple querying doesn't always create the best results*! \n",
"\n",
"## Question Answering with LLMs\n",
"\n",
"Large language models (LLMs) like OpenAI's ChatGPT can be used to answer questions about data that the model may not have been trained on, or may not have access to. For example:\n",
"\n",
"- Personal data like e-mails and notes\n",
"- Highly specialized data like archival or legal documents\n",
"- Newly created data like recent news stories\n",
"\n",
"However, the model has not seen this data, so it cannot answer such questions on its own. To overcome this limitation, we can use a data store that is amenable to querying in natural language, just like the LLM itself. An embeddings store like Chroma represents documents as [embeddings](https://openai.com/blog/introducing-text-and-code-embeddings), alongside the documents themselves.\n",
"\n",
"By embedding a text query, Chroma can find relevant documents, which we can then pass to the LLM to answer our question. We'll show detailed examples and variants of this approach. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup and preliminaries\n",
"\n",
"First, we make sure the Python dependencies we need are installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU openai chromadb pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use OpenAI's APIs throughout this notebook. You can get an API key from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
"\n",
"You can add your API key as an environment variable by executing the command `export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` in a terminal. Note that you will need to reload the notebook if the environment variable wasn't already set. Alternatively, you can set it in the notebook, as shown below."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OPENAI_API_KEY is ready\n"
]
}
],
"source": [
"import os\n",
"\n",
"# Uncomment the following line to set the environment variable in the notebook\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\" \n",
"\n",
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
" print(\"OPENAI_API_KEY is ready\")\n",
" import openai\n",
" openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"else:\n",
" print(\"OPENAI_API_KEY environment variable not found\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dataset\n",
"\n",
"Throughout this notebook, we use the [SciFact dataset](https://github.com/allenai/scifact). This is a curated dataset of expert-annotated scientific claims, with an accompanying text corpus of paper titles and abstracts. According to the documents in the corpus, each claim may be supported, contradicted, or have not enough evidence either way.\n",
"\n",
"Having the corpus available as ground truth allows us to investigate how well the following approaches to LLM question answering perform."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>claim</th>\n",
" <th>evidence</th>\n",
" <th>cited_doc_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0-dimensional biomaterials show inductive prop...</td>\n",
" <td>{}</td>\n",
" <td>[31715818]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>1,000 genomes project enables mapping of genet...</td>\n",
" <td>{'14717500': [{'sentences': [2, 5], 'label': '...</td>\n",
" <td>[14717500]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5</td>\n",
" <td>1/2000 in UK have abnormal PrP positivity.</td>\n",
" <td>{'13734012': [{'sentences': [4], 'label': 'SUP...</td>\n",
" <td>[13734012]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>13</td>\n",
" <td>5% of perinatal mortality is due to low birth ...</td>\n",
" <td>{}</td>\n",
" <td>[1606628]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>36</td>\n",
" <td>A deficiency of vitamin B12 increases blood le...</td>\n",
" <td>{}</td>\n",
" <td>[5152028, 11705328]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id claim \\\n",
"0 1 0-dimensional biomaterials show inductive prop... \n",
"1 3 1,000 genomes project enables mapping of genet... \n",
"2 5 1/2000 in UK have abnormal PrP positivity. \n",
"3 13 5% of perinatal mortality is due to low birth ... \n",
"4 36 A deficiency of vitamin B12 increases blood le... \n",
"\n",
" evidence cited_doc_ids \n",
"0 {} [31715818] \n",
"1 {'14717500': [{'sentences': [2, 5], 'label': '... [14717500] \n",
"2 {'13734012': [{'sentences': [4], 'label': 'SUP... [13734012] \n",
"3 {} [1606628] \n",
"4 {} [5152028, 11705328] "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the claim dataset\n",
"import pandas as pd\n",
"\n",
"data_path = '../../data'\n",
"\n",
"claim_df = pd.read_json(f'{data_path}/scifact_claims.jsonl', lines=True)\n",
"claim_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Just asking the model\n",
"\n",
"GPT-3.5 was trained on a large amount of scientific information. As a baseline, we'd like to understand what the model already knows without any further context. This will allow us to calibrate overall performance. \n",
"\n",
"We construct an appropriate prompt with a few example claims and their assessments, then query the model with each claim in the dataset. We ask the model to assess a claim as 'True', 'False', or 'NEE' if there is not enough evidence one way or the other."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def build_prompt(claim):\n",
" return [\n",
" {\"role\": \"system\", \"content\": \"I will ask you to assess a scientific claim. Output only the text 'True' if the claim is true, 'False' if the claim is false, or 'NEE' if there's not enough evidence.\"},\n",
" {\"role\": \"user\", \"content\": f\"\"\" \n",
"Example:\n",
"\n",
"Claim:\n",
"0-dimensional biomaterials show inductive properties.\n",
"\n",
"Assessment:\n",
"False\n",
"\n",
"Claim:\n",
"1/2000 in UK have abnormal PrP positivity.\n",
"\n",
"Assessment:\n",
"True\n",
"\n",
"Claim:\n",
"Aspirin inhibits the production of PGE2.\n",
"\n",
"Assessment:\n",
"False\n",
"\n",
"End of examples. Assess the following claim:\n",
"\n",
"Claim:\n",
"{claim}\n",
"\n",
"Assessment:\n",
"\"\"\"}\n",
" ]\n",
"\n",
"\n",
"def assess_claims(claims):\n",
" responses = []\n",
" # Query the OpenAI API\n",
" for claim in claims:\n",
" response = openai.ChatCompletion.create(\n",
" model='gpt-3.5-turbo',\n",
" messages=build_prompt(claim),\n",
" max_tokens=3,\n",
" )\n",
" # Strip any punctuation or whitespace from the response\n",
" responses.append(response.choices[0].message.content.strip('., '))\n",
"\n",
" return responses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We sample 50 claims from the dataset."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Let's take a look at 50 claims\n",
"samples = claim_df.sample(50)\n",
"\n",
"claims = samples['claim'].tolist() \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We derive the ground truth from the dataset. According to the dataset description, each claim is either supported or contradicted by the evidence, or else there isn't enough evidence either way."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def get_groundtruth(evidence):\n",
" groundtruth = []\n",
" for e in evidence:\n",
" # Evidence is empty \n",
" if len(e) == 0:\n",
" groundtruth.append('NEE')\n",
" else:\n",
" # In this dataset, all evidence for a given claim is consistent, either SUPPORT or CONTRADICT\n",
" if list(e.values())[0][0]['label'] == 'SUPPORT':\n",
" groundtruth.append('True')\n",
" else:\n",
" groundtruth.append('False')\n",
" return groundtruth"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"evidence = samples['evidence'].tolist()\n",
"groundtruth = get_groundtruth(evidence)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also define a helper that prints the confusion matrix in an easy-to-read table, with the model's assessments as rows and the ground truth as columns."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def confusion_matrix(inferred, groundtruth):\n",
" assert len(inferred) == len(groundtruth)\n",
" confusion = {\n",
" 'True': {'True': 0, 'False': 0, 'NEE': 0},\n",
" 'False': {'True': 0, 'False': 0, 'NEE': 0},\n",
" 'NEE': {'True': 0, 'False': 0, 'NEE': 0},\n",
" }\n",
" for i, g in zip(inferred, groundtruth):\n",
" confusion[i][g] += 1\n",
"\n",
" # Pretty print the confusion matrix\n",
" print('\\tGroundtruth')\n",
" print('\\tTrue\\tFalse\\tNEE')\n",
" for i in confusion:\n",
" print(i, end='\\t')\n",
" for g in confusion[i]:\n",
" print(confusion[i][g], end='\\t')\n",
" print()\n",
"\n",
" return confusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We ask the model to directly assess the claims, without additional context. "
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\tGroundtruth\n",
"\tTrue\tFalse\tNEE\n",
"True\t15\t5\t14\t\n",
"False\t0\t2\t1\t\n",
"NEE\t3\t3\t7\t\n"
]
},
{
"data": {
"text/plain": [
"{'True': {'True': 15, 'False': 5, 'NEE': 14},\n",
" 'False': {'True': 0, 'False': 2, 'NEE': 1},\n",
" 'NEE': {'True': 3, 'False': 3, 'NEE': 7}}"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gpt_inferred = assess_claims(claims)\n",
"confusion_matrix(gpt_inferred, groundtruth)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"From these results, we see that the LLM is strongly biased to assess claims as true. In this run, it labels 34 of the 50 sampled claims 'True', including 5 of the 10 false claims and 14 of the 22 claims without enough evidence. It also assesses 3 of the 10 false claims as not having enough evidence. Note that 'not enough evidence' is with respect to the model's assessment of the claim in a vacuum, without additional context.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Adding context \n",
"\n",
"We now add the additional context available from the corpus of paper titles and abstracts. This section shows how to load a text corpus into Chroma, using OpenAI text embeddings. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we load the text corpus. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>doc_id</th>\n",
" <th>title</th>\n",
" <th>abstract</th>\n",
" <th>structured</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4983</td>\n",
" <td>Microstructural development of human newborn c...</td>\n",
" <td>[Alterations of the architecture of cerebral w...</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5836</td>\n",
" <td>Induction of myelodysplasia by myeloid-derived...</td>\n",
" <td>[Myelodysplastic syndromes (MDS) are age-depen...</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>7912</td>\n",
" <td>BC1 RNA, the transcript from a master gene for...</td>\n",
" <td>[ID elements are short interspersed elements (...</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18670</td>\n",
" <td>The DNA Methylome of Human Peripheral Blood Mo...</td>\n",
" <td>[DNA methylation plays an important role in bi...</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>19238</td>\n",
" <td>The human myelin basic protein gene is include...</td>\n",
" <td>[Two human Golli (for gene expressed in the ol...</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" doc_id title \\\n",
"0 4983 Microstructural development of human newborn c... \n",
"1 5836 Induction of myelodysplasia by myeloid-derived... \n",
"2 7912 BC1 RNA, the transcript from a master gene for... \n",
"3 18670 The DNA Methylome of Human Peripheral Blood Mo... \n",
"4 19238 The human myelin basic protein gene is include... \n",
"\n",
" abstract structured \n",
"0 [Alterations of the architecture of cerebral w... False \n",
"1 [Myelodysplastic syndromes (MDS) are age-depen... False \n",
"2 [ID elements are short interspersed elements (... False \n",
"3 [DNA methylation plays an important role in bi... False \n",
"4 [Two human Golli (for gene expressed in the ol... False "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the corpus into a dataframe\n",
"corpus_df = pd.read_json(f'{data_path}/scifact_corpus.jsonl', lines=True)\n",
"corpus_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the corpus into Chroma\n",
"\n",
"The next step is to load the corpus into Chroma. Given an embedding function, Chroma will automatically handle embedding each document, and will store it alongside its text and metadata, making it simple to query."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We instantiate an (ephemeral) Chroma client and create a collection for the SciFact title and abstract corpus.\n",
"Chroma can also be instantiated in a persistent configuration; learn more at the [Chroma docs](https://docs.trychroma.com/usage-guide?lang=py)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"import chromadb\n",
"from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction\n",
"\n",
"# We initialize an embedding function, and provide it to the collection.\n",
"embedding_function = OpenAIEmbeddingFunction(api_key=os.getenv(\"OPENAI_API_KEY\"))\n",
"\n",
"chroma_client = chromadb.Client()\n",
"scifact_corpus_collection = chroma_client.create_collection(name='scifact_corpus', embedding_function=embedding_function)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we load the corpus into Chroma. Because loading this data is memory-intensive, we recommend doing it in batches of 50 to 1000 documents. For this example, loading the entire corpus should take just over one minute. Each document is embedded automatically in the background, using the `embedding_function` we specified earlier."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 100\n",
"\n",
"for i in range(0, len(corpus_df), batch_size):\n",
"    batch_df = corpus_df[i:i+batch_size]\n",
"    scifact_corpus_collection.add(\n",
"        ids=batch_df['doc_id'].apply(lambda x: str(x)).tolist(), # Chroma takes string IDs.\n",
"        documents=(batch_df['title'] + '. ' + batch_df['abstract'].apply(lambda x: ' '.join(x))).to_list(), # We concatenate the title and abstract.\n",
"        metadatas=[{\"structured\": structured} for structured in batch_df['structured'].to_list()] # We also store the metadata, though we don't use it in this example.\n",
"    )"
]
},
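{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the same batching idea without pandas, the hypothetical helper `make_batches` below groups plain dict records into the `ids`, `documents` and `metadatas` lists used in the loop above. It assumes each record has the same `doc_id`, `title`, `abstract` (a list of sentences) and `structured` fields as `corpus_df`, and it only builds the batches; each one could then be passed to Chroma as `scifact_corpus_collection.add(**batch)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: group plain dict records into batches for collection.add.\n",
"# Each record is assumed to have the keys 'doc_id', 'title', 'abstract' (a list of\n",
"# sentences) and 'structured', mirroring the columns of corpus_df.\n",
"def make_batches(records, batch_size=100):\n",
"    for start in range(0, len(records), batch_size):\n",
"        batch = records[start:start + batch_size]\n",
"        yield {\n",
"            'ids': [str(r['doc_id']) for r in batch],  # Chroma takes string IDs.\n",
"            'documents': [r['title'] + '. ' + ' '.join(r['abstract']) for r in batch],  # Title + abstract.\n",
"            'metadatas': [{'structured': r['structured']} for r in batch],\n",
"        }\n",
"\n",
"\n",
"# Example with toy records:\n",
"toy_records = [\n",
"    {'doc_id': n, 'title': f'Title {n}', 'abstract': ['First.', 'Second.'], 'structured': False}\n",
"    for n in range(5)\n",
"]\n",
"batches = list(make_batches(toy_records, batch_size=2))\n",
"print([len(b['ids']) for b in batches])  # [2, 2, 1]\n",
"print(batches[0]['documents'][0])  # Title 0. First. Second."
]
},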
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieving context\n",
"\n",
"Next, for each claim in our sample, we retrieve documents from the corpus that may be relevant to it. We want to provide these as context to the LLM when it evaluates the claims. We retrieve the 3 most relevant documents for each claim, as measured by embedding distance."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"claim_query_result = scifact_corpus_collection.query(query_texts=claims, include=['documents', 'distances'], n_results=3)"
]
},
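{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, `query` embeds each query text and returns, for each one, the `n_results` stored documents whose embeddings are closest to it, as lists of lists (one inner list per query). The brute-force sketch below illustrates that idea in plain Python using squared L2 distance, which we assume matches the collection's distance (Chroma's default). It is not how Chroma is implemented: Chroma uses an approximate nearest-neighbour (HNSW) index rather than a linear scan, and `top_k` is a hypothetical helper."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Brute-force illustration of nearest-neighbour retrieval (not Chroma's implementation).\n",
"def squared_l2(a, b):\n",
"    # Squared Euclidean distance between two equal-length vectors\n",
"    return sum((x - y) ** 2 for x, y in zip(a, b))\n",
"\n",
"\n",
"def top_k(query_vectors, doc_ids, doc_vectors, k=3):\n",
"    # For each query, keep the ids and distances of the k closest documents,\n",
"    # using the same list-of-lists layout as a Chroma query result.\n",
"    result = {'ids': [], 'distances': []}\n",
"    for q in query_vectors:\n",
"        scored = sorted((squared_l2(q, v), doc_id) for doc_id, v in zip(doc_ids, doc_vectors))[:k]\n",
"        result['ids'].append([doc_id for _, doc_id in scored])\n",
"        result['distances'].append([d for d, _ in scored])\n",
"    return result\n",
"\n",
"\n",
"toy = top_k([[1.0, 0.0]], ['a', 'b', 'c'], [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]], k=2)\n",
"print(toy['ids'])  # [['b', 'c']]"
]
},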
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create a new prompt, this time taking into account the additional context retrieved from the corpus."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def build_prompt_with_context(claim, context):\n",
"    return [{'role': 'system', 'content': \"I will ask you to assess whether a particular scientific claim, based on evidence provided. Output only the text 'True' if the claim is true, 'False' if the claim is false, or 'NEE' if there's not enough evidence.\"}, \n",
"            {'role': 'user', 'content': f\"\"\"\n",
"The evidence is the following:\n",
"\n",
"{' '.join(context)}\n",
"\n",
"Assess the following claim on the basis of the evidence. Output only the text 'True' if the claim is true, 'False' if the claim is false, or 'NEE' if there's not enough evidence. Do not output any other text. \n",
"\n",
"Claim:\n",
"{claim}\n",
"\n",
"Assessment:\n",
"\"\"\"}]\n",
"\n",
"\n",
"def assess_claims_with_context(claims, contexts):\n",
"    responses = []\n",
"    # Query the OpenAI API\n",
"    for claim, context in zip(claims, contexts):\n",
"        # If no evidence is provided, return NEE\n",
"        if len(context) == 0:\n",
"            responses.append('NEE')\n",
"            continue\n",
"        response = openai.ChatCompletion.create(\n",
"            model='gpt-3.5-turbo',\n",
"            messages=build_prompt_with_context(claim=claim, context=context),\n",
"            max_tokens=3,\n",
"        )\n",
"        # Strip any punctuation or whitespace from the response\n",
"        responses.append(response.choices[0].message.content.strip('., '))\n",
"\n",
"    return responses"
]
},
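{
"cell_type": "markdown",
"metadata": {},
"source": [
"`assess_claims_with_context` strips punctuation and whitespace, but otherwise trusts the model to answer with exactly 'True', 'False' or 'NEE'. If the model occasionally returns something else, for example a sentence cut off by `max_tokens=3`, a defensive parser can map the output onto the three labels. The hypothetical `parse_assessment` below is one minimal way to do this, assuming that falling back to 'NEE' is an acceptable default."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical defensive parser for the model's one-word assessment.\n",
"VALID_LABELS = ('True', 'False', 'NEE')\n",
"\n",
"\n",
"def parse_assessment(text, fallback='NEE'):\n",
"    # Strip surrounding whitespace, periods and commas, then match case-insensitively.\n",
"    cleaned = text.strip().strip('.,').lower()\n",
"    for label in VALID_LABELS:\n",
"        if cleaned == label.lower():\n",
"            return label\n",
"    # Anything else (e.g. a truncated sentence) falls back to a fixed label.\n",
"    return fallback\n",
"\n",
"\n",
"print([parse_assessment(t) for t in ['True.', ' false', 'NEE', 'The claim']])\n",
"# ['True', 'False', 'NEE', 'NEE']"
]
},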
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we ask the model to evaluate the claims with the retrieved context."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\tGroundtruth\n",
"\tTrue\tFalse\tNEE\n",
"True\t16\t2\t8\t\n",
"False\t1\t6\t5\t\n",
"NEE\t1\t2\t9\t\n"
]
},
{
"data": {
"text/plain": [
"{'True': {'True': 16, 'False': 2, 'NEE': 8},\n",
" 'False': {'True': 1, 'False': 6, 'NEE': 5},\n",
" 'NEE': {'True': 1, 'False': 2, 'NEE': 9}}"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gpt_with_context_evaluation = assess_claims_with_context(claims, claim_query_result['documents'])\n",
"confusion_matrix(gpt_with_context_evaluation, groundtruth)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"We see that the model is much less likely to assess a False claim as True (2 instances vs. 5 previously), but claims without enough evidence are still often assessed as True or False.\n",
"\n",
"Looking at the retrieved documents, we see that they are sometimes not relevant to the claim. The extra information can confuse the model, which may decide that sufficient evidence is present even when the information is irrelevant. This happens because we always ask for the 3 'most' relevant documents, and beyond a certain point these may not be relevant at all."
]
},
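{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare runs more compactly, we can summarize a confusion matrix in the nested-dict form returned by `confusion_matrix`, where (as in the printed table) the outer key is the model's assessment and the inner key is the ground truth. The hypothetical helpers below compute overall accuracy and per-class recall; on the matrix from the evaluation above, accuracy is 31/50 = 0.62, and only 9 of the 22 claims whose ground truth is NEE are assessed as NEE."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helpers; matrix[assessment][groundtruth] -> count, as printed above.\n",
"def accuracy(matrix):\n",
"    total = sum(sum(row.values()) for row in matrix.values())\n",
"    correct = sum(matrix[label][label] for label in matrix)\n",
"    return correct / total\n",
"\n",
"\n",
"def recall(matrix, label):\n",
"    # Fraction of claims with this groundtruth label that were assessed correctly\n",
"    actual = sum(row[label] for row in matrix.values())\n",
"    return matrix[label][label] / actual\n",
"\n",
"\n",
"with_context = {'True': {'True': 16, 'False': 2, 'NEE': 8},\n",
"                'False': {'True': 1, 'False': 6, 'NEE': 5},\n",
"                'NEE': {'True': 1, 'False': 2, 'NEE': 9}}\n",
"print(accuracy(with_context))  # 0.62\n",
"print(round(recall(with_context, 'NEE'), 3))  # 0.409"
]
},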
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering context on relevance\n",
"\n",
"Along with the documents themselves, Chroma returns a distance score for each result; lower distances mean closer matches. We can try thresholding on distance, so that fewer irrelevant documents make it into the context we provide to the model.\n",
"\n",
"If, after filtering on the threshold, no context documents remain, we bypass the model and simply return that there is not enough evidence. "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def filter_query_result(query_result, distance_threshold=0.25):\n",
"    # For each query result, retain only the documents whose distance is at or below the threshold.\n",
"    # Note: this modifies query_result in place, and also returns it.\n",
"    for ids, docs, distances in zip(query_result['ids'], query_result['documents'], query_result['distances']):\n",
"        # Iterate backwards so that popping an element doesn't shift the indices still to be visited\n",
"        for i in range(len(ids)-1, -1, -1):\n",
"            if distances[i] > distance_threshold:\n",
"                ids.pop(i)\n",
"                docs.pop(i)\n",
"                distances.pop(i)\n",
"    return query_result\n"
]
},
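{
"cell_type": "markdown",
"metadata": {},
"source": [
"`filter_query_result` modifies the query result in place and returns the same object, so once it has been applied, `claim_query_result` and `filtered_claim_query_result` both refer to the filtered result. If you want to keep the unfiltered distances around, for example to try several thresholds, a non-mutating variant such as the hypothetical `filtered_copy` below builds new lists instead. It assumes the `ids`/`documents`/`distances` list-of-lists layout returned by the `query` call above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical non-mutating alternative to filter_query_result.\n",
"def filtered_copy(query_result, distance_threshold=0.25):\n",
"    # Build a new result holding only documents at or below the threshold;\n",
"    # the input query_result is left untouched.\n",
"    out = {'ids': [], 'documents': [], 'distances': []}\n",
"    for ids, docs, dists in zip(query_result['ids'], query_result['documents'], query_result['distances']):\n",
"        keep = [i for i, d in enumerate(dists) if d <= distance_threshold]\n",
"        out['ids'].append([ids[i] for i in keep])\n",
"        out['documents'].append([docs[i] for i in keep])\n",
"        out['distances'].append([dists[i] for i in keep])\n",
"    return out\n",
"\n",
"\n",
"raw = {'ids': [['a', 'b'], ['c']], 'documents': [['doc a', 'doc b'], ['doc c']], 'distances': [[0.1, 0.3], [0.5]]}\n",
"print(filtered_copy(raw)['documents'])  # [['doc a'], []]\n",
"print(raw['documents'])  # [['doc a', 'doc b'], ['doc c']] (unchanged)"
]
},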
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"filtered_claim_query_result = filter_query_result(claim_query_result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we assess the claims using this cleaner context. "
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\tGroundtruth\n",
"\tTrue\tFalse\tNEE\n",
"True\t10\t2\t1\t\n",
"False\t0\t2\t1\t\n",
"NEE\t8\t6\t20\t\n"
]
},
{
"data": {
"text/plain": [
"{'True': {'True': 10, 'False': 2, 'NEE': 1},\n",
" 'False': {'True': 0, 'False': 2, 'NEE': 1},\n",
" 'NEE': {'True': 8, 'False': 6, 'NEE': 20}}"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gpt_with_filtered_context_evaluation = assess_claims_with_context(claims, filtered_claim_query_result['documents'])\n",
"confusion_matrix(gpt_with_filtered_context_evaluation, groundtruth)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"The model now assesses far fewer claims as True or False when there is not enough evidence present. However, it is now biased away from certainty: most claims are assessed as having not enough evidence, because for many claims every retrieved document is filtered out by the distance threshold. It's possible to tune the distance threshold to find the optimal operating point, but this can be difficult, and the right value depends on the dataset and the embedding model."
]
},
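{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to make the threshold easier to reason about: assuming the collection uses Chroma's default squared L2 distance and the embeddings have unit length (OpenAI's embeddings are normalized to length 1), the distance `d` and the cosine similarity `s` are related by `d = 2 - 2s`. The distance threshold of 0.25 used above therefore corresponds to a cosine similarity of 0.875. The sketch below shows this conversion, plus a hypothetical `coverage` helper that counts, for several candidate thresholds, how many claims would keep at least one document. It could be applied to the `distances` of an unfiltered query result; the distances in the example are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes squared-L2 distance (Chroma's default) and unit-length embeddings.\n",
"def squared_l2_to_cosine(d):\n",
"    # For unit vectors a, b: ||a - b||^2 = 2 - 2 * cos(a, b), so cos = 1 - d / 2\n",
"    return 1 - d / 2\n",
"\n",
"\n",
"def coverage(distances, thresholds):\n",
"    # For each threshold, count the queries that keep at least one document\n",
"    return {t: sum(1 for dists in distances if any(d <= t for d in dists)) for t in thresholds}\n",
"\n",
"\n",
"print(squared_l2_to_cosine(0.25))  # 0.875\n",
"example_distances = [[0.18, 0.27, 0.31], [0.29, 0.33, 0.35], [0.22, 0.24, 0.40]]  # made-up values\n",
"print(coverage(example_distances, [0.2, 0.25, 0.3]))  # {0.2: 1, 0.25: 2, 0.3: 3}"
]
},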
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hypothetical Document Embeddings: Using hallucinations productively\n",
"\n",
"We want to be able to retrieve relevant documents, without retrieving less relevant ones which might confuse the model. One way to accomplish this is to improve the retrieval query. \n",
"\n",
"Until now, we have queried the dataset using _claims_, which are single-sentence statements, while the corpus contains _abstracts_, each describing a scientific paper. Intuitively, while these might be related, there are significant differences in their structure and meaning. These differences are encoded by the embedding model, and so influence the distances between the query and the most relevant results.\n",
"\n",
"We can overcome this by leveraging LLMs to generate relevant text. While the facts might be hallucinated, the content and structure of the documents the model generates are more similar to the documents in our corpus than the queries are. This could lead to better queries and hence better results.\n",
"\n",
"This approach is called [Hypothetical Document Embeddings (HyDE)](https://arxiv.org/abs/2212.10496), and has been shown to perform well on zero-shot retrieval tasks. It should help us bring more relevant information into the context without polluting it.\n",
"\n",
"TL;DR:\n",
"- Matches against a corpus of abstracts tend to be better when the query is also a whole abstract rather than a single sentence.\n",
"- But claims are usually single sentences.\n",
"- HyDE shows that using an LLM to expand a query into a hallucinated document, and then searching with that document, can work better than searching with the query directly. Here, that means expanding claims into hallucinated abstracts (claims -> abstracts -> results) instead of searching with the claims (claims -> results)."
]
},
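{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the cells below, each claim is expanded into a single hallucinated abstract, and that abstract's text is used as the query. The HyDE paper also describes sampling several hypothetical documents and averaging their embeddings, together with the embedding of the original query, into a single query vector. The sketch below illustrates that averaging step with a hypothetical `hyde_query_vector` function; `generate` and `embed` are placeholders for an LLM call and an embedding function, replaced here by toy stand-ins so the example runs offline. A vector built this way could then be passed to the collection's `query` through its `query_embeddings` argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of HyDE-style query vector averaging; generate and embed are placeholders.\n",
"def mean_vector(vectors):\n",
"    # Element-wise mean of equal-length vectors\n",
"    return [sum(column) / len(vectors) for column in zip(*vectors)]\n",
"\n",
"\n",
"def hyde_query_vector(query, generate, embed, n_docs=4):\n",
"    # Generate n_docs hypothetical documents, embed them along with the query,\n",
"    # and average all of the embeddings into a single query vector.\n",
"    hypothetical_docs = [generate(query) for _ in range(n_docs)]\n",
"    vectors = [embed(doc) for doc in hypothetical_docs] + [embed(query)]\n",
"    return mean_vector(vectors)\n",
"\n",
"\n",
"# Toy stand-ins so the example runs offline (not real models).\n",
"def fake_generate(query):\n",
"    return query + ' Hypothetical abstract text.'\n",
"\n",
"\n",
"def fake_embed(text):\n",
"    # A 2-d 'embedding': text length and number of spaces\n",
"    return [float(len(text)), float(text.count(' '))]\n",
"\n",
"\n",
"print(hyde_query_vector('Claim.', fake_generate, fake_embed, n_docs=2))"
]
},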
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, for each claim we want to assess, we use an in-context example to prompt the model to generate a document similar to those in the corpus."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"def build_hallucination_prompt(claim):\n",
" return [{'role': 'system', 'content': \"\"\"I will ask you to write an abstract for a scientific paper which supports or refutes a given claim. It should be written in scientific language, include a title. Output only one abstract, then stop.\n",
" \n",
" An Example:\n",
"\n",
" Claim:\n",
" A high microerythrocyte count raises vulnerability to severe anemia in homozygous alpha (+)- thalassemia trait subjects.\n",
"\n",
" Abstract:\n",
" BACKGROUND The heritable haemoglobinopathy alpha(+)-thalassaemia is caused by the reduced synthesis of alpha-globin chains that form part of normal adult haemoglobin (Hb). Individuals homozygous for alpha(+)-thalassaemia have microcytosis and an increased erythrocyte count. Alpha(+)-thalassaemia homozygosity confers considerable protection against severe malaria, including severe malarial anaemia (SMA) (Hb concentration < 50 g/l), but does not influence parasite count. We tested the hypothesis that the erythrocyte indices associated with alpha(+)-thalassaemia homozygosity provide a haematological benefit during acute malaria. \n",
" METHODS AND FINDINGS Data from children living on the north coast of Papua New Guinea who had participated in a case-control study of the protection afforded by alpha(+)-thalassaemia against severe malaria were reanalysed to assess the genotype-specific reduction in erythrocyte count and Hb levels associated with acute malarial disease. We observed a reduction in median erythrocyte count of approximately 1.5 x 10(12)/l in all children with acute falciparum malaria relative to values in community children (p < 0.001). We developed a simple mathematical model of the linear relationship between Hb concentration and erythrocyte count. This model predicted that children homozygous for alpha(+)-thalassaemia lose less Hb than children of normal genotype for a reduction in erythrocyte count of >1.1 x 10(12)/l as a result of the reduced mean cell Hb in homozygous alpha(+)-thalassaemia. In addition, children homozygous for alpha(+)-thalassaemia require a 10% greater reduction in erythrocyte count than children of normal genotype (p = 0.02) for Hb concentration to fall to 50 g/l, the cutoff for SMA. We estimated that the haematological profile in children homozygous for alpha(+)-thalassaemia reduces the risk of SMA during acute malaria compared to children of normal genotype (relative risk 0.52; 95% confidence interval [CI] 0.24-1.12, p = 0.09). \n",
" CONCLUSIONS The increased erythrocyte count and microcytosis in children homozygous for alpha(+)-thalassaemia may contribute substantially to their protection against SMA. A lower concentration of Hb per erythrocyte and a larger population of erythrocytes may be a biologically advantageous strategy against the significant reduction in erythrocyte count that occurs during acute infection with the malaria parasite Plasmodium falciparum. This haematological profile may reduce the risk of anaemia by other Plasmodium species, as well as other causes of anaemia. Other host polymorphisms that induce an increased erythrocyte count and microcytosis may confer a similar advantage.\n",
"\n",
" End of example. \n",
" \n",
" \"\"\"}, {'role': 'user', 'content': f\"\"\"\n",
" Perform the task for the following claim.\n",
"\n",
" Claim:\n",
" {claim}\n",
"\n",
" Abstract:\n",
" \"\"\"}]\n",
"\n",
"\n",
"def hallucinate_evidence(claims):\n",
"    responses = []\n",
"    # Query the OpenAI API once per claim\n",
"    for claim in claims:\n",
"        response = openai.ChatCompletion.create(\n",
"            model='gpt-3.5-turbo',\n",
"            messages=build_hallucination_prompt(claim),\n",
"        )\n",
"        responses.append(response.choices[0].message.content)\n",
"    return responses"
]
},
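{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `hallucinate_evidence` makes one API call per claim, one after another, it is slow. The calls are independent, so they could instead be issued concurrently. The sketch below uses a thread pool from Python's standard library; `hallucinate_concurrently` is a hypothetical helper, `generate` stands in for the per-claim API call made inside the loop of `hallucinate_evidence`, and `max_workers=8` is an arbitrary default you would tune to stay within your API rate limits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from concurrent.futures import ThreadPoolExecutor\n",
"\n",
"\n",
"def hallucinate_concurrently(claims, generate, max_workers=8):\n",
"    # executor.map preserves input order, so responses[i] corresponds to claims[i].\n",
"    with ThreadPoolExecutor(max_workers=max_workers) as executor:\n",
"        return list(executor.map(generate, claims))\n",
"\n",
"\n",
"# Usage with a stand-in generator; in the notebook, generate could wrap the\n",
"# openai.ChatCompletion.create call made for a single claim.\n",
"responses = hallucinate_concurrently(['claim one', 'claim two'], lambda c: c.upper())\n",
"print(responses)  # ['CLAIM ONE', 'CLAIM TWO']"
]
},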
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We hallucinate a document for each claim.\n",
"\n",
"*NB: This can take a while: about 30 minutes for 100 claims.* To get results more quickly, reduce the number of claims to assess."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"hallucinated_evidence = hallucinate_evidence(claims)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the hallucinated documents as queries into the corpus, and filter the results using the same distance threshold. "
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"hallucinated_query_result = scifact_corpus_collection.query(query_texts=hallucinated_evidence, include=['documents', 'distances'], n_results=3)\n",
"filtered_hallucinated_query_result = filter_query_result(hallucinated_query_result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then ask the model to assess the claims, using the new context. "
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\tGroundtruth\n",
"\tTrue\tFalse\tNEE\n",
"True\t15\t2\t5\t\n",
"False\t1\t5\t4\t\n",
"NEE\t2\t3\t13\t\n"
]
},
{
"data": {
"text/plain": [
"{'True': {'True': 15, 'False': 2, 'NEE': 5},\n",
" 'False': {'True': 1, 'False': 5, 'NEE': 4},\n",
" 'NEE': {'True': 2, 'False': 3, 'NEE': 13}}"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gpt_with_hallucinated_context_evaluation = assess_claims_with_context(claims, filtered_hallucinated_query_result['documents'])\n",
"confusion_matrix(gpt_with_hallucinated_context_evaluation, groundtruth)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"Combining HyDE with a simple distance threshold gives a more balanced result than either previous approach. The model is no longer biased toward assessing claims as True, nor toward assessing them as having not enough evidence. It also correctly identifies claims without enough evidence more often than with the unfiltered context (13 vs. 9)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion\n",
"\n",
"Equipping LLMs with context drawn from a corpus of documents is a powerful technique for bringing the general reasoning and natural language interactions of LLMs to your own data. However, naive query and retrieval may not produce the best possible results! Ultimately, understanding your data will help you get the most out of the retrieval-based question-answering approach.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"vscode": {
"interpreter": {
"hash": "fd16a328ca3d68029457069b79cb0b38eb39a0f5ccc4fe4473d3047707df8207"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}