mercury-parser/TODO.md

TODO:
- Test if .is method is faster than regex methods

DONE:
x Separate constants into activity-specific folders (dom, scoring)
x extractNextPageUrl
x Make sure weightNodes flag is being passed properly
x Rename all cleaners from cleanThing to clean
x Remove $ from function calls to getScore
x Remove all attributes except those on the whitelist. Research which attributes matter beyond src and href
x remove logic for fetching meta attrs with custom props
x cleaning embed and object nodes
x run makeLinksAbsolute on extracted content before returning
x add option to fetch attrs in RootExtractor's select method
x get custom datePublished selector to convert to date object (prob through cleaner)
x extract and generalize cleaners
  x move arguments to cleaners to object
x Check that lead-image-url extractor isn't looking for end-of-string file extension matches (i.e., it could be ...foo.jpg?otherstuff)
x extractLeadImageUrl
x Resource (fetches page, validates it, cleans it, normalizes meta tags (!), converts lazy-loaded images, makes links absolute, etc)
x extractDek
x extractDatePublished
x Title metadata
x Test re-initializing $ if/when it needs to loop again
x `cleanHeaders` Remove any headers that are before any p tags, matching title, etc
x `extract` (this kicks it all off)
x `node_is_sufficient`
x `_extract_best_node`
x `get_weight`
x `_strip_unlikely_candidates`
x `_convert_to_paragraphs`
x `_brs_to_paragraphs`
x `_paragraphize`
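
The lead-image-url item above (file extensions that are not at the very end of the URL) can be illustrated with a minimal sketch. The helper name `hasImageExtension`, the constant `IMAGE_EXT_RE`, and the list of extensions are hypothetical, chosen only for illustration; this is not the parser's actual implementation.

```javascript
// Hypothetical helper: name, regex, and extension list are illustrative
// assumptions, not taken from the parser's source.
const IMAGE_EXT_RE = /\.(jpe?g|png|gif)(?=$|[?#])/i;

function hasImageExtension(url) {
  // Accept the extension when it is followed by end-of-string,
  // a query string (?), or a fragment (#), instead of requiring
  // it to be the last characters of the URL.
  return IMAGE_EXT_RE.test(url);
}
```

With this pattern a URL such as `http://example.com/foo.jpg?otherstuff` still counts as an image URL, while `http://example.com/foo.jpgx` does not.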

## Scoring

x `_get_score`
x `_set_score`
x `_add_score`
x `_score_content`
x `_score_node`
x `_score_paragraph`

## Top Candidate

x `_find_top_candidate`
x `extract_clean_node`
x `_clean_conditionally`