fix: cleaning embed and object nodes

Adam Pash 2016-09-09 11:58:22 -04:00
parent edfb54c532
commit 52e89a0229
2 changed files with 4 additions and 1 deletion

@@ -1,5 +1,4 @@
 TODO:
-- run makeLinksAbsolute on extracted content before returning
 - remove logic for fetching meta attrs with custom props
 - Resource (fetches page, validates it, cleans it, normalizes meta tags (!), converts lazy-loaded images, makes links absolute, etc)
 - extractNextPageUrl
@@ -12,6 +11,8 @@ TODO:
 - Separate constants into activity-specific folders (dom, scoring)
 DONE:
+x cleaning embed and object nodes
+x run makeLinksAbsolute on extracted content before returning
 x add option to fetch attrs in RootExtractor's select method
 x get custom datePublished selector to convert to date object (prob through cleaner)
 x extract and generalize cleaners

@@ -9,6 +9,8 @@ export const STRIP_OUTPUT_TAGS = [
   'link',
   'style',
   'hr',
+  'embed',
+  'object',
 ]
 // cleanAttributes