Adam Pash
746d07d4a2
feat: title extraction and scaffolding for more
Squashed commit of the following:
commit 31d8b63dcb3ec9bbd6c8e7a10852fbd060e91103
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 15:52:27 2016 -0400
feat: title extraction
commit 7002c552a9f5bb54630455d983b699c041c629fc
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 14:21:29 2016 -0400
feat: withinComment checks if a node is inside a comment
commit 57f06ef5b499c2f747edee0c9eb276e38984de9a
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 13:40:36 2016 -0400
feat: extractFromMeta function
commit 0947f21aae94fa5ce462246ed5cb53144d563931
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 13:32:30 2016 -0400
fix: returning original string if no tags in string
commit dd6b032e5f9877395b9600480dd96c6fdf60cecd
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 12:03:58 2016 -0400
feat: clean title function removes junk from titles
commit f33b3eef29ad7692441bd0e5aa26b11dd4411dde
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 12:03:35 2016 -0400
chore: renamed function to correct name
commit 076a986b12df68a939a8efa773e01d08780d79aa
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 12:02:18 2016 -0400
feat: utility method to strip tags from text
commit f3e98cdf0a0d7601fab9e8824c0cde73ded51651
Author: Adam Pash <adam.pash@gmail.com>
Date: Wed Aug 31 11:31:33 2016 -0400
feat: resolveSplitTitle cleans raw title text
2016-08-31 15:52:48 -04:00
Adam Pash
07834c0e15
refactor: restructuring for metadata extraction
2016-08-31 09:42:04 -04:00
Adam Pash
ebea6254b5
ignore npm-debug.log
2016-08-31 09:30:43 -04:00
Adam Pash
95085d1a11
chore: cleanup
2016-08-30 17:08:55 -04:00
Adam Pash
e1ef25aab1
fix: added babel-polyfill for bug in Reflect
2016-08-30 16:07:09 -04:00
Adam Pash
93e844cdfe
feat: implemented extractBestNode functionality
Squashed commit of the following:
commit 9af554dd975ff1778ed70c71fa9bde667fc5f880
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Aug 30 15:19:32 2016 -0400
feat: add cleanHeaders
commit 0dfea98eedc4f97fcbd78866322595c705e20521
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Aug 30 14:30:49 2016 -0400
fix: scoring parent nodes recursively
commit b6e5897a694adeb81e25a905aba72c0f45a8cc94
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Aug 30 12:47:24 2016 -0400
feat: extract clean node up and running
commit fb652c5db13db6bce7271efd68ba4b20515e9549
Author: Adam Pash <adam.pash@gmail.com>
Date: Tue Aug 30 09:57:21 2016 -0400
chore: added test for p tags with nested tags (e.g., img, iframe)
commit 731d0a2e4d89121dfafad195e9d0911805c4f8e4
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 17:50:33 2016 -0400
feat: extract clean node integrates most functions
commit 322bc6534d30feb7c1c08d3813132badc6286b40
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 16:46:04 2016 -0400
feat: removing empty nodes as defined in constants
commit f1d38932ea12a865814d2326970031fcb8515baa
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 16:33:31 2016 -0400
feat: cleaning attributes from nodes
commit 0aa73ada6854af0ecd504bfe3d926a9524787ab5
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 16:09:56 2016 -0400
feat: cleaning h1s from text
commit 12d4a309246285c278ce7765e4fbaa8271bb5889
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 15:52:03 2016 -0400
feat: removing spacer images
commit 4e74ff830cc67586560f6fc72e2cfa432a3a2647
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 15:38:49 2016 -0400
feat: stripping unwanted html from doc
commit c774166e90169fd0c1aa89898d3f7a975e82bf0a
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 15:17:32 2016 -0400
feat: removing small images, height attribute from images
commit 3a8642f42cda451669c832482c5e1611b1ff2ea9
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Aug 29 12:57:45 2016 -0400
feat: rewrite top level
commit a1c03e779234b0aea02206d92ec3dcc15758507e
Author: Adam Pash <adam.pash@gmail.com>
Date: Fri Aug 26 17:34:36 2016 -0400
in a weird place rn
2016-08-30 15:25:25 -04:00
Adam Pash
9da7a6f2a9
feat: find top candidate function
2016-08-26 09:21:47 -04:00
Adam Pash
e2600231ac
feat: added linkDensity function
2016-08-25 17:43:29 -04:00
Adam Pash
c470261d41
fix: changed parseInt to parseFloat
2016-08-25 15:50:59 -04:00
Adam Pash
44eae5e931
feat: added scoreContent function
2016-08-25 15:31:09 -04:00
Adam Pash
bd7ed77f23
Lots of progress on score-content
2016-08-24 18:23:51 -04:00
Adam Pash
cc734c7e7d
chore: cleaned up repetitive testing for dom
2016-08-24 15:50:51 -04:00
Adam Pash
f3b1fefba6
chore: refactored tests
2016-08-24 15:35:27 -04:00
Adam Pash
d4a19e6a27
feat: ported scoring methods with unit tests
2016-08-24 15:30:16 -04:00
Adam Pash
97087bd626
chore: refactored to slightly cleaner file structure (more to do here)
2016-08-24 11:20:13 -04:00
Adam Pash
67e212ffac
feat: convertToParagraphs function working
2016-08-24 10:52:29 -04:00
Adam Pash
c237245e89
Converting multiple line breaks to p
2016-08-24 10:02:46 -04:00
Adam Pash
95d02dadd1
simple logic in place for brsToPs
2016-08-23 16:04:00 -04:00
Adam Pash
d70b9f6709
updated todo
2016-08-23 15:15:12 -04:00
Adam Pash
777e11c25c
Stripping unlikely candidates from DOM
2016-08-23 15:03:03 -04:00
Adam Pash
89a2cfbb82
getWeight with tests
2016-08-23 13:06:43 -04:00
Adam Pash
db3b1ec271
Functions in need of porting
2016-08-23 13:06:29 -04:00
Adam Pash
f3aebb2a16
Basic testing in place
2016-08-23 11:03:31 -04:00
Adam Pash
8efcc70eef
bringing in cheerio
2016-08-23 10:30:40 -04:00
Adam Pash
7f95b9f44f
basic structure
2016-08-22 14:54:51 -04:00
Adam Pash
155efb3833
add gitignore
2016-08-22 14:54:34 -04:00
Adam Pash
b349a1eac5
using rollup
2016-08-22 14:53:42 -04:00
Adam Pash
c2a8edee97
Quick port of constants file
2016-08-22 14:52:01 -04:00