[blog] add a post about xpath() TVF

pull/884/head
Timothy Stack 3 years ago
parent 04c4f8a779
commit c13e78358e

@ -0,0 +1,39 @@
---
layout: post
title: Drilling down into XML snippets
excerpt: The new "xpath()" table-valued SQL function.
---
*(This change is in v0.10.0+)*

XML snippets in log messages can now be queried using the
[`xpath()`](https://docs.lnav.org/en/latest/sqlext.html#xpath-xpath-xmldoc)
table-valued SQL function. The function takes an
[XPath](https://developer.mozilla.org/en-US/docs/Web/XPath) expression and the
XML snippet to be queried, and returns a table with the results of the query.
For example, given the following XML document:
```xml
<msg>Hello, World!</msg>
```
Extracting the text value from the `msg` node can be done using the following
query:
```sql
SELECT result FROM xpath('/msg/text()', '<msg>Hello, World!</msg>')
```
Of course, you won't typically be passing XML values as string literals; you
will be extracting them from log messages. Assuming your log format already
extracts the XML data, you can `SELECT` from the log format's table and join
the result with an `xpath()` call. Since it can be challenging to construct a
correct `xpath()` call, lnav will suggest calls for the nodes it finds in any
XML log message fields. The following asciicast demonstrates this flow:
<script id="asciicast-x89mrk8JPHBmB4pTbaZvTt8Do"
src="https://asciinema.org/a/x89mrk8JPHBmB4pTbaZvTt8Do.js"
async>
</script>
The implementation uses the [pugixml](https://pugixml.org) library.

@ -147,8 +147,6 @@ dist_noinst_DATA = \
ansi-palette.json \
$(BUILTIN_LNAVSCRIPTS) \
$(BUILTIN_SHSCRIPTS) \
internals/config-v1.schema.json \
internals/format-v1.schema.json \
$(CONFIG_FILES) \
$(FORMAT_FILES) \
xterm-palette.json
@ -430,6 +428,7 @@ uncrusty:
if !DISABLE_DOCUMENTATION
all-local: lnav
env DUMP_INTERNALS_DIR=$(srcdir)/internals DUMP_CRASH=1 ./lnav Makefile
mv $(srcdir)/internals/*.schema.json $(top_srcdir)/docs/schemas
endif
install-exec-hook:

@ -29,6 +29,8 @@
#include "config.h"
#include <regex>
#include "base/date_time_scanner.hh"
#include "base/time_util.hh"
@ -43,6 +45,13 @@ void db_label_source::text_value_for_line(textview_curses &tc, int row,
std::string &label_out,
text_sub_source::line_flags_t flags)
{
static const std::regex RE_TAB("\t");
static const std::string TAB_SYMBOL = "\u21e5";
static const std::regex RE_LF("\n");
static const std::string LF_SYMBOL = "\u240a";
static const std::regex RE_CR("\r");
static const std::string CR_SYMBOL = "\u240d";
/*
* start_value is the result rowid, each bucket type is a column value
* label_out should be the raw text output.
@ -57,6 +66,9 @@ void db_label_source::text_value_for_line(textview_curses &tc, int row,
this->dls_headers[lpc].hm_column_size);
auto cell_str = std::string(this->dls_rows[row][lpc]);
cell_str = std::regex_replace(cell_str, RE_TAB, TAB_SYMBOL);
cell_str = std::regex_replace(cell_str, RE_LF, LF_SYMBOL);
cell_str = std::regex_replace(cell_str, RE_CR, CR_SYMBOL);
truncate_to(cell_str, MAX_COLUMN_WIDTH);
auto cell_length = utf8_string_length(cell_str)
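The hunk above is intended to make tabs, line feeds, and carriage returns in a table cell visible by substituting Unicode symbols (`TAB_SYMBOL`, `LF_SYMBOL`, `CR_SYMBOL`) before the cell is truncated and measured. As a rough illustration only, the same substitution can be done without `std::regex` using plain `std::string::find`/`replace`. The names `replace_all()` and `make_control_chars_visible()` are hypothetical and not lnav functions; the code points are the ones used in the diff.

```cpp
#include <cassert>
#include <string>

// Replace every non-overlapping occurrence of `from` in `str` with `to`.
void replace_all(std::string& str, const std::string& from, const std::string& to)
{
    assert(!from.empty());  // an empty pattern would loop forever
    std::string::size_type pos = 0;
    while ((pos = str.find(from, pos)) != std::string::npos) {
        str.replace(pos, from.size(), to);
        pos += to.size();  // resume searching after the inserted text
    }
}

// Hypothetical helper: substitute visible symbols for control characters.
std::string make_control_chars_visible(std::string cell)
{
    replace_all(cell, "\t", "\u21e5");  // U+21E5 RIGHTWARDS ARROW TO BAR
    replace_all(cell, "\n", "\u240a");  // U+240A SYMBOL FOR LINE FEED
    replace_all(cell, "\r", "\u240d");  // U+240D SYMBOL FOR CARRIAGE RETURN
    return cell;
}
```

Because none of the replacement symbols contain a tab, LF, or CR byte, the three passes cannot interfere with each other, so their order does not matter.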

@ -3499,8 +3499,8 @@ xpath(*xpath*, *xmldoc*)
;SELECT * FROM xpath('/abc/def', '<abc><def a="b">Hello</def><def>Bye</def></abc>')
result node_path node_attr node_text
<def a="b">Hello</def> /abc/def[1] {"a":"b"} Hello
<def>Bye</def> /abc/def[2] {} Bye
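In the example output, `node_path` identifies each match with an XPath positional predicate: `def[1]` and `def[2]` are the first and second `def` children of `abc`, counted from 1 among siblings that share the same name. A minimal sketch of how such a path segment could be computed, assuming a hypothetical in-memory `element` tree rather than lnav's actual pugixml-based code (this sketch always appends the index, even for an only child):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal element tree, used only for illustration.
struct element {
    std::string name;
    std::vector<element> children;
};

// Build an XPath-style path such as "/abc/def[2]" for the child at `index`
// of `parent`, numbering siblings with the same name starting from 1.
std::string child_path(const std::string& parent_path,
                       const element& parent,
                       std::size_t index)
{
    assert(index < parent.children.size());
    const std::string& name = parent.children[index].name;
    std::size_t position = 0;
    for (std::size_t i = 0; i <= index; i++) {
        if (parent.children[i].name == name) {
            position++;  // count same-named siblings up to and including index
        }
    }
    return parent_path + "/" + name + "[" + std::to_string(position) + "]";
}
```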
To select all 'a' attributes on the path '/abc/def':

@ -308,6 +308,12 @@ static int rcFilter(sqlite3_vtab_cursor *pVtabCursor,
pCur->c_value_as_blob = (sqlite3_value_type(argv[1]) == SQLITE_BLOB);
auto byte_count = sqlite3_value_bytes(argv[1]);
if (byte_count == 0) {
pCur->c_rowid = 0;
return SQLITE_OK;
}
auto blob = (const char *) sqlite3_value_blob(argv[1]);
pCur->c_value.assign(blob, byte_count);
auto parse_res = pCur->c_doc.load_string(pCur->c_value.c_str());
