Importing XML files into PostgreSQL

I have a lot of XML files that I would like to import into the table xml_data:


create table xml_data(result xml);


For this I have a simple bash script with a loop:


#!/bin/sh
FILES=/folder/with/xml/files/*.xml
for f in $FILES
do
  psql -d mydb -h myhost -U usr -c "\copy xml_data from '$f'"
done


However, this tries to import each line of each file as a separate row, which leads to an error:


ERROR: invalid XML content
CONTEXT: COPY address_results, line 1, column result: "<?xml version="1.0" encoding="UTF-8"?>"


I understand why it fails, but I can't figure out how to make \copy import the whole file at once into a single row.
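The failure can be reproduced without a database. COPY's text format ends a row at every newline, so a pretty-printed document arrives as several rows, and the first of them is just the XML declaration, which matches the "line 1" reported in the error. A minimal sketch, assuming a throwaway sample file (the file name and content are illustrative, not taken from the original data):

```shell
#!/bin/sh
# Illustrative sample only: a small pretty-printed XML document.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.xml"
printf '%s\n' \
    '<?xml version="1.0" encoding="UTF-8"?>' \
    '<data-set>' \
    '  <record><ID>1</ID></record>' \
    '</data-set>' > "$sample"

# Each newline-terminated line is what COPY (text format) treats as one row.
rows=$(wc -l < "$sample" | tr -d ' ')
first_row=$(head -n 1 "$sample")

echo "rows seen by COPY: $rows"
echo "first row: $first_row"
```

Any fix therefore has to deliver the whole document as one value, either by reading the file on the server side or by removing the newlines first.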

快网


Necromancing:
For those who need a working example:


DO $$
DECLARE myxml xml;
BEGIN

myxml := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('MyData.xml'), 'UTF8'));

DROP TABLE IF EXISTS mytable;
CREATE TEMP TABLE mytable AS

SELECT
     (xpath('//ID/text()', x))[1]::text AS id
    ,(xpath('//Name/text()', x))[1]::text AS Name
    ,(xpath('//RFC/text()', x))[1]::text AS RFC
    ,(xpath('//Text/text()', x))[1]::text AS Text
    ,(xpath('//Desc/text()', x))[1]::text AS Desc
FROM unnest(xpath('//record', myxml)) x
;

END$$;


SELECT * FROM mytable;


Or, with less noise:


SELECT 
     (xpath('//ID/text()', myTempTable.myXmlColumn))[1]::text AS id
    ,(xpath('//Name/text()', myTempTable.myXmlColumn))[1]::text AS Name
    ,(xpath('//RFC/text()', myTempTable.myXmlColumn))[1]::text AS RFC
    ,(xpath('//Text/text()', myTempTable.myXmlColumn))[1]::text AS Text
    ,(xpath('//Desc/text()', myTempTable.myXmlColumn))[1]::text AS Desc
    ,myTempTable.myXmlColumn as myXmlElement
FROM unnest(
    xpath
    (    '//record'
        ,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('MyData.xml'), 'UTF8'))
    )
) AS myTempTable(myXmlColumn)
;


The XML file used in this example (MyData.xml):


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data-set>
    <record>
        <ID>1</ID>
        <Name>A</Name>
        <RFC>RFC 1035[1]</RFC>
        <Text>Address record</Text>
        <Desc>Returns a 32-bit IPv4 address, most commonly used to map hostnames to an IP address of the host, but it is also used for DNSBLs, storing subnet masks in RFC 1101, etc.</Desc>
    </record>
    <record>
        <ID>2</ID>
        <Name>NS</Name>
        <RFC>RFC 1035[1]</RFC>
        <Text>Name server record</Text>
        <Desc>Delegates a DNS zone to use the given authoritative name servers</Desc>
    </record>
</data-set>


Notes:

MyData.xml must be in the PG_Data directory (the parent directory of the pg_stat directory).

For example:
/var/lib/postgresql/9.3/main/MyData.xml


This requires PostgreSQL 9.1+.




Overall, you can achieve the same thing without a file, like this:


SELECT 
     (xpath('//ID/text()', myTempTable.myXmlColumn))[1]::text AS id
    ,(xpath('//Name/text()', myTempTable.myXmlColumn))[1]::text AS Name
    ,(xpath('//RFC/text()', myTempTable.myXmlColumn))[1]::text AS RFC
    ,(xpath('//Text/text()', myTempTable.myXmlColumn))[1]::text AS Text
    ,(xpath('//Desc/text()', myTempTable.myXmlColumn))[1]::text AS Desc
    ,myTempTable.myXmlColumn as myXmlElement
    -- Source: https://en.wikipedia.org/wiki/List_of_DNS_record_types
FROM unnest(xpath('//record',
 CAST('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data-set>
    <record>
        <ID>1</ID>
        <Name>A</Name>
        <RFC>RFC 1035[1]</RFC>
        <Text>Address record</Text>
        <Desc>Returns a 32-bit IPv4 address, most commonly used to map hostnames to an IP address of the host, but it is also used for DNSBLs, storing subnet masks in RFC 1101, etc.</Desc>
    </record>
    <record>
        <ID>2</ID>
        <Name>NS</Name>
        <RFC>RFC 1035[1]</RFC>
        <Text>Name server record</Text>
        <Desc>Delegates a DNS zone to use the given authoritative name servers</Desc>
    </record>
</data-set>
' AS xml)
)) AS myTempTable(myXmlColumn)
;


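Note that, unlike in MS-SQL, xpath text() returns NULL for a NULL value, not an empty string.

If for whatever reason you need to check explicitly for NULL, you can use
[not(@xsi:nil="true")]

, for which you need to pass an array of namespaces, because otherwise you get an error (you can, however, omit all namespaces except xsi).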



SELECT 
     (xpath('//xmlEncodeTest[1]/text()', myTempTable.myXmlColumn))[1]::text AS c1

    ,(
    xpath('//xmlEncodeTest[1][not(@xsi:nil="true")]/text()', myTempTable.myXmlColumn
    ,
    ARRAY[
        -- ARRAY['xmlns','http://www.w3.org/1999/xhtml'], -- defaultns
        ARRAY['xsi','http://www.w3.org/2001/XMLSchema-instance'],
        ARRAY['xsd','http://www.w3.org/2001/XMLSchema'],
        ARRAY['svg','http://www.w3.org/2000/svg'],
        ARRAY['xsl','http://www.w3.org/1999/XSL/Transform']
    ]
    )
    )[1]::text AS c22


    ,(xpath('//nixda[1]/text()', myTempTable.myXmlColumn))[1]::text AS c2
    --,myTempTable.myXmlColumn as myXmlElement
    ,xmlexists('//xmlEncodeTest[1]' PASSING BY REF myTempTable.myXmlColumn) AS c1e
    ,xmlexists('//nixda[1]' PASSING BY REF myTempTable.myXmlColumn) AS c2e
    ,xmlexists('//xmlEncodeTestAbc[1]' PASSING BY REF myTempTable.myXmlColumn) AS c1ea
FROM unnest(xpath('//row',
     CAST('<?xml version="1.0" encoding="utf-8"?>
    <table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <row>
        <xmlEncodeTest xsi:nil="true"></xmlEncodeTest>
        <nixda>noob</nixda>
      </row>
    </table>
    ' AS xml)
    )
) AS myTempTable(myXmlColumn)
;


You can also check whether a field is present in the XML text, by using


    ,xmlexists('//xmlEncodeTest[1]' PASSING BY REF myTempTable.myXmlColumn) AS c1e


for example when you pass an XML value to a stored-procedure/function for CRUD (see above).

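Also, note that the correct way to pass a null value in XML is
<elementname xsi:nil="true"></elementname>

and not
<elementname></elementname>

or nothing at all. There is no correct way to pass NULL in attributes (you can only omit the attribute, but then it becomes difficult/slow to infer the number of columns and their names in a large dataset).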
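When such XML is generated by a script, the difference between NULL and an empty string has to be made explicit on the producing side. A rough sketch of a helper that does this: xml_field is a hypothetical name, not part of any tool, and the sketch assumes the xsi prefix is declared on an enclosing element. It emits xsi:nil="true" when no value argument is given and an escaped element otherwise:

```shell
#!/bin/sh
# xml_field NAME [VALUE]   (hypothetical helper, for illustration only)
# No VALUE argument means NULL; an empty VALUE means an empty string.
xml_field() {
    name=$1
    if [ "$#" -lt 2 ]; then
        # NULL: mark the element as nil instead of leaving it empty.
        printf '<%s xsi:nil="true"/>\n' "$name"
    else
        # Escape the characters that are special in XML element content.
        escaped=$(printf '%s' "$2" |
            sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
        printf '<%s>%s</%s>\n' "$name" "$escaped" "$name"
    fi
}

xml_field nixda noob       # element with a value
xml_field xmlEncodeTest    # NULL
xml_field comment ''       # empty string, not NULL
```

Keeping the two cases apart at generation time is what lets the xsi:nil check on the query side tell them apart later.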

For example:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<table>
<row column1="a" column2="3"></row>
<row column1="b" column2="4" column3="true"></row>
</table>


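(This is more compact, but problematic if you need to import it, especially from XML files with several GB of data; see a wonderful example of that in the stackoverflow data dump.)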


SELECT 
myTempTable.myXmlColumn
    ,(xpath('//@column1', myTempTable.myXmlColumn))[1]::text AS c1
    ,(xpath('//@column2', myTempTable.myXmlColumn))[1]::text AS c2
    ,(xpath('//@column3', myTempTable.myXmlColumn))[1]::text AS c3
    ,xmlexists('//@column3' PASSING BY REF myTempTable.myXmlColumn) AS c3e
    ,case when (xpath('//@column3', myTempTable.myXmlColumn))[1]::text is null then 1 else 0 end AS is_null
FROM unnest(xpath('//row', '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<table>
    <row column1="a" column2="3"></row>
    <row column1="b" column2="4" column3="true"></row>
</table>'
)) AS myTempTable(myXmlColumn)

董宝中


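I would try a different approach: read the XML file directly into a variable inside a plpgsql function and proceed from there. That should be a lot faster and a lot more robust.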


CREATE OR REPLACE FUNCTION f_sync_from_xml()
  RETURNS boolean AS
$BODY$
DECLARE
    myxml    xml;
    datafile text := 'path/to/my_file.xml';
BEGIN
   myxml := pg_read_file(datafile, 0, 100000000);  -- arbitrary 100 MB max.

   CREATE TEMP TABLE tmp AS
   SELECT (xpath('//some_id/text()', x))[1]::text AS id
   FROM   unnest(xpath('/xml/path/to/datum', myxml)) x;
...


You need superuser privileges, and the file must be local to the DB server, in an accessible directory.

Complete code example with more explanation and links:

https://coderoad.ru/7491479/

二哥


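Extending @stefan-steiger's excellent answer, here is an example that extracts XML elements from child nodes that contain multiple siblings (e.g., several <synonym> elements for a particular <synonyms> parent node).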

I ran into this issue with my own data and searched quite a while for a solution; his answer was the most helpful to me.

Example data file, hmdb_metabolites_test.xml:


<?xml version="1.0" encoding="UTF-8"?>
<hmdb>
<metabolite>
<accession>HMDB0000001</accession>
<name>1-Methylhistidine</name>
<synonyms>
<synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid</synonym>
<synonym>1-Methylhistidine</synonym>
<synonym>Pi-methylhistidine</synonym>
<synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate</synonym>
</synonyms>
</metabolite>
<metabolite>
<accession>HMDB0000002</accession>
<name>1,3-Diaminopropane</name>
<synonyms>
<synonym>1,3-Propanediamine</synonym>
<synonym>1,3-Propylenediamine</synonym>
<synonym>Propane-1,3-diamine</synonym>
<synonym>1,3-diamino-N-Propane</synonym>
</synonyms>
</metabolite>
<metabolite>
<accession>HMDB0000005</accession>
<name>2-Ketobutyric acid</name>
<synonyms>
<synonym>2-Ketobutanoic acid</synonym>
<synonym>2-Oxobutyric acid</synonym>
<synonym>3-Methyl pyruvic acid</synonym>
<synonym>alpha-Ketobutyrate</synonym>
</synonyms>
</metabolite>
</hmdb>


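Aside:

The original XML file had a URL in the document element:


<hmdb xmlns="http://www.hmdb.ca">


This prevented xpath from parsing the data: the default namespace puts every element into that namespace, so unprefixed names such as //metabolite match nothing. The script runs (without error messages), but the resulting table is empty: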


[hmdb_test]# \i /mnt/Vancouver/Programming/data/hmdb/sql/hmdb_test.sql
DO
accession | name | synonym
-----------+------+---------


Since the source file is 3.4 GB, I decided to edit that line using sed:


sed -i '2s/.*hmdb xmlns.*/<hmdb>/' hmdb_metabolites.xml


[Adding the 2 (which instructs sed to edit only line 2) also, coincidentally, doubled the speed of the sed command in this instance.]
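The effect of the line address can be checked on a small stand-in file before touching the multi-gigabyte original. A minimal sketch, assuming an illustrative four-line sample (the file names and content are made up for the demonstration) and writing to a new file rather than editing in place:

```shell
#!/bin/sh
# Illustrative stand-in for the real data file.
tmpdir=$(mktemp -d)
src="$tmpdir/sample.xml"
out="$tmpdir/sample_nons.xml"
printf '%s\n' \
    '<?xml version="1.0" encoding="UTF-8"?>' \
    '<hmdb xmlns="http://www.hmdb.ca">' \
    '<metabolite><accession>HMDB0000001</accession></metabolite>' \
    '</hmdb>' > "$src"

# The leading 2 restricts the substitution to line 2; every other line
# is copied through unchanged.
sed '2s/.*hmdb xmlns.*/<hmdb>/' "$src" > "$out"

line2=$(sed -n '2p' "$out")
total=$(wc -l < "$out" | tr -d ' ')
echo "line 2 is now: $line2"
echo "total lines: $total"
```

Writing to a separate file keeps the original intact for comparison, at the cost of the extra disk space, which matters for a file of this size.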

My postgres data folder (in PSQL: SHOW data_directory;) is:


/mnt/Vancouver/Programming/RDB/postgres/postgres/data


so, using sudo, I needed to copy my XML data file there and chown it for use in PostgreSQL:


sudo chown postgres:postgres /mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb_metabolites_test.xml


Script (hmdb_test.sql):


DO $$DECLARE myxml xml;

BEGIN

myxml := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb_metabolites_test.xml'), 'UTF8'));

DROP TABLE IF EXISTS mytable;

-- CREATE TEMP TABLE mytable AS
CREATE TABLE mytable AS
SELECT
    (xpath('//accession/text()', x))[1]::text AS accession
    ,(xpath('//name/text()', x))[1]::text AS name
    -- The "synonym" child/subnode has many sibling elements, so we need to
    -- "unnest" them, otherwise we only retrieve the first synonym per record:
    ,unnest(xpath('//synonym/text()', x))::text AS synonym
FROM unnest(xpath('//metabolite', myxml)) x
;

END$$;

-- select * from mytable limit 5;
SELECT * FROM mytable;


Execution and output (in PSQL):


[hmdb_test]# \i /mnt/Vancouver/Programming/data/hmdb/hmdb_test.sql

accession | name | synonym
-------------+--------------------+----------------------------------------------------------
HMDB0000001 | 1-Methylhistidine | (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid
HMDB0000001 | 1-Methylhistidine | 1-Methylhistidine
HMDB0000001 | 1-Methylhistidine | Pi-methylhistidine
HMDB0000001 | 1-Methylhistidine | (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate
HMDB0000002 | 1,3-Diaminopropane | 1,3-Propanediamine
HMDB0000002 | 1,3-Diaminopropane | 1,3-Propylenediamine
HMDB0000002 | 1,3-Diaminopropane | Propane-1,3-diamine
HMDB0000002 | 1,3-Diaminopropane | 1,3-diamino-N-Propane
HMDB0000005 | 2-Ketobutyric acid | 2-Ketobutanoic acid
HMDB0000005 | 2-Ketobutyric acid | 2-Oxobutyric acid
HMDB0000005 | 2-Ketobutyric acid | 3-Methyl pyruvic acid
HMDB0000005 | 2-Ketobutyric acid | alpha-Ketobutyrate

[hmdb_test]#



董宝中


I used tr to replace all newlines with spaces. This creates an XML file with only a single line, which I can easily import into one row using \copy.

Obviously, this is not a good idea if your XML contains multi-line values. Luckily, that is not my case.
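Multi-line values are not the only thing to watch for. In COPY's default text format a tab is the column delimiter and a backslash is the escape character, so a flattened file that still contains either can fail to load or load altered. A small sketch of the flattening step plus a pre-check: copy_text_safe is a hypothetical helper name and the sample file is illustrative.

```shell
#!/bin/sh
# Returns 0 if the file contains no tab and no backslash, the two characters
# that COPY's text format interprets specially (hypothetical helper).
copy_text_safe() {
    tab=$(printf '\t')
    if grep -qF "$tab" "$1" || grep -qF '\' "$1"; then
        return 1
    fi
    return 0
}

tmpdir=$(mktemp -d)
printf '%s\n' \
    '<data-set>' \
    '  <record><ID>1</ID></record>' \
    '</data-set>' > "$tmpdir/in.xml"

# Flatten to a single line; tr also turns the final newline into a space,
# so one newline is appended again to let wc -l count the line.
tr '\n' ' ' < "$tmpdir/in.xml" > "$tmpdir/flat.xml"
printf '\n' >> "$tmpdir/flat.xml"

flat_lines=$(wc -l < "$tmpdir/flat.xml" | tr -d ' ')
echo "lines after flattening: $flat_lines"

if copy_text_safe "$tmpdir/flat.xml"; then
    echo "no tabs or backslashes found"
else
    echo "file contains tabs or backslashes; \\copy may misread it"
fi
```

XML indented with tabs is common, so running such a check on each file before the import loop avoids surprises halfway through a folder.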

To import all XML files in a folder, you can use this bash script:


#!/bin/sh
FILES=/folder/with/xml/files/*.xml
for f in $FILES
do
  tr '\n' ' ' < "$f" > temp.xml
psql -d database -h localhost -U usr -c '\copy xml_data from temp.xml'
done
