Import XML file mapping the tag

Louis C. shared this question 5 months ago
Answered

Hi all,

I am trying to import an XML file into Citavi. I modified the mapping in order to fill the various fields needed in Citavi (see enclosed). I still have a slight issue with the keywords.

In the XML file, they are on several lines, as shown below. However, after import the first word of each line is joined to the last word of the previous line (e.g. the first two lines become ACETONITRILE WASTE SOLUTIONEXTENSIVE UTILIZATION).

I tried unsuccessfully to use a regular expression to replace the closing tag </KEYW> with ";". Is there a way to insert this ";"?


Thank you for your help,

Louis

<KEYW-LIST>

<KEYW occ="34" score="100">ACETONITRILE WASTE SOLUTION</KEYW>

<KEYW occ="11" score="100">EXTENSIVE UTILIZATION</KEYW>

<KEYW occ="35" score="100">WASTE SOLUTION</KEYW>

<KEYW occ="28" score="64">ACETONITRILE STEAM</KEYW>

<KEYW occ="17" score="45">FRACTION COLLECTION</KEYW>

<KEYW occ="2" score="41">RECOVERED ACETONITRILE WASTE SOLUTION</KEYW>

<KEYW occ="37" score="34">ACETONITRILE</KEYW>

<KEYW occ="19" score="31">ACETONITRILE LIQUID</KEYW>

<KEYW occ="20" score="31">DISTILLATION</KEYW>

<KEYW occ="1" score="31">TERTIARY DISTILLATION</KEYW>

<KEYW occ="7" score="29">PRIMARY DISTILLATION</KEYW>

<KEYW occ="8" score="28">ANHYDROUS ACETONITRILE PRODUCT</KEYW>

<KEYW occ="1" score="26">CONDENSED PURE ACETONITRILE LIQUID</KEYW>

<KEYW occ="1" score="25">COLLECTING ACETONITRILE LIQUID</KEYW>

<KEYW occ="4" score="25">CONDENSED FRONT FRACTION</KEYW>

<KEYW occ="1" score="22">ACETONITRILE PRODUCTION</KEYW>

<KEYW occ="21" score="20">CONDENSATION</KEYW>

<KEYW occ="14" score="20">PHOSPHORUS PENTOXIDE</KEYW>

<KEYW occ="7" score="20">SOLID PHOSPHORUS PENTOXIDE</KEYW>

<KEYW occ="8" score="19">ACETONITRILE PRODUCT</KEYW>

</KEYW-LIST>

Replies (2)


Hi,

what should work is a substitution like this:

<KEYW .*?>(.+?)</KEYW> matches each of the lines, with a group (in parentheses) capturing the actual keyword. You can then replace each match with \1; (a backreference to the group) to get all of the keywords separated by semicolons.
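
To make the substitution concrete, here is a minimal Python sketch that applies the same pattern to two of the sample lines from the question. It only illustrates the regular expression itself, not Citavi's import filter; the function name `keywords_with_semicolons` is made up for this example. Python writes the backreference as \1, whereas .NET-style syntax uses $1.

```python
import re

# Same pattern as in the reply: lazy match of the attributes,
# one capturing group for the keyword text.
KEYW_PATTERN = re.compile(r"<KEYW .*?>(.+?)</KEYW>")


def keywords_with_semicolons(xml_text: str) -> str:
    """Replace every <KEYW ...>text</KEYW> element by 'text;'."""
    return KEYW_PATTERN.sub(r"\1;", xml_text)


# Two sample lines taken from the question.
sample = (
    '<KEYW occ="34" score="100">ACETONITRILE WASTE SOLUTION</KEYW>\n'
    '<KEYW occ="11" score="100">EXTENSIVE UTILIZATION</KEYW>\n'
)

print(keywords_with_semicolons(sample))
```

With the semicolons in place, the two keywords no longer run together as SOLUTIONEXTENSIVE.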

Given the syntax in CitaviTX files something like this should do the trick:

<Substitution pattern="<KEYW.*?>(.+?)</KEYW>", replaceBy = "$1;" />
(you need to replace the angle brackets by their HTML escapes -- ampersand lt semicolon and ampersand gt semicolon -- but the forum software doesn't allow this)
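
If you would rather not escape the pattern by hand, the following short Python sketch produces the escaped form with the standard library. It is only an illustration of the escaping step; how the result is pasted into the CitaviTX file is up to you.

```python
from xml.sax.saxutils import escape, unescape

# The pattern as written in the reply, with literal angle brackets.
pattern = "<KEYW.*?>(.+?)</KEYW>"

# escape() replaces &, < and > by their entity references.
escaped = escape(pattern)
print(escaped)

# Sanity check: unescaping gives back the original pattern.
print(unescape(escaped) == pattern)
```

The first printed line is the text to put inside the pattern attribute.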


Best regards

Sebastian


Hi Sebastian,


Thank you for your answer. I tried your syntax online using a regular expression simulator. It seems to work.

However, I modified the CitaviTX file, replacing "<" and ">" as you mentioned, and it still doesn't work.

I also tried this syntax, adding a "\" before the "/": <Substitution pattern="<KEYW.*?>(.+?)<\/KEYW>", replaceBy = "$1;" />

Something must be missing, but I can't figure out exactly what. Maybe I should capture the entire line...


Hi, Louis,

that's the syntax I had expected to work:

[Screenshot]

However, when trying to import the sample file you sent, there are several errors. I'm not even sure if the document is valid XML as it contains content outside tags.
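
A quick way to check this outside Citavi is to run the file through any XML parser. The Python sketch below is one possible check, not the one used here; the helper name `is_well_formed` and the two test strings are made up for illustration. Text inside an element is fine, but text before or after the root element makes the document not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical test strings built from one keyword of the sample list.
good = '<KEYW-LIST><KEYW occ="1" score="31">TERTIARY DISTILLATION</KEYW></KEYW-LIST>'
bad = "stray text " + good  # content outside the root element

print(is_well_formed(good))
print(is_well_formed(bad))
```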

Best regards

Sebastian


Hi Sebastian,


You are right, it seems the XML file is not correct. Please find enclosed a new one.

However, even with this file, the problem remains.


Hi, Louis,

it works, but not as expected. The keywords are turned into one "giant" keyword:

[Screenshot]

I'll need to discuss this with one of our developers.

Best regards

Sebastian


Hi Sebastian,

Yes, indeed, I had the same feeling. I did a quick test trying to capture only ">". It doesn't work. I think the substitution receives the appropriate part of the XML, but without the XML tags.

I was able to add ";" only at the end of the "giant" keyword. I also tried <KeywordSplitter>;</KeywordSplitter>, but it cuts every word after the first letter.


Kind regards,

Louis
