I'm looking for an efficient way to reorganize parts of an XML document that contain multiple children of a type such as 'SmallCat' or 'BigCat'.
Here are the rules:
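- Everything except Habitat nodes should be passed through unchanged, attributes and all.
- Habitat nodes with fewer than two BigCat/SmallCat children (counted together) should also be passed through unchanged.
- Habitat nodes with two or more such children should be split into one sub-habitat per cat, and the original Habitat should keep references to those sub-habitats, as in the expected output below.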
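To make the split condition concrete, here is a minimal Python sketch of the test implied by the rules, assuming BigCat and SmallCat are counted together (which is what the sample data suggests). The names `CAT_TAGS` and `needs_split` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the only element names that trigger a split.
CAT_TAGS = {"BigCat", "SmallCat"}

def needs_split(habitat):
    """Return True when a Habitat has two or more BigCat/SmallCat children combined."""
    cats = [child for child in habitat if child.tag in CAT_TAGS]
    return len(cats) >= 2

if __name__ == "__main__":
    # Tiny stand-in document, not the full sample from the question.
    zoo = ET.fromstring(
        "<Zoo>"
        "<Habitat HabitatID='a'><BigCat/><SmallCat/></Habitat>"
        "<Habitat HabitatID='b'><SmallCat/></Habitat>"
        "</Zoo>"
    )
    for habitat in zoo.findall("Habitat"):
        print(habitat.get("HabitatID"), needs_split(habitat))
```

Only direct children are counted, mirroring the `count(BigCat|SmallCat)` test an XSLT match pattern would use.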
The input document looks like:
<Zoo>
  <Habitat HabitatID="habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
The output should look like:
<Zoo>
  <Habitat HabitatID="sub_habitat.1.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
  </Habitat>
  <Habitat HabitatID="sub_habitat.2.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
  </Habitat>
  <Habitat HabitatID="habitat.cage.1">
    <BodyTemp>endothermic</BodyTemp>
    <Child>
      <HabitatID>sub_habitat.1.habitat.cage.1</HabitatID>
    </Child>
    <Child>
      <HabitatID>sub_habitat.2.habitat.cage.1</HabitatID>
    </Child>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
The ideal solution would use XSLT, but any solution (Bash, JavaScript, PHP, Python, Ruby, Go, etc.) that gets the job done is a worthy contender.
Here's an implementation that does ~90% of the work.
It does not yet rebuild the original Habitat node (habitat.cage.1 in the example) with Child references to the new sub_habitat nodes.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split any Habitat that has two or more BigCat/SmallCat children -->
  <xsl:template match="Habitat[count(BigCat|SmallCat) > 1]">
    <xsl:param name="i"/>
    <!-- Emit one sub-habitat per cat, numbered by position -->
    <xsl:for-each select="BigCat|SmallCat">
      <xsl:choose>
        <xsl:when test="self::BigCat">
          <Habitat HabitatID="sub_habitat.{position()}.{../@HabitatID}">
            <xsl:copy-of select="../*[not(self::SmallCat|self::BodyTemp)]"/>
          </Habitat>
        </xsl:when>
        <xsl:when test="self::SmallCat">
          <Habitat HabitatID="sub_habitat.{position()}.{../@HabitatID}">
            <xsl:copy-of select="../*[not(self::BigCat|self::BodyTemp)]"/>
          </Habitat>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
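The resulting output is shown below. Note that the habitat.cage.1 node, which should hold BodyTemp and the Child references, is missing: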
<Zoo>
  <Habitat HabitatID="sub_habitat.1.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <BigCat AnimalID="Tiger.1">
      <Type>Bengal</Type>
    </BigCat>
  </Habitat>
  <Habitat HabitatID="sub_habitat.2.habitat.cage.1">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="bobcat.1">
      <Type>Bobcat</Type>
    </SmallCat>
  </Habitat>
  <Habitat HabitatID="cage.2">
    <Type>Cats</Type>
    <Food>Birds</Food>
    <SmallCat AnimalID="tabycat.1">
      <Type>Tabycat</Type>
    </SmallCat>
    <BodyTemp>endothermic</BodyTemp>
  </Habitat>
  <ConsessionStand>
    <Type>PopcornStand</Type>
  </ConsessionStand>
</Zoo>
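For comparison, since non-XSLT answers are welcome, here is a rough Python sketch of the whole transformation, including the rebuilt parent Habitat that is missing above. It rests on assumptions read off the expected output rather than stated rules: Habitat nodes are direct children of the root, BodyTemp stays only in the original Habitat, and every other non-cat child (Type, Food) is copied into each sub-habitat. The names `split_habitat`, `transform`, `CAT_TAGS` and `KEEP_IN_PARENT` are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

CAT_TAGS = ("BigCat", "SmallCat")   # each of these gets its own sub-habitat
KEEP_IN_PARENT = ("BodyTemp",)      # assumption: stays only in the original Habitat

def split_habitat(habitat):
    """Return the list of Habitat elements that should replace `habitat`."""
    cats = [c for c in habitat if c.tag in CAT_TAGS]
    if len(cats) < 2:
        return [habitat]            # fewer than two cats: pass through unchanged
    hid = habitat.get("HabitatID")
    # Children shared by every sub-habitat (Type and Food in the sample data).
    shared = [c for c in habitat if c.tag not in CAT_TAGS + KEEP_IN_PARENT]
    # Rebuilt original Habitat: parent-only children first, Child references after.
    parent = ET.Element("Habitat", {"HabitatID": hid})
    parent.extend([copy.deepcopy(c) for c in habitat if c.tag in KEEP_IN_PARENT])
    subs = []
    for i, cat in enumerate(cats, start=1):
        sub_id = f"sub_habitat.{i}.{hid}"
        sub = ET.Element("Habitat", {"HabitatID": sub_id})
        sub.extend([copy.deepcopy(c) for c in shared])
        sub.append(copy.deepcopy(cat))
        subs.append(sub)
        ET.SubElement(ET.SubElement(parent, "Child"), "HabitatID").text = sub_id
    return subs + [parent]

def transform(zoo):
    """Rebuild the root's child list in place, splitting Habitats where needed."""
    new_children = []
    for child in list(zoo):
        if child.tag == "Habitat":
            new_children.extend(split_habitat(child))
        else:
            new_children.append(child)   # non-Habitat nodes pass through as-is
    zoo[:] = new_children
    return zoo

if __name__ == "__main__":
    # Compact stand-in for the sample input.
    zoo = ET.fromstring(
        "<Zoo><Habitat HabitatID='habitat.cage.1'><Type>Cats</Type>"
        "<BigCat AnimalID='Tiger.1'/><SmallCat AnimalID='bobcat.1'/>"
        "<BodyTemp>endothermic</BodyTemp></Habitat></Zoo>"
    )
    print(ET.tostring(transform(zoo), encoding="unicode"))
```

In XSLT 1.0 the same idea would presumably be a second literal `Habitat` element emitted after the `for-each`, containing a copy of BodyTemp and one `Child` per cat, but that variant is not shown or tested here.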