XML : How to do an efficient Outer or Left join in XQuery?

I have the following data:

  <!-- subjects.xml -->  <Subjects>      <Subject>          <Id>1</Id>          <Name>Maths</Name>      </Subject>      <Subject>          <Id>2</Id>          <Name>Science</Name>      </Subject>      <Subject>          <Id>2</Id>          <Name>Advanced Science</Name>      </Subject>      <Subject>          <Id>3</Id>          <Name>History</Name>      </Subject>  </Subjects>

which is to be joined to:

  <!-- courses.xml-->  <Courses>      <Course>          <SubjectId>1</SubjectId>          <Name>Algebra I</Name>      </Course>      <Course>          <SubjectId>1</SubjectId>          <Name>Algebra II</Name>      </Course>      <Course>          <SubjectId>1</SubjectId>          <Name>Percentages</Name>      </Course>      <Course>          <SubjectId>2</SubjectId>          <Name>Physics</Name>      </Course>      <Course>          <SubjectId>2</SubjectId>          <Name>Biology</Name>      </Course>  </Courses>

I wish to do a left join on the first table to the second table so as to get the following output:

  <Results>      <Result>          <Table1>              <Subject>                  <Id>1</Id>                  <Name>Maths</Name>              </Subject>          </Table1>          <Table2>              <Course>                  <SubjectId>1</SubjectId>                  <Name>Algebra I</Name>              </Course>              <Course>                  <SubjectId>1</SubjectId>                  <Name>Algebra II</Name>              </Course>              <Course>                  <SubjectId>1</SubjectId>                  <Name>Percentages</Name>              </Course>          </Table2>      </Result>      <Result>          <Table1>              <!-- Notice there are 2 subjects here, as they both have the same ID-->              <Subject>                  <Id>2</Id>                  <Name>Science</Name>              </Subject>              <Subject>                  <Id>2</Id>                  <Name>Advanced Science</Name>              </Subject>          </Table1>          <Table2>              <Course>                  <SubjectId>2</SubjectId>                  <Name>Physics</Name>              </Course>              <Course>                  <SubjectId>2</SubjectId>                  <Name>Biology</Name>              </Course>          </Table2>      </Result>      <Result>          <Table1>              <Subject>                  <Id>3</Id>                  <Name>History</Name>              </Subject>          </Table1>          <Table2>              <!-- Notice this section is empty -->          </Table2>      </Result>  </Results>

I have the following code to do this:

  <Results>      {      (:This part runs quite efficently!:)          (: For each element in courses, where it's 'SubjectId' exists in "subjects.xml":)          for $e2 in doc("courses.xml")/Courses/Course          let $foriegnId := $e2/SubjectId          let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]          where $e1          (: Group by the id :)          group by $foriegnId          return              <Result>                  <Table1>                      {$e1}                  </Table1>                  <Table2>                      {$e2}                  </Table2>              </Result>      }        {      (: PART2 :)      (:Show the remaining elements in courses that have not yet been outputted:)      (:This part makes my script run like 10 times slower!:)          for $e1 in doc('subjects.xml')/Subjects/Subject          let $idVal := $e1/Id          group by $idVal          where not(doc('courses.xml')/Courses/Course/SubjectId = $idVal)          return              <Result>                  <Table1>                      {$e1}                  </Table1>                  <Table2/>              </Result>      }  </Results>

Note the code works fine and does the job. However, I have found that when executing the code for large inputs (750 Subjects, each with 120 courses), then part2 portion of my code makes the code about 10x slower compared to without that portion of the code.

What can I do to make my script faster? Is there a better way of doing this? What's the time complexity?

Thanks for the help!

XML : How to do an efficient Outer or Left join in XQuery?

No comments:

Post a Comment