How can I aggregate over arrays inside documents in MongoDB and get counts for multiple conditions?

I am writing a program that takes in an XML file of vehicle reflash data and converts it to JSON so it can be stored in a MongoDB database. The XML starts like this:

  <FlashReportGeneratorTag>
      <VehicleEntry>
          <VehicleStatus>PASSED</VehicleStatus>
      </VehicleEntry>
      <VehicleEntry>
          <VehicleStatus>PASSED</VehicleStatus>
      </VehicleEntry>
  </FlashReportGeneratorTag>

After I convert it to JSON and add the project identifier, I am left with a document that looks roughly like this:

  {
      "FlashReportGeneratorAddedTag" : {
          "VehicleEntry" : [
              {
                  "VehicleStatus" : "PASSED"
              },
              {
                  "VehicleStatus" : "PASSED"
              }
          ]
      },
      "project_id" : "1234"
  }

What I would like to do is get, for each document belonging to project 1234, a count of the vehicles that passed and a count of the vehicles that failed, but I have had no luck.

I have tried the basic aggregation techniques I know, but I cannot simply group by project_id: that groups whole documents, whereas I need to aggregate over the VehicleEntry array inside each one. I also haven't found any resource that says whether two values can be aggregated at once (a sum of passed counts and a sum of failed counts in the same result).

As a last resort I could restructure the data so that each VehicleEntry is its own document, but I would prefer to store the XML as it is if I can.

EDIT: Using Unwind, I was able to set up an aggregation over the array I'm interested in:

  var aggregate = collection.Aggregate()
      .Match(new BsonDocument { { "project_id", "1234" } })
      .Unwind(i => i["FlashReportGeneratorAddedTag.VehicleEntry"]);

However, I cannot find the proper way to group the unwound entries to get the pass/fail counts across the array. I assume I need the Match function somewhere, but I can't see how to use it without excluding one of the two conditions. Do I have to run the aggregation twice, once for passed and once for failed?
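For illustration, here is a minimal sketch of how both counts might be produced in a single pass, without a second Match: after the Unwind, a Group stage keyed on the original document `_id` can sum a `$cond` expression that yields 1 when the status matches and 0 otherwise. The connection string, the database and collection names, the `CountIf` helper, and the status value "FAILED" are assumptions made for this example (the sample data above only shows "PASSED"), and the code has not been run against the real data.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

class PassFailCounts
{
    static void Main()
    {
        // Assumed connection string, database name, and collection name.
        var client = new MongoClient("mongodb://localhost:27017");
        var collection = client.GetDatabase("reflash")
                               .GetCollection<BsonDocument>("reports");

        // Hypothetical helper: builds { $sum: { $cond: [ { $eq: [field, status] }, 1, 0 ] } },
        // which adds 1 for each unwound entry whose VehicleStatus equals `status`.
        BsonDocument CountIf(string status) => new BsonDocument("$sum",
            new BsonDocument("$cond", new BsonArray
            {
                new BsonDocument("$eq", new BsonArray
                {
                    "$FlashReportGeneratorAddedTag.VehicleEntry.VehicleStatus",
                    status
                }),
                1,
                0
            }));

        var results = collection.Aggregate()
            .Match(new BsonDocument { { "project_id", "1234" } })
            // One output document per VehicleEntry array element.
            .Unwind("FlashReportGeneratorAddedTag.VehicleEntry")
            // Regroup by source document and count both conditions at once.
            .Group(new BsonDocument
            {
                { "_id", "$_id" },
                { "passed", CountIf("PASSED") },
                { "failed", CountIf("FAILED") } // "FAILED" is an assumed status value
            })
            .ToList();

        foreach (var doc in results)
        {
            Console.WriteLine(doc.ToJson());
        }
    }
}
```

Grouping on `"$_id"` keeps the counts separate for each stored report; grouping on `"$project_id"` instead would give one pair of totals for the whole project.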
