Friday, May 10, 2019

Simplifying MongoDB Data Association

With $lookup operator, MongoDB supports the most basic data association between multiple tables. When data association tasks are complicated, the Mongo shell script becomes complicated. With SPL’s discreteness and ease of use (SPL is the abbreviation of Structured Process Language developed by Raqsoft), esProc can help simplify data association in MongoDB. There are more related articles in Raqsoft Community (http://c.raqsoft.com/).
MongoDB is a document-oriented NoSQL database operated on a distributed file system. It stores documents in BSON format, which descripts entity attributes in a more natural and intuitive manner. Since Version 3.2, MongoDB has improved its query capability by supporting data association via $lookup. The real-world business scenarios, however, are complicated and the computing problems are knotty. This results in tortuous scripting process.

In this article, we’ll explain how to simplify association of MongoDB data in SPL.
    1. Association with embedded structures (1)
    2. Association with embedded structures (2)
    3. Association with embedded structures (3)
    4. Association with selected fields
    5. Multi-table association
    6. Associated query by array values
    7. Calling esProc dfx script in Java

1. Association with embedded structures (1)

This involves two associated tables. One of table A’s field is associated with the sub-documents in table B, and the to-be-returned data is contained in the sub-documents. For example, the childs field of childsgroup table contains embedded arrays, in which the name item and mobile item need to be returned.

Test data:
history:
_ididHistorychild_id
1001today worked ch001
2002Workingch004
3003now workingch009
childsgroup:
_idgroupidnamechilds
1g001group1{"id":"ch001","info":{"name":"a",mobile:1111}},
{"id":"ch002","info":{"name":"b",mobile:2222}}
2g002group1{"id":"ch004","info":{"name":"c",mobile:3333}},
{"id":"ch009","info":{"name":"d",mobile:4444}}

The child_id field in history table associates with childs.id in childsgroup table. Below is the desired joining result:
  {
    “_id” : ObjectId(“5bab2ae8ab2f1bdb4f434bc3”),
    “id” : “001”,
    “history” : “today worked”,
    “child_id” : “ch001”,
    “childInfo” :
    {
      “name” : “a”,
      “mobile” : 1111
    }
    ………………
  }

MongoDB script:
db.history.aggregate([
  {$lookup: {
    from: "childsgroup",
    let: {child_id: "$child_id"},
    pipeline: [
      {$match: { $expr: { $in: [ "$$child_id", "$childs.id"] } } },
      {$unwind: "$childs"},
      {$match: { $expr: { $eq: [ "$childs.id", "$$child_id"] } } },
      {$replaceRoot: { newRoot: "$childs.info"} }
      ],
      as: "childInfo"
    }},
  {"$unwind": "$childInfo"}
])

The script uses five operators, including lookup, pipeline, match, unwind and replaceRoot, to handle the computation. It’s complicated and difficult to write.

SPL script (childsgroup.dfx)
A
1=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2=mongo_shell(A1,"history.find()").fetch()
3=mongo_shell(A1,"childsgroup.find()").fetch()
4=A3.conj(childs)
5=A2.join(child_id,A4:id,info)
6>A1.close()

Associated query result:
_ididhistorychild_idinfo
1001today workedch001[a,1111]
2002workingch004[c,3333]
3003now workingch009[d,4444]

SPL script explanation:
A1: Connect to MongoDB database.
A2: Read in data from history table.
A3: Read in data from childsgroup table.
A4: Concatenate the array values in childsgroup table’s childs field into a table sequence.
A5: Perform an associated query between history table an A4’s childs table via the associated fields child_id and id and append an info field to generate a new table sequence.
A6: Close the database connection.
The SPL script is simple, clear and efficient.


2. Association with embedded structures (2)

There are two associated tables. Table A is associated with table B’s embedded documents. The task is to combine table A’s data under the embedded documents. For example, txtPost table’s Comment field contains array values; combine comment_content field of txtComment table under the arrays.

txtComment:
_IDcomment_nocomment_content
1143test test
2140math
txtPost:
_IDpost_noComment
148[{"comment_no" : 143, "comment_group" : 1} ]
247[{"comment_no" : 140, "comment_group" : 2},
 {"comment_no" : 143, "comment_group" : 3} ]

Desired result:
_IDpost_noComment
148[{"comment_no" : 143, "comment_group" : 1,"comment_content" : "test test"} ]
247[{"comment_no" : 140, "comment_group" : 2,"comment_content" : "math"},
 {"comment_no" : 143, "comment_group" : 3,"comment_content" : "test test"} ]

MongoDB script:
db.getCollection("txtPost").aggregate([
  {"$unwind": "$comment"},
  { "$lookup": {
    "from": "txtComment",
    "localField": "comment.comment_no",
    "foreignField": "comment_no",
    "as": "comment.comment_content"
  }},
  {"$unwind": "$comment.comment_content"},
  {"$addFields": { "comment.comment_content":"$comment.comment_content.comment_content"}},
  { "$group": {
    "_id": "$_id",
    'post_no':{"$first": "$post_no"},
    "comment": {"$push": "$comment"}
  }},
  
]).pretty()
Split txtPost records according to members of Comment arrays into more records, perform an associated query between them with records of txtComment, store the result as an array and split the array into records, move comment_content values under Comment field, and group and concatenate the records.

SPL script:
A
1=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2=mongo_shell(A1,"txtPost.find()").fetch()
3=mongo_shell(A1,"txtComment.find()").fetch()
4=A2.conj(comment.derive(A2.post_no:pno))
5=A4.join(comment_no,A3:comment_no,comment_content:Content)
6=A5.group(pno;~:comment)
7>A1.close()

Associated query result:
pnoComment
47[[140, 2,47, …],[143, 3,47, …] ]
48[[143, 1,48, …]]

SPL script explanation:
A1:Connect to MongoDB database.
A2:Read data from txtPost table.
A3:Read data from txComment table.
A4:Create a table sequence with A2’s Comment field and post_no field, which is renamed pno.
A5:Associate A4’s table sequence with A3’s table sequence via comment_no and append comment_content field, which is renamed Content.
A6:Group records by pno and return the current table; ~ represents the current record.
A7:Close the database connection.
Both scripts transform the embedded data into data stored in columns and rows, perform association, and then group the associated records. But SPL script is clearer and more concise.

3. Association with embedded structures (3)

This involves two associated tables. Table A is associated with the sub-documents in table B, and the to-be-returned information is contained in the sub-records. For example, product field in collection2 contains array values and the association result needs to include fields in isCompleted array in collection2.

Test data:
collection1:
 {
       _id: '5bc2e44a106342152cd83e97',
       description:
      {
            status: 'Good',
           machine: 'X'
     },
     order: 'A',
     lot: '1'
};

collection2:
{
       _id: '5bc2e44a106342152cd83e80',
       isCompleted: false,
       serialNo: '1',
       batchNo: '2',
       <!-- note the subdocuments here-->   
       product: [ 
            { order: 'A', lot: '1' },
            { order: 'A', lot: '2' }
   ]
}

Desired result:
{
      _id: 5bc2e44a106342152cd83e97,
      description: {
             status: 'Good',
            machine: 'X',
      },
     order: 'A',
     lot: '1' ,
     isCompleted: false,
     serialNo: '1',
     batchNo: '2'
}


MongoDB script:
db.collection1.aggregate([{
  $lookup: {
    from: "collection2",
    let: {order: "$order", lot: "$lot"},
    pipeline: [{
      $match: {
      $expr:{$in: [ { order: "$$order", lot: "$$lot"}, "$product"] }
      }
      }],
      as: "isCompleted"
      }
    }, {
      $addFields: {
      "isCompleted": {$arrayElemAt: [ "$isCompleted", 0] }
      }
    }, {
      $addFields: { // add the required fields to the top level structure
      "isCompleted": "$isCompleted.isCompleted",
      "serialNo": "$isCompleted.serialNo",
      "batchNo": "$isCompleted.batchNo"
    }
}])
$lookup performs an associated query between the two tables. The first $addFields gets the first field from isCompleted array, and the second $addFields adds the other required fields.

SPL script:
A
1=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2=mongo_shell(A1,"collection1.find()").fetch()
3=mongo_shell(A1,"collection2.find()").fetch()
4=A3.conj(A2.select(order:A3.product.order,lot:A3.product.lot).derive(A3.serialNo:sno,A3.batchNo:bno))
5>A1.close()

SPL script explanation:
A1:Connect to MongoDB database.
A2:Read data from collection1 table.
A3:Read data from collection2 table.
A4:Query A2’s table according to order field and lot field, add A3’s serialNo field and batchNo field, and return the associated table sequence.
A5:Close the database connection.
Both scripts get the desired result. SPL script is clear in selecting required data from the embedded documents and concatenating them into a new table sequence.

4. Association with selected fields

Create a new table through association and with selected fields.

collection1:
user1user2income
120.56
130.26
collection2:
user1user2output
120.3
130.4
230.5

Desired result:
user1user2incomeoutput
120.560.3
130.260.4

MongoDB script:
db.c1.aggregate([
  { "$lookup": {
  "from": "c2",
    "localField": "user1",
    "foreignField": "user1",
    "as": "collection2_doc"
  }},
  {"$unwind": "$collection2_doc"},
  { "$redact": {
    "$cond": [
      {"$eq": [ "$user2", "$collection2_doc.user2"] },
      "$$KEEP",
      "$$PRUNE"
    ]
  }},
  { "$project": {
    "user1": 1,
    "user2": 1,
    "income": "$income",
    "output": "$collection2_doc. output"
  }}
]).pretty()
$lookup performs an associated query.$redact traverses records to perform conditional handling. $project selects the desired fields.

SPL script:
A
1=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2=mongo_shell(A1,"c1.find()").fetch()
3=mongo_shell(A1,"c2.find()").fetch()
4=A2.join(user1:user2,A3:user1:user2,output)
5>A1.close()

SPL script explanation:
A1: Connect to MongoDB database.
A2: Read data from collection1 table.
A3: Read data from collection2 table.
A4: Join the two tables by user1 and user2, append A3’s output field and return a new table sequence.
A5: Close the database connection.
Both MongDB script and SPL script achieve the computing goal. The latter combines some of fields in two tables via a join into a new table, which is similar to relational database computation.


5. Multi-table association

This is the association between more than two tables to get a merged table.
Doc1:
_idfirstNamelastName
U001shubhamverma
Doc2:
_iduserIdaddressmob
2U001Gurgaon9876543200
Doc3:
_iduserIdfbURLstwitterURLs
3U001http://www.facebook.comhttp://www.twitter.com

Merged result:
{
     "_id" : ObjectId("5901a4c63541b7d5d3293766"),
     "firstName" : "shubham",
     "lastName" : "verma",
     "address" : {
         "address" : "Gurgaon"
     },
     "social" : {
         "fbURLs" : "http://www.facebook.com",
         "twitterURLs" : "http://www.twitter.com"
     }
}

MongoDB script:
db.doc1.aggregate([
  {$match: { _id: ObjectId("5901a4c63541b7d5d3293766") } },
  {
    $lookup:
    {
      from: "doc2",
      localField: "_id",
      foreignField: "userId",
      as: "address"
    }
  },
  {
    $unwind: "$address"
  },
  {
    $project: {
      "address._id": 0,
      "address.userId": 0,
      "address.mob": 0
    }
  },
  {
    $lookup:
    {
      from: "doc3",
      localField: "_id",
      foreignField: "userId",
      as: "social"
    }
  },
  {
    $unwind: "$social"
  },
  
  {
    $project: {
      "social._id": 0,
      "social.userId": 0
    }
  }
]).pretty();
There can be different MongoDB scripts to do this and various merged results in MongoDB thanks to the database’s unique data structure.

SPL script:
A
1=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2=mongo_shell(A1,"doc1.find()").fetch()
3=mongo_shell(A1,"doc2.find()").fetch()
4=mongo_shell(A1,"doc3.find()").fetch()
5=A2.join(_id,A3:userId,address,mob)
6=A5.join(_id,A4:userId,fbURLs,twitterURLs)
7>A1.close()

This SPL script is similar to that in the previous example, with only one more joined table. Every join adds a new field to form a merged table.
Obviously, the SPL script is concise and coherent.


6. Associated query by array values

Get records from a table according to the array field in its associated table and generate a new table with certain fields.

Test data:
users:
_idNameworkouts
1000xxx[2,4,6]
1002yyy[1,3,5]
workouts:
_idDateBook
11/1/2001Othello
22/2/2001A Midsummer Night's Dream 
33/3/2001The Old Man and the Sea
44/4/2001GULLIVER’S TRAVELS
55/5/2001Pickwick Papers
66/6/2001The Red and the Black

Desired result:
Name_idDateBook
xxx22/2/2001A Midsummer Night's Dream 
xxx44/4/2001GULLIVER’S TRAVELS
xxx66/6/2001The Red and the Black
yyy11/1/2001Othello
yyy33/3/2001The Old Man and the Sea
yyy55/5/2001Pickwick Papers

MongoDB script:
db.users.aggregate([
  { "$lookup": {
    "from" : "workouts",
    "localField" : "workouts",
    "foreignField" : "_id",
    "as" : "workoutDocumentsArray"
  }},
  {$project: { _id:0,workouts:0} } ,
  {"$unwind": "$workoutDocumentsArray"},
  
  {"$replaceRoot": { "newRoot": { $mergeObjects: [ "$$ROOT", "$workoutDocumentsArray"] } } },
  {$project: { workoutDocumentsArray: 0} }
]).pretty()
Put the result of associated query between users and workouts into an array, split the array and move the embedded document to the top level, and remove the unwanted fields.

SPL script (users.dfx):
A
1=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2=mongo_shell(A1,"users.find()").fetch()
3=mongo_shell(A1,"workouts.find()").fetch()
4=A2.conj(A3.select(A2.workouts^~.array(_id)!=[]).derive(A2.name)) 
5>A1.close()

SPL script explanation:
A1: Connect to MongoDB database.
A2: Read data from users table.
A3: Read data from workouts table.
A4: Get records from workouts table where _id field is included in workouts field of A2’s table sequence, add name field to the records, and return the concatenated table sequence.
A5: Close the database connection.
The operation is considered normal only if the matching result isn’t null. So _id field is converted into a sequence to make the comparison convenient in A4.
The SPL script is more integration-friendly, more flexible and much simpler.

7. Calling esProc dfx script in Java

The result of executing an SPL script can be easily integrated by a Java application. esProc provides JDBC driver to be accessed withJava stored procedure. That is the same as calling a SQL stored procedure in Java. (For esProc JDBC deployment, refer to 1.9 esProc JDBC in esProc Tutorial)

Below is the process of calling an SPL script from Java:
    public void testUsers(){
        Connection con = null;
        com.esproc.jdbc.InternalCStatement st;
        try{
            //Establish database connection
            Class.forName("com.esproc.jdbc.InternalDriver");
            con= DriverManager.getConnection("jdbc:esproc:local://");
            // Call the stored procedure, in which user is the dfx file’s name
            st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call users ()");
            // Execute the stored procedure
            st.execute();
            //Get result set
            ResultSet rs = st.getResultSet();
            ......
            catch(Exception e){
                System.out.println(e);
            }

This is a standard JDBC invocation, which is easy to embed an SPL script into a Java application. esProc also supports the ODBC driver, which makes it easy to be integrated with languages that use ODBC.

Compared with relational databases, MongoDB stores data in a more complicated, more flexible way, has powerful and adaptable query syntax, and provides various functions and rich cooperative possibilities between them. It’s not easy to be proficient in MongoDB. esProc, however, features discreteness and ease of use to become the NoSQL database’s right-hand and can help it play to its full potential.

No comments:

Post a Comment