Approach To Solve Issue In Mongo Collection While Adding New Attribute In Matillion ETL
Matillion uses the extract-load-transform (ELT) process to deliver results quickly for many data processing purposes, from user behavior analytics to financial analysis and even reducing the cost of synthesizing DNA. MongoDB is a popular open-source, document-oriented NoSQL (non-relational) database management system. It is designed to store and manage unstructured or semi-structured data, which makes it suitable for handling large volumes of diverse data types.
Additionally, MongoDB offers replication, which ensures data durability and fault tolerance by maintaining multiple copies of data across different servers. It also provides built-in support for geographic distribution and automatic data partitioning. MongoDB is widely used in modern web applications, data-intensive projects, and big data solutions. Its flexible data model, scalability, and ease of use make it a popular choice for developers working with complex and rapidly changing data.
What is MongoDB replication?
In simple terms, MongoDB replication is the process of keeping a copy of the same data set on more than one MongoDB server. This is achieved with a replica set: a group of MongoDB instances (mongod processes) that all maintain the same data set.
Replication enables database administrators to provide:
- Data redundancy
- High availability of data
Maintaining multiple MongoDB servers with the same data provides distributed access to the data and increases the fault tolerance of the database, because redundant copies remain available if one server fails.
Replication can also be used as part of load balancing: depending on the use case, read operations can be distributed across the instances, while write operations always go to the primary.
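As a rough illustration of how a client is pointed at a replica set, the sketch below builds a MongoDB connection string that carries the `replicaSet` and `readPreference` options. The host names, database name, and replica set name are placeholders we made up for the example, and `build_replica_set_uri` is a hypothetical helper, not part of any MongoDB driver.

```python
from urllib.parse import urlencode

# Read preference modes defined by MongoDB.
READ_PREFERENCES = {
    "primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest",
}

def build_replica_set_uri(hosts, database, replica_set, read_preference="primary"):
    """Build a connection string for a replica set (hypothetical helper)."""
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/{database}?{options}"

# Placeholder hosts and names, assumed for illustration only.
uri = build_replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    database="crm",
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
print(uri)
# mongodb://db1.example.net:27017,db2.example.net:27017,db3.example.net:27017/crm?replicaSet=rs0&readPreference=secondaryPreferred
```

Listing several hosts lets the client discover the current primary on its own, so the string does not need to change after a failover.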
Two types of nodes in MongoDB
In a MongoDB replica set, there are two main types of data-bearing nodes:
1. Primary Node: The primary node is the main node in a MongoDB replica set. It receives all write operations (inserts, updates, and deletes) from clients and replicates the changes to the secondary nodes in the replica set. The primary node also serves read operations when requested. There can be only one primary node in a replica set at any given time.
2. Secondary Node: Secondary nodes in a MongoDB replica set replicate the data from the primary node. They maintain a copy of the primary node's data by applying the replicated operations in the same order as the primary. Secondary nodes can serve read operations as well, but they cannot accept write operations directly from clients. If the primary node fails or becomes unavailable, a secondary node can be elected as the new primary through an automatic election process.
These primary and secondary nodes in a replica set provide fault tolerance and high availability by allowing automatic failover and data redundancy in MongoDB deployments.
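To make the division of roles concrete, here is a toy in-memory model of a replica set. It is a simplified sketch, not how MongoDB is implemented: the `Node` and `ReplicaSet` classes are our own, replication is instantaneous, and the "election" simply picks the most up-to-date live node when a majority of nodes is still alive.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = []      # operations applied so far, in order
        self.alive = True

class ReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]   # assume the first node starts as primary

    def write(self, op):
        """Writes are accepted by the primary only, then copied to live secondaries."""
        if not self.primary.alive:
            raise RuntimeError("no primary available; an election is needed")
        self.primary.data.append(op)
        for node in self.nodes:
            if node is not self.primary and node.alive:
                node.data.append(op)   # same operation, same order

    def elect(self):
        """Toy election: needs a live majority, picks the most up-to-date node."""
        candidates = [n for n in self.nodes if n.alive]
        if len(candidates) <= len(self.nodes) // 2:
            raise RuntimeError("no majority of nodes available")
        self.primary = max(candidates, key=lambda n: len(n.data))
        return self.primary

rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write({"insert": "lead-1"})
rs.primary.alive = False          # simulate the primary going offline
new_primary = rs.elect()          # a secondary takes over
rs.write({"insert": "lead-2"})    # writes continue on the new primary
print(new_primary.name, len(new_primary.data))
```

After the failover, the old primary is missing the second write, which is why a node that rejoins has to catch up before it can serve current data.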
Use case 1:
If a newly added attribute in a Mongo collection does not show up in our MongoDB component, we need to follow these steps:
- Select the Mongo component in which the new field should appear.
- Open its properties and go to the connection option.
- There you can view the location of the API profile (for example, /usr/share/tomcat8/api_profiles/leads); every Mongo component has its own API profile.
- Go to the project, then select Manage API Profile and Manage Query Profile.
- Find our API profile name in the list.
- Select that API profile (here, leads) to see its leads.rsd file.
- Switch to advanced mode to view the XML script, and add the new attribute there manually.
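The manual edit in the last step amounts to adding one more attribute declaration to the profile's XML. The sketch below shows the idea on a deliberately simplified document: the `<script>`, `<info>`, and `<attr>` layout, the `type` attribute, and the field name `lead_source` are assumptions for illustration. A real leads.rsd file carries namespaces and additional attributes, so the safest pattern is to copy an existing attribute line in the actual file and change its name.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an .rsd file; the real layout is an assumption here.
SAMPLE_RSD = """<script>
  <info title="leads">
    <attr name="_id" type="string" />
    <attr name="email" type="string" />
  </info>
</script>"""

def add_attribute(rsd_xml, name, type_="string"):
    """Return the XML with a new <attr> declared, unless it already exists."""
    root = ET.fromstring(rsd_xml)
    info = root.find("info")
    if info is None:
        raise ValueError("no <info> element found")
    if any(attr.get("name") == name for attr in info.findall("attr")):
        return rsd_xml   # already declared, nothing to do
    ET.SubElement(info, "attr", {"name": name, "type": type_})
    return ET.tostring(root, encoding="unicode")

# "lead_source" is a hypothetical new field in the Mongo collection.
updated = add_attribute(SAMPLE_RSD, "lead_source")
print(updated)
```

Checking for an existing declaration first keeps the edit safe to repeat, which helps when several people maintain the same profile.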
In most cases, the field will now appear in our Mongo component. To confirm this, and to handle the case where it does not:
- After manually adding the attribute, check whether the Mongo component can fetch the field.
- If it cannot, go back to the connection option.
- Go to Read Preference, where we can pass a value; usually we use primaryPreferred, which reads from the primary and falls back to a secondary only when the primary is unavailable.
- If we set secondaryPreferred instead of primaryPreferred, reads go to a secondary server first and fall back to the primary only when no secondary is available.
- The additional attribute added in the Mongo collection should then appear in the MongoDB component.
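The difference between the read preference values can be written down as a small decision function. This is only a sketch of the documented fallback behavior; `pick_node` is a hypothetical helper, and real drivers also weigh factors such as latency, tag sets, and staleness limits that are left out here.

```python
def pick_node(read_preference, primary_up, secondaries_up):
    """Return which kind of node serves a read, or None if none qualifies.

    primary_up: whether the primary is reachable.
    secondaries_up: number of reachable secondaries.
    """
    has_secondary = secondaries_up > 0
    if read_preference == "primary":
        return "primary" if primary_up else None
    if read_preference == "primaryPreferred":
        if primary_up:
            return "primary"
        return "secondary" if has_secondary else None
    if read_preference == "secondary":
        return "secondary" if has_secondary else None
    if read_preference == "secondaryPreferred":
        if has_secondary:
            return "secondary"
        return "primary" if primary_up else None
    raise ValueError(f"unsupported read preference: {read_preference}")

# Healthy replica set: the two settings route reads to different nodes.
print(pick_node("primaryPreferred", primary_up=True, secondaries_up=2))    # primary
print(pick_node("secondaryPreferred", primary_up=True, secondaries_up=2))  # secondary
# No secondary reachable: secondaryPreferred falls back to the primary.
print(pick_node("secondaryPreferred", primary_up=True, secondaries_up=0))  # primary
```

Because secondaries apply changes after the primary, a read served by a secondary can briefly return older data than a read served by the primary, so it is worth trying both settings when a new field does not show up.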
MongoDB replication offers several benefits, including high availability, fault tolerance, and scalability. It allows applications to continue functioning even if one or more nodes go offline, ensuring uninterrupted service. Additionally, it enables horizontal scaling by distributing the workload across multiple nodes, accommodating increased read capacity as the number of secondary nodes grows.